coq / coq

Coq is a formal proof management system. It provides a formal language to write mathematical definitions, executable algorithms and theorems together with an environment for semi-interactive development of machine-checked proofs.
https://coq.inria.fr/
GNU Lesser General Public License v2.1
4.66k stars 632 forks source link

Additions to String module and extraction to OCaml native strings #19037

Open BridgeTheMasterBuilder opened 2 weeks ago

BridgeTheMasterBuilder commented 2 weeks ago

Hi, this is my first PR to the repository, I apologize in advance for any potential missteps!

Summary of changes:

I used these in a toy project to be able to extract to idiomatic OCaml string code without having to explode strings into a list of characters.

The definitions are essentially identical to those found in the List module except specialized to ascii. I also attempted to mirror the documentation style in the List module to some extent in anticipation of future expansion but I welcome feedback. The headings should probably be made bigger, also I'd be happy to flesh the module out a bit more by adding fold, existsb, etc. so the headings actually make sense.

Since these functions are primarily intended for computation/extraction, I'm not sure if associated lemmas are high priority, as you can probably convert the string to list ascii to do proofs (that's what I did in the linked project). I'm curious what the standard library's policy on lemma reuse is in general. Since string has the same shape as list ascii, it seems like a lot of the lemmas in List that you would want to include in String would have essentially similar proofs, so there would be a lot of duplication.

I have yet to complete the necessary tasks like updating the changelog, polishing the documentation, tests etc. As per the recommendation in the contributor's guide I'm opening this PR now to get feedback. Please let me know if I need to change anything.

andres-erbsen commented 2 weeks ago

@coqbot run full ci

BridgeTheMasterBuilder commented 2 weeks ago

If you have lemmas that relate these new functions to versions that operate on list, these lemmas would be welcome as well.

I do, but I was unsure whether to include them. Just to make sure, do you mean something like this?

Lemma string_list_ascii_map_eq : forall (f : ascii -> ascii) (s : string),
    map f s = string_of_list_ascii (List.map f (list_ascii_of_string s)).
Proof.
  induction s; [ reflexivity | simpl; now f_equal ].
Qed.

Lemma string_list_ascii_forallb_eq : forall (f : ascii -> bool) (s : string),
    forallb f s = List.forallb f (list_ascii_of_string s).
Proof.
  induction s; [ reflexivity | simpl; now f_equal ].
Qed.

Please add a changelog entry as explained in https://github.com/coq/coq/tree/master/doc/changelog#how-to-add-an-entry

Will do.

BridgeTheMasterBuilder commented 2 weeks ago

I added a changelog entry.

BridgeTheMasterBuilder commented 2 weeks ago

By the way, it looks like CI is failing because some of the tests import the String module but then expect map to refer to the function in the List module :/ Does this qualify as a breaking change? I can imagine a lot of code would be affected. I guess this is what "overlay PR"s are for.

andres-erbsen commented 2 weeks ago

Yes, this change making import order significant could be a compat nuisance as you describe. I'll kick off the CI again, maybe it'll work this time.

andres-erbsen commented 2 weeks ago

@coqbot run full ci

Villetaneuse commented 1 week ago

Thank you very much for your contribution, @BridgeTheMasterBuilder. It is indeed a breaking change, but it is not hard to fix: either Import List after String or sed 's/\<map\>/List.map/g'. Another possibility would be to split your String.v into String.v and StringMap.v, but I like it less (multiplying stdlib files creates entropy). @proux01 what is the policy about such a change? Does it break the "2 consecutive versions contract"?

JasonGross commented 1 week ago

Does it break the "2 consecutive versions contract"?

AIUI, this contact can only be broken when there's no backwards-compatible fix. Since there's an easy backwards compatible fix here, we just need to update CI developments. It might be worth writing a script (more robust than the sed script) to do this: it should be possible to write a generic qualify-names Coq.Lists.List.map map List.map . (qualify-names fully-qualified-constant search-text replacement-text list-of-files-or-directories) script that parses .glob files to find instances of the search text which refer to the fully qualified constant, and replaces them. (Presumably coq-lsp could do something similar.)

In fact, https://github.com/JasonGross/coq-tools/blob/master/coq_tools/absolutize_imports.py could do the job with just minor changes, namely the code at https://github.com/JasonGross/coq-tools/blob/b29cc4488f36e420ff892a7db7333d26219d0db9/coq_tools/import_util.py#L509-L546 needs to get extra arguments for "restrict the search text (cur_code) to this set of constants" and "restrict the fully qualified names (rep_orig) to this set of constants", and then https://github.com/JasonGross/coq-tools/blob/b29cc4488f36e420ff892a7db7333d26219d0db9/coq_tools/absolutize_imports.py#L33 / https://github.com/JasonGross/coq-tools/blob/b29cc4488f36e420ff892a7db7333d26219d0db9/coq_tools/absolutize_imports.py#L47-L56 would have to pass these two new arguments as well as passing something like mod_remap={'Coq.Lists.List':'List'} (to be constructed from command line arguments). (And then we'd need to pass -a to absolutize all constants and not just imports.

BridgeTheMasterBuilder commented 1 week ago

I was expecting this to be an innocent little addition. :sweat_smile: Sorry about that. If there's still interest in making these changes, perhaps it would be a good idea to flesh out this PR a little bit more so it's worthwhile. Some things that come to mind:

I think additions like this would make String and Ascii both well-rounded. Thoughts?

By the way, I didn't get a response to this question:

If you have lemmas that relate these new functions to versions that operate on list, these lemmas would be welcome as well.

I do, but I was unsure whether to include them. Just to make sure, do you mean something like this?

Lemma string_list_ascii_map_eq : forall (f : ascii -> ascii) (s : string),
    map f s = string_of_list_ascii (List.map f (list_ascii_of_string s)).
Proof.
  induction s; [ reflexivity | simpl; now f_equal ].
Qed.

Lemma string_list_ascii_forallb_eq : forall (f : ascii -> bool) (s : string),
    forallb f s = List.forallb f (list_ascii_of_string s).
Proof.
  induction s; [ reflexivity | simpl; now f_equal ].
Qed.

Are these the right types of lemmas?