Radiance-Technologies / prism

Utilities for creating and curating a dataset for automated proof repair in the Coq Proof Assistant.
GNU Lesser General Public License v3.0
5 stars 2 forks source link

Partial application of diffs #18

Closed a-gardner1 closed 7 months ago

a-gardner1 commented 2 years ago

Partial application of diffs provides a means to produce supervised learning examples of repairs. A more precise specification is worth debate, but the following description provides a baseline.

Given two commits, the latter of which builds successfully, do

  1. Compute the diff.
  2. Determine if the change is relevant (i.e., did it only alter a comment, were any proofs altered, etc.). If not, abort this sequence.
  3. Remove a randomly chosen part of the diff (especially changes to a proof).
  4. Supply the first commit and the altered diff as an example of a broken state in need of repair and the second commit as a ground truth target repair.

Note that the above sequence can be applied multiple times for the same commit to yield multiple examples. For the first iteration of the repair dataset, I recommend constraining diff omissions to proof components only.

tom-p-reichel commented 2 years ago

For part (1) the global alignment algorithm is useful.

Implementation.

Alternatively, if you don't anticipate wanting to customize any parameters of the diffing algorithm, there is also the builtin difflib.

a-gardner1 commented 2 years ago

Thanks! FYI, we are working on extracting the code for heuristic parsing and various other utilities for project management from our main repo by the end of the week. Currently waiting to finish and merge one big new feature before we decouple it from the rest of our internal libraries.

tom-p-reichel commented 2 years ago

RE: Today's discussion.

I found that sertop immediately executes import statements as described-- but if the imports are excluded from the source, identifiers referring to things that do not exist (tactics, definitions) do not produce an error, and are parsed just fine.

However, the parser is a stickler for notations. For example, here is part of an interactive session with sertop:

(Add () "Definition B := &A.")
(Answer 0 Ack)
(Answer 0 (CoqExn ...))
(Answer 0 Completed)

sertop throws an exception as it has not seen the notation definition for &. Importantly, sertop does not infer notation definitions.

sertop appears to immediately evaluate notation definitions, like how imports are immediately evaluated.

This means that if an import that contains a notational definition is excluded from code that uses that definition, the parse will fail.

I suspect (hope) many projects use few-to-no notational definitions. If this is the case, we can provide a small set of dummy definitions for libraries that allows them to be parsed by sertop without compilation.

For example, suppose a library uses &, but we don't want to compile the library. We can provide a prelude to the coq file that contains

Notation "& A" := A (at level 80).

and allow our previous example to be parsed without error.

It may also be possible to scrape all notation definitions from a library and replace their definitions with a dummy value to generate a prelude that allows that library to parse.

a-gardner1 commented 2 years ago

This is challenging some of my assumptions for sertop. I'll need to do some experimentation myself to see whether it can be used to get suitable ground-truth information or not about Coq proof state.

Does it accurately predict proof mode? I'm not sure how it could do so without executing the Coq code, and I'm not sure how it could execute the code with unknown identifiers and notation.

While the ability to skip imports and notations does help accelerate some things, it unfortunately won't stop us from having to build projects anyway to establish proof success or failures.

tom-p-reichel commented 2 years ago

It does seem to predict proof mode, even without a Proof. introduction[^1]. Here's some COQ source code fed through, then printed out with an "is_tactic" annotation:

Inductive nat := Zero | Succ : nat -> nat.  (* is tactic? False *)
Fixpoint add (a b : nat) : nat :=
  match a with
  | Zero => b
  | Succ x => Succ (add x b)
  end.  (* is tactic? False *)
Theorem easy : forall (a b : nat), (add a b = a) -> (b = Zero).  (* is tactic? False *)
intros.  (* is tactic? True *)
induction a.  (* is tactic? True *)
assumption.  (* is tactic? True *)
simpl in H.  (* is tactic? True *)
inversion H.  (* is tactic? True *)
apply (IHa H1).  (* is tactic? True *)
faketactic.  (* is tactic? True *)
Qed.  (* is tactic? False *)
Definition foo := bar.  (* is tactic? False *)

This program was parsed even though the definition of foo refers to bar, which doesn't exist. Also faketactic was still identified as a tactic, despite not being defined.

I deduce whether a line is a tactic by doing the following check on the s-expression AST emitted by sertop, which itself is parsed using a slightly modified[^2] version of this library:

match self.ast:
    case [S("CoqAst"), [[S("v"),[_, _, [S("expr"),[S("VernacExtend"), [S("VernacSolve"),0], _]]]],_]]:
        return True
return False

Essentially, I check that the top of the AST refers to the contents of the AST as a "VernacSolve", which is what I believe is COQ's internal name for tactic syntax.

Regarding how sertop does this: I don't really know. I do have speculation, though. Proof mode is distinct from other parts of the language in that lines in proof mode do not start with keywords like "Inductive" or "Definition", so it might just use this, in concert with the fact that a definition wasn't given for a theorem in a previous line, to know when to go to proof mode.

[^1]: While it doesn't care much about the Proof keyword, I seem to have gotten my workflow to fail silently(!) by excluding the ending Qed. and continuing to another definition without closing the proof. Doing this caused my workflow to emit an empty AST for the remaining lines in the file. I don't think users usually exclude a Qed or Defined though, because if you try to start a proof without closing the finished one, COQ usually complains about nested proofs.

[^2]: I modified my copy of the s-expression library so that match statements operate on AST trees. I can give my changes if this seems useful. By default, match statements will not correctly equate identical symbols in s-expressions, so the code I gave won't work and you'd have to transcribe it to equality checks(s) if you wanted to use it verbatim.

a-gardner1 commented 2 years ago

I suspect sertop does this as you speculate: anything that is not a known vernacular command is treated as a tactic. That is similar to how our heuristic parser makes the determination.

My main concern, and I don't know how prevalent this situation is, is if one has two definitions in a row followed by a proof (or any two vernacular commands that can both optionally be completed by a proof and optionally appear within a proof; see the complete list of known theorem starters). Since definitions (or other command satisfying the prior constraint) can appear in the middle of proofs, one cannot know which definition the proof is completing without understanding the semantics of the code or querying the Coq program state (or proof goals) on the relevant lines. It seems to me that the only foolproof method is to interactively query Coq.

I am assuming elimination of this ambiguity is important for accurately aligning terms to ultimately discover whether one commit repaired or altered a proof in another.

a-gardner1 commented 2 years ago

Also, with regard to your first footnote: I can confirm the existence of files in the dataset with unterminated proofs that do not trigger any nested proof errors. Our heuristic parser appends an Admitted. statement whenever they are detected.

tom-p-reichel commented 2 years ago

I can confirm the existence of files in the dataset with unterminated proofs that do not trigger any nested proof errors.

OK, I installed the 2016 version of Coq on a hunch and found that it does not issue a warning or error for nested proofs, unlike the current version of Coq.

However, what a tactic applies to in this version doesn't seem to rely on program state, it looks like it is using a stack.

An example:

Theorem a : False.
(* currently proving a *)
Theorem b: True. (* stack push *)
(* currently proving b *)
constructor.
(* still proving b -- "No more subgoals." *)
(* adding a tactic here does not start proving a. *)
Qed. (* stack pop *)
(* now proving a *)
Admitted.

To mess with this version yourself:

opam switch create oldcoq 4.01.0;
opam switch install coq=8.6 coqide=8.6;
eval $(opam env --switch=oldcoq);
coqide;