Open MSoegtropIMC opened 10 months ago
One note: I am using `lazy` instead of `cbv` here because it can be faster at Qed times - at tactic time it should not make much of a difference.
I suggest: add a brief comment just before the `Std.lazy` line (if that's the right place?), saying `(* use lazy instead of cbv because Qed can be faster *)`.
I suggest that at line 71 you augment your comment with: "Normally this makes a significant but small improvement in performance; but when there are many local variables (e.g. 60) it makes a huge difference."
> One note: I am using `lazy` instead of `cbv` here because it can be faster at Qed times - at tactic time it should not make much of a difference.
I don't understand this comment. The difference between `lazy` and `cbv` is that `lazy` uses the same algorithm for reduction that Qed / the kernel uses, while `cbv` uses a different algorithm. But both tactics leave exactly the same trace in the proof term (a cast node with, I think, the post-reduction term, though it may be the pre-reduction term, I don't recall), and regardless of which one you do, Qed / the kernel will always use the lazy algorithm. So any difference you see between `lazy` and `cbv` at `Qed` time should just be noise.
Regarding `lazy` vs. `cbv`: my (not very elaborate) understanding of this is that using `lazy` vs. `cbv` doesn't make a difference if both deliver the same results, but it can make a substantial difference if the results differ, because then the result of `lazy` should be closer to the result of the kernel and require less unification effort. For partial evaluation with a restricted symbol list, as done here, I believe it can well be that `lazy` and `cbv` give different results. I have difficulties providing examples for this, though, since `lazy` is less lazy than I would expect.

Anyway, at some point I started to prefer `lazy`, but I can't exclude that this preference was caused by doing computations directly in hypotheses, which (afaik) doesn't leave an annotation.
> For partial evaluation with a restricted symbol list as done here, I believe it can well be that lazy and cbv give different results
If such an example exists, it's a bug in Coq. The difference between `lazy` and `cbv` is evaluation ordering and sharing of reductions, but so long as Coq reduction is confluent (which it is widely believed to be, I think), they cannot give different results.
> so long as Coq reduction is confluent (which it is widely believed to be, I think), they cannot give different results.
Isn't it so that the argument from confluence only applies to full evaluation? We are doing partial evaluation here (with restricted delta lists). I mean, `hnf` and `cbv` also give different results without impeding confluence. I didn't look into `lazy` in too much detail, and as I said, in tests it is less lazy than I would expect, but my understanding of `lazy` vs. `cbv` is that `lazy` does not evaluate arguments of functions unless these arguments are actually used in the expansion of a function. If the evaluation of a function `a` is partially blocked, say because it contains a function `b` which is not in the delta list, and if an argument of `a` is passed to `b`, then `cbv` should expand the argument, while `lazy` should not. So `lazy` should result in a term where the argument of `a` passed to `b` is not expanded, while `cbv` should deliver the same term with the argument to `b` expanded. How such differences would impede confluence, I don't see.
My assumption is that such differences lead to issues in unification later, since one of the most difficult questions in unification is to decide which symbols to unfold.
I tried to make some trivial examples, but failed, spending just a few minutes on it. If you believe my understanding of this is substantially flawed, I will spend some more time trying to understand it in more detail.
> Isn't it so that the argument of confluence only applies to full evaluation? We are doing partial evaluation here (with restricted delta lists).
Imagine wrapping the entire environment in a lambda abstracting over the constants to be left behind. If full reduction is confluent even under lambdas, it should be confluent even with restricted delta lists. I am less sure about leaving out the other reductions, though I think most of them can be modeled by replacing certain constructs with opaque applications.
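To make that modeling concrete, here is a small sketch (the definition `b` and the flag choices are mine, not from the PR): excluding `b` from the delta list produces the same shape as full reduction under a lambda that abstracts over `b`.

```coq
Definition b (n : nat) : nat := n.

(* Restricted delta: [b] stays opaque, but its argument is still normalized. *)
Eval cbv [Nat.add] in b (1 + 1).
(* = b 2 *)

(* The same situation with [b] abstracted into a binder: cbv (and lazy)
   reduce under the lambda and normalize the argument there as well. *)
Eval cbv in (fun b' : nat -> nat => b' (1 + 1)).
(* = fun b' => b' 2 *)
```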
> I mean `hnf` and `cbv` also give different results without impeding confluence
`hnf` is not strong: it does not recurse under binders when the head is normal; `cbv` and `lazy` do. Moreover, `hnf` uses the `simpl` engine under the hood, performing refolding, so it's really not analogous. My claim applies specifically to `lazy` and `cbv`. Even VM and native can give different results, though only in two places: alpha renaming of lambda variables, and not normalizing the parameters of inductive constructors.
> If the evaluation of a function `a` is partially blocked say because it contains a function `b` which is not in the delta list, and if an argument of `a` is passed to `b`, then `cbv` should expand the argument, while lazy should not. So lazy should result in a term where the argument of `a` passed to `b` is not expanded, while cbv should deliver the same term with the argument to `b` expanded.
You might be missing the fact that once there are no more redexes, `cbv` recurses under binders, and `lazy` recurses under binders and into arguments of blocked symbols; so even if the argument passed to `b` is unexpanded when it reaches `b`, the reduction will go back and expand it once it is done reducing everything else. The terms should be the same at the end.
(If you're interested in reading more, there's https://github.com/coq/coq/pull/17503#issuecomment-1527519613, but most of that is not relevant here)
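As a sketch of this point (the definitions `a` and `b` are made up for illustration, echoing the names above): even when `b` is kept out of the delta list, both strategies deliver the same normal form.

```coq
Definition b (n : nat) : nat := n.
Definition a (n : nat) : nat := b (n + n).

(* [b] is not in the delta list, so it stays blocked in the result. *)
Eval cbv  [a Nat.add] in a 1.
(* = b 2 : cbv evaluates the argument before it reaches [b] *)
Eval lazy [a Nat.add] in a 1.
(* = b 2 : lazy passes (1 + 1) to [b] unevaluated, but expands it in the
   final normalization pass, so the two results coincide *)
```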
> How such differences shall impede confluence I don't see. My assumption is that such differences lead to issues in unification later, since one of the most difficult questions in unification is to decide which symbols to unfold.
Also not sure if this is the difference of understanding, but in case it is:
> but my understanding of lazy vs cbv is that lazy does not evaluate arguments of functions unless these arguments are actually used in the expansion of a function
And I guess a more accurate characterization here is that `lazy` does not expand a function argument until either it occurs in head position, or it is the discriminee of a redex in head position, or all such redexes have been evaluated and `lazy` is recursing into the rest of the term.
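A toy example of that characterization (my own, not from the thread above): the argument is forced exactly when it becomes the discriminee of a head redex.

```coq
(* The argument (1 + 1) must be forced because the match in head
   position discriminates on it. *)
Eval lazy [Nat.add] in
  (fun n : nat => match n with 0 => true | S _ => false end) (1 + 1).
(* = false *)
```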
Regarding the `lazy` vs. `cbv` question: today I came across an example where it makes a substantial speed difference - as @JasonGross explained, the difference is not at Qed time but during evaluation itself. In some context where a `compspecs` exists (`reptype` takes one as implicit argument), I get:

```coq
Time Eval cbv in reptype tvoid.
Time Eval lazy in reptype tvoid.
```
=>
```
= unit
: Type
Finished transaction in 0.284 secs (0.283u,0.s) (successful)
= unit
: Type
Finished transaction in 0. secs (0.u,0.s) (successful)
```
I guess this is so because `cbv` computes the complete `compspecs`, while with lazy evaluation the `compspecs` is not evaluated, since it is not required to get the representation type of void.

So the comment is wrong, but there are still good reasons to prefer `lazy` over `cbv` in VST.
Just out of curiosity, I examined this question: are the compspecs fully evaluated already? That is, would doing one extra `eval compute` in the `make_compspecs` tactic change anything? And the answer is, basically, no. In a typical case, `progress compute` will fail to progress. So what is taking the time? Perhaps it is this: within the compspecs there are some (potentially) big proofs, already in normal form. `cbv` walks through these proofs; `lazy` does not need to. Could that be it?
@andrew-appel: yes, this matches what I believe is happening. `lazy` looks only at what it "needs" to compute the final result, while `cbv` goes at least once through everything. So especially in structures which are a mix of data and proof terms, `lazy` should be at an advantage.
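The effect can be reproduced with a toy analogue of such a mixed structure (`big_part` and `packed` are made-up names, not VST definitions): `cbv` normalizes the whole pair before projecting, while `lazy` never forces the large component.

```coq
(* A large unary number standing in for the proof/data parts of a compspecs. *)
Definition big_part : nat := Nat.pow 2 20.
Definition packed : nat * unit := (big_part, tt).

Time Eval cbv  in snd packed.   (* forces big_part: noticeably slower *)
Time Eval lazy in snd packed.   (* returns [tt] without touching big_part *)
```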
This PR replaces `local2ptree` with a version written in Ltac2 which is much faster (in measurements, about 100x for a case with about 60 local variables).

The difference in `make test` is anyway quite small (1..2%) - the time of `local2ptree` is more visible in developments with many local variables. Here are the `make test` times for the old and new version.