Open MSoegtropIMC opened 10 months ago
One note: I am using `lazy` instead of `cbv` here because it can be faster at Qed times - at tactic time it should not make much of a difference.
I suggest: add a brief comment just before the `Std.lazy` line (if that's the right place?), saying `(* use lazy instead of cbv because Qed can be faster *)`.
I suggest that at line 71 you augment your comment with: "Normally this makes a significant but small improvement in performance; but when there are many local variables (e.g. 60) it makes a huge difference."
> One note: I am using `lazy` instead of `cbv` here because it can be faster at Qed times - at tactic time it should not make much of a difference.
I don't understand this comment. The difference between `lazy` and `cbv` is that `lazy` uses the same algorithm for reduction that Qed / the kernel uses, while `cbv` uses a different algorithm. But both tactics leave exactly the same trace in the proof term (a cast node with, I think, the post-reduction term, though it may be the pre-reduction term, I don't recall), and regardless of which one you do, Qed / the kernel will always use the lazy algorithm. So any difference you see between `lazy` and `cbv` at `Qed` time should just be noise.
Regarding `lazy` vs. `cbv`: my (not very elaborate) understanding of this is that using `lazy` vs. `cbv` doesn't make a difference if both deliver the same results, but it can make a substantial difference if the results differ, because then the result of `lazy` should be closer to the result of the kernel and require less unification effort. For partial evaluation with a restricted symbol list, as done here, I believe it can well be that `lazy` and `cbv` give different results. I have difficulties providing examples for this, though, since `lazy` is less lazy than I would expect.

Anyway, at some point I started to prefer `lazy`, but I can't exclude that this preference was caused by doing computations directly in hypotheses, which (afaik) doesn't leave an annotation.
> For partial evaluation with a restricted symbol list as done here, I believe it can well be that lazy and cbv give different results
If such an example exists, it's a bug in Coq. The difference between `lazy` and `cbv` is evaluation ordering and sharing of reductions, but so long as Coq reduction is confluent (which it is widely believed to be, I think), they cannot give different results.
> so long as Coq reduction is confluent (which it is widely believed to be, I think), they cannot give different results.
Isn't it so that the argument from confluence only applies to full evaluation? We are doing partial evaluation here (with restricted delta lists). I mean, `hnf` and `cbv` also give different results without impeding confluence. I didn't look into `lazy` in too much detail, and as I said, in tests it is less lazy than I would expect, but my understanding of `lazy` vs. `cbv` is that `lazy` does not evaluate arguments of functions unless these arguments are actually used in the expansion of a function. If the evaluation of a function `a` is partially blocked, say because it contains a function `b` which is not in the delta list, and if an argument of `a` is passed to `b`, then `cbv` should expand the argument, while `lazy` should not. So `lazy` should result in a term where the argument of `a` passed to `b` is not expanded, while `cbv` should deliver the same term with the argument to `b` expanded. How such differences would impede confluence, I don't see.
My assumption is that such differences lead to issues in unification later, since one of the most difficult questions in unification is to decide which symbols to unfold.
I tried to make some trivial examples, but failed, spending just a few minutes on it. If you believe my understanding of this is substantially flawed, I will spend some more time trying to understand it in more detail.
> Isn't it so that the argument of confluence only applies to full evaluation? We are doing partial evaluation here (with restricted delta lists).
Imagine wrapping the entire environment in a lambda abstracting over the constants to be left behind. If full reduction is confluent even under lambdas, it should be confluent even with restricted delta lists. I am less sure about leaving out the other reductions, though I think most of them can be modeled by replacing certain constructs with opaque applications.
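To make that modeling concrete, here is a small sketch (the definition `b` and the flag choices are mine, not from the PR): excluding `b` from the delta list produces the same shape as full reduction under a lambda that abstracts over `b`.

```coq
Definition b (n : nat) : nat := n.

(* Restricted delta: [b] stays opaque, but its argument is still normalized. *)
Eval cbv [Nat.add] in b (1 + 1).
(* = b 2 *)

(* The same situation with [b] abstracted into a binder: cbv (and lazy)
   reduce under the lambda and normalize the argument there as well. *)
Eval cbv in (fun b' : nat -> nat => b' (1 + 1)).
(* = fun b' => b' 2 *)
```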
> I mean `hnf` and `cbv` also give different results without impeding confluence
`hnf` is not strong: it does not recurse under binders when the head is normal; `cbv` and `lazy` do. Moreover, `hnf` uses the `simpl` engine under the hood, performing refolding, so it's really not analogous. My claim applies specifically to `lazy` and `cbv`. Even VM and native can give different results, though only in two places: alpha renaming of lambda variables, and not normalizing the parameters of inductive constructors.
> If the evaluation of a function `a` is partially blocked say because it contains a function `b` which is not in the delta list, and if an argument of `a` is passed to `b`, then `cbv` should expand the argument, while lazy should not. So lazy should result in a term where the argument of `a` passed to `b` is not expanded, while cbv should deliver the same term with the argument to `b` expanded.
You might be missing the fact that once there are no more redexes, `cbv` recurses under binders, and `lazy` recurses under binders and into arguments of blocked symbols; so even if the argument passed to `b` is unexpanded when it reaches `b`, the reduction will go back and expand it once it is done reducing everything else. The terms should be the same at the end.
(If you're interested in reading more, there's https://github.com/coq/coq/pull/17503#issuecomment-1527519613, but most of that is not relevant here)
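As a sketch of this point (the definitions `a` and `b` are made up for illustration, echoing the names above): even when `b` is kept out of the delta list, both strategies deliver the same normal form.

```coq
Definition b (n : nat) : nat := n.
Definition a (n : nat) : nat := b (n + n).

(* [b] is not in the delta list, so it stays blocked in the result. *)
Eval cbv  [a Nat.add] in a 1.
(* = b 2 : cbv evaluates the argument before it reaches [b] *)
Eval lazy [a Nat.add] in a 1.
(* = b 2 : lazy passes (1 + 1) to [b] unevaluated, but expands it in the
   final normalization pass, so the two results coincide *)
```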
> How such differences shall impede confluence I don't see. My assumption is that such differences lead to issues in unification later, since one of the most difficult questions in unification is to decide which symbols to unfold.
Also not sure if this is the difference of understanding, but in case it is:
> but my understanding of lazy vs cbv is that lazy does not evaluate arguments of functions unless these arguments are actually used in the expansion of a function
And I guess a more accurate characterization here is that `lazy` does not expand a function argument until either it occurs in head position, or it is the discriminee of a redex in head position, or all such redexes have been evaluated and `lazy` is recursing into the rest of the term.
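A toy example of that characterization (my own, not from the thread above): the argument is forced exactly when it becomes the discriminee of a head redex.

```coq
(* The argument (1 + 1) must be forced because the match in head
   position discriminates on it. *)
Eval lazy [Nat.add] in
  (fun n : nat => match n with 0 => true | S _ => false end) (1 + 1).
(* = false *)
```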
Regarding the `lazy` vs. `cbv` question: today I came across an example where it makes a substantial speed difference - as @JasonGross explained, the difference is not at Qed time but during evaluation itself. In some context where a `compspecs` exists (`reptype` takes one as implicit argument), I get:

```coq
Time Eval cbv in reptype tvoid.
Time Eval lazy in reptype tvoid.
```
=>
```
= unit
: Type
Finished transaction in 0.284 secs (0.283u,0.s) (successful)
= unit
: Type
Finished transaction in 0. secs (0.u,0.s) (successful)
```
I guess this is so because `cbv` computes the complete `compspecs`, while with lazy evaluation the `compspecs` is not evaluated, since it is not required to get the representation type of void.

So the comment is wrong, but there are still good reasons to prefer `lazy` over `cbv` in VST.
Just out of curiosity, I examined this question: are the compspecs fully evaluated already? That is, would doing one extra `eval compute` in the `make_compspecs` tactic change anything? And the answer is, basically, no. In a typical case, `progress compute` will fail to progress. So what is taking the time? Perhaps it is this: within the compspecs there are some (potentially) big proofs, already in normal form. `cbv` walks through these proofs; `lazy` does not need to. Could that be it?
@andrew-appel: yes, this matches what I believe is happening. `lazy` looks only at what it "needs" to compute the final result, while `cbv` goes at least once through everything. So especially in structures which are a mix of data and proof terms, `lazy` should be at an advantage.
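The effect can be reproduced with a toy analogue of such a mixed structure (`big_part` and `packed` are made-up names, not VST definitions): `cbv` normalizes the whole pair before projecting, while `lazy` never forces the large component.

```coq
(* A large unary number standing in for the proof/data parts of a compspecs. *)
Definition big_part : nat := Nat.pow 2 20.
Definition packed : nat * unit := (big_part, tt).

Time Eval cbv  in snd packed.   (* forces big_part: noticeably slower *)
Time Eval lazy in snd packed.   (* returns [tt] without touching big_part *)
```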
This PR replaces `local2ptree` with a version written in Ltac2 which is much faster (in measurements, about 100x for a case with about 60 local variables).

The difference in `make test` is anyway quite small (1..2%) - the time of `local2ptree` is more visible in developments with many local variables. Here are the `make test` times for the old and new version.