Closed mattam82 closed 7 years ago
Thanks for looking into this! This might partially address #11.
This is great!
I am not sure which kind of let-binding you have in mind -- whether the let is a Gallina let
, or something reified somehow in TemplateAst.term
?.
The former method may not result in any advantage because whenever any non-trivial function on Ast.term
executes, all the let bindings will reduce away into bloat. The latter method may make it harder to write functions that process an Ast.term
.
I wonder how much of the let-binding advantage can be gained by other methods, such as a more efficient string representation, e.g. one where 1 or more characters take only one word on a machine. It seems that currently, 1 Coq.Strings.Ascii.ascii
takes 8 words.
Because the names of constants in Coq are typically not so long, the difference in space complexity may just be in these constant factors.
I am not sure which kind of let-binding you have in mind -- whether the let is a Gallina let, or something reified somehow in TemplateAst.term?. The thought was a Gallina let (even possibly a lambda let). The reason that this is beneficial is that it does a little bit more than hash consing. Consider the term:
"abcd" ++ "abcd"
Without hashconsing, this uses a lot of memory. With hashconsing it uses less. However, when this term is type checked, the shared "abcd" is still type checked twice. Using a Gallina let
would make this term be type checked once and referenced twice. There are two ways to get the behavior:
let x := "abcd" in x ++ x
and
(fun x => x ++ x) "abcd"
@mattam82 should correct me if I'm wrong, I believe that the second will be (very slightly) more efficient to type check because Coq can check the body of the function with respect to an opaque value x
whereas with the former it uses a transparent variable x
.
The former method may not result in any advantage because whenever any non-trivial function on Ast.term executes, all the let bindings will reduce away into bloat. The latter method may make it harder to write functions that process an Ast.term. Agreed. @JasonGross is the only person I know of who has tried to quote a quoted term. This is why it would probably be better to handle this via a flag (but hashconsing should not be flag controlled in the end).
@mattam82 This hash-consing is local to this plugin only. This prevents multiple plugins from sharing the hashconsing cache. Does Coq do internal hashconsing on terms? I had wanted to do this in PluginUtils, but since I made the term construction module into a functor I lost cross-plugin cache sharing.
I wonder how much of the let-binding advantage can be gained by other methods, such as a more efficient string representation, e.g. one where 1 or more characters take only one word on a machine. It seems that currently, 1 Coq.Strings.Ascii.ascii takes 8 words. Because the names of constants in Coq are typically not so long, the difference in space complexity may just be in these constant factors. Since
ascii
is also hashconsed in @mattam82's patch, we will likely only ever have ~50ascii
s in memory. That shouldn't be too bad (a packedascii
could be useful but since it wouldn't play nice with anything else, it might be awkward).
If we're worried about type checking time, then we could try to play the let
binding trick above (if it is actually an improvement).
Alternate idea for making quote (quote x)
reasonable: how about an option that adds a bunch of constants to the environment if they don't already exist, like Coq_Init_Logic_eq_identifier := "Coq.Init.Logic.eq"
, etc, so that each use of an identifier is only a constant.
Personally, I don't like this sort of side effecting and relying on names because it makes things fragile in the (unlikely) event that someone writes a definition for one of those names, but sets it equal to something else (or worse, has a different type). In the later case (different type) if we are inserting our terms directly into the kernel (unsure if this is what @mattam82 s patch does), then we would be breaking Coq in a pretty serious way.
I agree, we shouldn't rely on the names. I'm suggesting that, if the name already exists, we check to see if it has the right body (and memoize the check, if it's a bottleneck). If it doesn't, either fail, or generate a new name by appending a digit, or just inline the body in that case.
then we would be breaking Coq in a pretty serious way.
Really? Not to say that this is a good thing, but Coq already admits definitions which don't type check when you run them through cbv
(h/t Andreas Abel's talk):
CoInductive Stream := cons : nat -> Stream -> Stream.
Notation zeros := (cofix zeros : Stream := cons 0 zeros).
Definition force (s : Stream) : Stream := match s with cons x y => cons x y end.
Definition eq_force s : s = force s := match s with cons x y => eq_refl end.
Inductive foo := bar (_ : zeros = force zeros).
Definition eq_zeros := bar (eq_force zeros).
Fail Definition eq_zeros' := Eval cbv in eq_zeros.
(* The command has indeed failed with message:
Illegal application:
The term "bar" of type "zeros = force zeros -> foo"
cannot be applied to the term
"eq_refl" : "cons 0 zeros = cons 0 zeros"
This term has type "cons 0 zeros = cons 0 zeros" which should be coercible to "zeros = force zeros". *)
I've yet to see anyone spin this into a proof of False
(although maybe now that I've said that @maximedenes will come along and prove me wrong).
I think the proper way out of this inefficiency of strings will come in the form of native arrays and ints @maximedenes is working on. Indeed the let-bindings sharing would be too fragile.
I cleaned-up my patch, adapting it to all the Quote vernaculars, rebased and retested against the test-suite and certicoq. Ready to merge now.
Share Strings and Timing.
Share Strings hashconses the string objects (should be improved to do it on separate parts of qualified identifiers, still reduces memory consumption by half on the Gcd example of certicoq) Timing prints timing information about the different phases of quoting, helped me figure out that the externalisation + elaboration phases were used when introducing a quoted term, we can bypass it completely and give the term directly for the kernel to check. Still there is probably some room for improvement, like let-binding the strings somehow (we went from 240s to 24s on CertiCoq's Gcd example).
This PR is still WIP, all quoting functions have to be adapted, and I'm wondering if we should keep the Timing option or not? Incidentally, it fixed a bug in quoting which was using the wrong tInd.