Drastic performance improvement and two new options (WIP!)

mattam82 commented 7 years ago

Share Strings and Timing.

Share Strings hashconses the string objects (should be improved to do it on separate parts of qualified identifiers, still reduces memory consumption by half on the Gcd example of certicoq) Timing prints timing information about the different phases of quoting, helped me figure out that the externalisation + elaboration phases were used when introducing a quoted term, we can bypass it completely and give the term directly for the kernel to check. Still there is probably some room for improvement, like let-binding the strings somehow (we went from 240s to 24s on CertiCoq's Gcd example).

This PR is still WIP, all quoting functions have to be adapted, and I'm wondering if we should keep the Timing option or not? Incidentally, it fixed a bug in quoting which was using the wrong tInd.

gmalecha commented 7 years ago

Thanks for looking into this! This might partially address #11.

aa755 commented 7 years ago

This is great! I am not sure which kind of let-binding you have in mind -- whether the let is a Gallina let, or something reified somehow in TemplateAst.term?.

The former method may not result in any advantage because whenever any non-trivial function on Ast.term executes, all the let bindings will reduce away into bloat. The latter method may make it harder to write functions that process an Ast.term.

I wonder how much of the let-binding advantage can be gained by other methods, such as a more efficient string representation, e.g. one where 1 or more characters take only one word on a machine. It seems that currently, 1 Coq.Strings.Ascii.ascii takes 8 words. Because the names of constants in Coq are typically not so long, the difference in space complexity may just be in these constant factors.

gmalecha commented 7 years ago

I am not sure which kind of let-binding you have in mind -- whether the let is a Gallina let, or something reified somehow in TemplateAst.term?. The thought was a Gallina let (even possibly a lambda let). The reason that this is beneficial is that it does a little bit more than hash consing. Consider the term:

"abcd" ++ "abcd"

Without hashconsing, this uses a lot of memory. With hashconsing it uses less. However, when this term is type checked, the shared "abcd" is still type checked twice. Using a Gallina let would make this term be type checked once and referenced twice. There are two ways to get the behavior:

let x := "abcd" in x ++ x

and

(fun x => x ++ x) "abcd"

@mattam82 should correct me if I'm wrong, I believe that the second will be (very slightly) more efficient to type check because Coq can check the body of the function with respect to an opaque value x whereas with the former it uses a transparent variable x.

The former method may not result in any advantage because whenever any non-trivial function on Ast.term executes, all the let bindings will reduce away into bloat. The latter method may make it harder to write functions that process an Ast.term. Agreed. @JasonGross is the only person I know of who has tried to quote a quoted term. This is why it would probably be better to handle this via a flag (but hashconsing should not be flag controlled in the end).

@mattam82 This hash-consing is local to this plugin only. This prevents multiple plugins from sharing the hashconsing cache. Does Coq do internal hashconsing on terms? I had wanted to do this in PluginUtils, but since I made the term construction module into a functor I lost cross-plugin cache sharing.

I wonder how much of the let-binding advantage can be gained by other methods, such as a more efficient string representation, e.g. one where 1 or more characters take only one word on a machine. It seems that currently, 1 Coq.Strings.Ascii.ascii takes 8 words. Because the names of constants in Coq are typically not so long, the difference in space complexity may just be in these constant factors. Since ascii is also hashconsed in @mattam82's patch, we will likely only ever have ~50 asciis in memory. That shouldn't be too bad (a packed ascii could be useful but since it wouldn't play nice with anything else, it might be awkward).

If we're worried about type checking time, then we could try to play the let binding trick above (if it is actually an improvement).

JasonGross commented 7 years ago

Alternate idea for making quote (quote x) reasonable: how about an option that adds a bunch of constants to the environment if they don't already exist, like Coq_Init_Logic_eq_identifier := "Coq.Init.Logic.eq", etc, so that each use of an identifier is only a constant.

gmalecha commented 7 years ago

Personally, I don't like this sort of side effecting and relying on names because it makes things fragile in the (unlikely) event that someone writes a definition for one of those names, but sets it equal to something else (or worse, has a different type). In the later case (different type) if we are inserting our terms directly into the kernel (unsure if this is what @mattam82 s patch does), then we would be breaking Coq in a pretty serious way.

JasonGross commented 7 years ago

I agree, we shouldn't rely on the names. I'm suggesting that, if the name already exists, we check to see if it has the right body (and memoize the check, if it's a bottleneck). If it doesn't, either fail, or generate a new name by appending a digit, or just inline the body in that case.

JasonGross commented 7 years ago

then we would be breaking Coq in a pretty serious way.

Really? Not to say that this is a good thing, but Coq already admits definitions which don't type check when you run them through cbv (h/t Andreas Abel's talk):

CoInductive Stream := cons : nat -> Stream -> Stream.
Notation zeros := (cofix zeros  : Stream := cons 0 zeros).
Definition force (s : Stream) : Stream := match s with cons x y => cons x y end.
Definition eq_force s : s = force s := match s with cons x y => eq_refl end.
Inductive foo := bar (_ : zeros = force zeros).
Definition eq_zeros := bar (eq_force zeros).
Fail Definition eq_zeros' := Eval cbv in eq_zeros.
(* The command has indeed failed with message:
Illegal application:
The term "bar" of type "zeros = force zeros -> foo"
cannot be applied to the term
 "eq_refl" : "cons 0 zeros = cons 0 zeros"
This term has type "cons 0 zeros = cons 0 zeros" which should be coercible to "zeros = force zeros". *)

I've yet to see anyone spin this into a proof of False (although maybe now that I've said that @maximedenes will come along and prove me wrong).

mattam82 commented 7 years ago

I think the proper way out of this inefficiency of strings will come in the form of native arrays and ints @maximedenes is working on. Indeed the let-bindings sharing would be too fragile.

mattam82 commented 7 years ago

I cleaned-up my patch, adapting it to all the Quote vernaculars, rebased and retested against the test-suite and certicoq. Ready to merge now.

gmalecha / template-coq

Drastic performance improvement and two new options (WIP!) #35