coq / coq

Coq is a formal proof management system. It provides a formal language to write mathematical definitions, executable algorithms and theorems together with an environment for semi-interactive development of machine-checked proofs.
https://coq.inria.fr/
GNU Lesser General Public License v2.1
4.82k stars 645 forks source link

In coqc, discard goal/hypothesis info for resolved evars that can no longer be backtracked #13577

Open jfehrle opened 3 years ago

jfehrle commented 3 years ago

I expect that once an evar has been resolved and it's no longer possible to backtrack and change the resolved value, the goal and hypothesis info can be discarded since they're no longer needed. Only the resolved value is needed.

If there are points in the proof script that cannot backtrack, this information can be discarded to save memory. Maybe this can be automatic in some cases, such as after each sentence in the proof script. In others, the author of the proof script may know points at which backtracking is no longer expected, which could be designated by some tactic in the script. In #12487, I measured 356Mwords of memory in use for 95873 "defn_evars" in mid-proof, which I assume are mostly for solved evars, whereas the completed proof was only 39MWords--a very large difference.

@JasonGross's tells me:

I think having a variant of once which discards backtracking within it and also garbage collects evars, or something like that, might be quite useful. I suspect https://github.com/coq/coq/issues/13576 will show an even larger blowup for something like this, and I think solving the performance bottlenecks in https://github.com/coq/coq/issues/13576 would be quite useful.

Attn: @ppedrot

ppedrot commented 3 years ago

Yes, this is essentially like calling Optimize Proof at every dot. It was tried (by @SkySkimmer I think?) and was very bad for performance.

jfehrle commented 3 years ago

Maybe it depends on how it's implemented? If you examine all of the 95873 evars at every dot, that would be a lot more overhead (O(dots*evars)) than if you can examine/process only the evars that still have the extra info (O(evars)). Implemented with a somewhat different data structure from what we have now. I expect dropping the info for a single evar is very, very quick.

SkySkimmer commented 3 years ago

Is there a reason we keep the evar_info for defined evars? ie instead of defn_evars : evar_info EvMap.t why not have defn_evars : constr EvMap.t?

SkySkimmer commented 3 years ago

I guess we also need to keep the id list for the context around somehow too.

ppedrot commented 3 years ago

Currently I don't see how to get rid of this huge data in a safe way. It occurred to me that we could change the evar substitution in a way that does not require knowledge of the evar map to perform substitution though, namely by replacing constr list with constr Id.Map.t. This would have a slightly bigger cost for small substitutions, but we would get the best of it for large mostly-identity substitutions. Variable substitution would be registered there, with a potential overhead if we keep track of variables not appearing in the evar body.

jfehrle commented 3 years ago

Ah, do you mean getting rid of the data in way that's both safe AND fast?

ppedrot commented 3 years ago

If you drop the evar info, if ever there is some term out there that still mentions the evar, you won't be able to expand the evar anymore. Since it's virtually impossible to keep track of evar dependencies in an efficient way, it means that, even making the assumptions that there are not terms reachable via tactic closures, one has to first crawl the whole evar map to normalize this evar everywhere, effectively performing the same job as optimize proof. This is terribly inefficient, and unsound when you take tactics into account. We actually need a real GC, since evarmaps are fundamentally heaps.

If we tweak the evar instance representation, there is hope though. We can mandate that it's UB to access the evarinfo of a defined evar, and since under these assumptions the only relevant data one needs for defined evars is the list of names of the context, if this is stored in the instance itself, problem solved.

jfehrle commented 3 years ago

Would it help to use weak references so that OCaml GC releases the evar data when it's no longer needed? That might, in some places, require referencing the evar data directly in addition to giving the evar id.

What is "UB"?

ppedrot commented 3 years ago

Would it help to use weak references so that OCaml GC releases the evar data when it's no longer needed?

I thought about that, but the problem is that you need to be able to serialize evarmaps for par: and vio file compilation. Even without taking that into account, it'd be non-trivial to provide a persistent API that is sound in this way.

What is "UB"?

Undefined behaviour, i.e. if you do this you can get demons flying out of your nose.

jfehrle commented 3 years ago

FWIW, I expect it would be possible to patch up deserialized evar maps so the references are properly shared.