OCamlPro / alt-ergo

OCamlPro public development repository for Alt-Ergo
https://alt-ergo.ocamlpro.com/
Other
128 stars 33 forks source link

Improve UF implementation #798

Open bclement-ocp opened 11 months ago

bclement-ocp commented 11 months ago

The Union-Find module is implemented with a mapping from semantic values to their representative. This means that accessing the representative is an O(log n) operation, where n is the number of live semantic values (roughly, number of distinct terms).

Benchmarking a few problems reveals that Alt-Ergo can easily spend 5-10% of its time just looking for representatives, and on long-running problems where we discover many terms, this can get upwards of 20%. Using a more efficient implementation of the Union-Find structure could bring tangible performance benefits: for instance, a properly implemented persistent version of Tarjan's Union-Find data structure would bring down the time to access the representatives to (amortized) O(α(n)) i.e. almost constant time.

(Note that these numbers include both the term → semantic value and semantic value → representative mappings, and are thus slightly inflated — in reality, they are probably closer to 3-8% and 15%. An improved union-find representation would only affect the semantic value → representative mapping, not the term → semantic value mapping. However, the term → semantic value mapping is independent from the environment (it always maps t to X.make t, and may no longer be needed at all now that we have the global cache on X.make).

After trying such an implementation, there are a few caveats:

bclement-ocp commented 10 months ago

I have spent some more time implementing a better version of this, but hit another blocker. The crux of it is that Alt-Ergo has many copies of the union-find data structure (up to 8 when using Tableaux-CDCL, although at most 5 are used at once), which is mostly not an issue (although it explains some weird behaviors I was seeing), except for one specific use.

This use is gamma_finite in Theory, which is used for case splits. The way Alt-Ergo uses case splits is as follows.

The issue with this approach is the part where we copy gamma into gamma_finite, because this is hard to support efficiently using a persistent store since there are two "versions" that must be in use at the same time.

The other issue with this approach has nothing to do with persistent union-finds: indeed, it means that most of the theory work that Alt-Ergo does is performed twice — once in gamma, and once in gamma_finite, in similar-but-different environments. It would probably make more sense to integrate the value assignments to the SAT trail, rather than keeping these two copies of the environment, making the whole loop closer to mcSAT (although we currently do now produce precise conflict explanations at the level of theories).

This poses the question of when to make boolean decisions vs theory decisions, and another host of subtle points to consider, so this becomes a more complex task than just changing the union-find implementation. There is one "quick fix" which could be to locally replay the choices when performing a case split and which I will investigate if it is acceptable performance-wise on the current code, which would allow to proceed — although in any case moving these decisions towards the SAT in some way seems like a good idea.