OCamlPro / alt-ergo

OCamlPro public development repository for Alt-Ergo
https://alt-ergo.ocamlpro.com/

Cache the slow function `update_gets_sets` #1126

Closed Halbaroth closed 1 month ago

Halbaroth commented 2 months ago

The function `update_gets_sets` of `Arrays_rel` recursively explores all the terms of the problem that appear in the literals of `Rel.assumes`. This function becomes a bottleneck when the problem contains very large terms, as in issue #1123. For instance, the input file from that issue takes several minutes to solve on `next` but is solved instantly with this patch.
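For context, here is a minimal, self-contained sketch of the idea, using a toy term type and made-up helpers rather than the actual `Arrays_rel` code: memoize the recursive collection of array reads/writes per subterm, so that shared subterms of a very large problem are explored only once.

```ocaml
(* Toy sketch of the caching idea; the term type and helpers are made up
   and do not reflect Alt-Ergo's actual data structures. *)
type term =
  | Get of term * term                  (* array read  *)
  | Set of term * term * term           (* array write *)
  | App of string * term list           (* anything else *)

let subterms = function
  | Get (a, i) -> [ a; i ]
  | Set (a, i, v) -> [ a; i; v ]
  | App (_, ts) -> ts

(* Structural hashing is enough for a toy; Alt-Ergo's hash-consed terms
   would be keyed by their tag instead. *)
let cache : (term, term list) Hashtbl.t = Hashtbl.create 1024

let rec gets_sets (t : term) : term list =
  match Hashtbl.find_opt cache t with
  | Some gs -> gs                       (* subterm already explored *)
  | None ->
    let here = match t with Get _ | Set _ -> [ t ] | App _ -> [] in
    let gs =
      List.fold_left (fun acc s -> List.rev_append (gets_sets s) acc)
        here (subterms t)
    in
    Hashtbl.add cache t gs;
    gs
```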

Halbaroth commented 2 months ago

The previous patch contained a bug. At the top level of the recursive function I avoided caching the result of `update_gets_sets` together with its accumulator, but I wasn't careful at some of the recursive calls. This mistake was caught by only one test in our test suite!
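To make the pitfall concrete, a hedged toy illustration (reusing the toy `term` type and `subterms` helper from the sketch above, not the real code): when the traversal threads an accumulator, the cached value must not depend on that accumulator, otherwise a later call on the same term in a different context gets back stale data.

```ocaml
(* WRONG: the cached list embeds the caller's accumulator, so replaying it
   for another caller leaks terms from an unrelated context. *)
let cache_wrong : (term, term list) Hashtbl.t = Hashtbl.create 16

let rec collect_wrong acc t =
  match Hashtbl.find_opt cache_wrong t with
  | Some r -> r
  | None ->
    let here = match t with Get _ | Set _ -> t :: acc | App _ -> acc in
    let r = List.fold_left collect_wrong here (subterms t) in
    Hashtbl.add cache_wrong t r;
    r

(* SAFE: cache only what [t] itself contributes, independently of [acc],
   and let each caller merge that contribution into its own accumulator. *)
let cache_ok : (term, term list) Hashtbl.t = Hashtbl.create 16

let rec collect acc t =
  let contrib =
    match Hashtbl.find_opt cache_ok t with
    | Some r -> r
    | None ->
      let here = match t with Get _ | Set _ -> [ t ] | App _ -> [] in
      let r = List.fold_left collect here (subterms t) in
      Hashtbl.add cache_ok t r;
      r
  in
  List.rev_append contrib acc
```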

I believe the new fix is correct, but we should check it one last time on Marvin.

Halbaroth commented 2 months ago

+0-0 on both ae-format and QF_AX :)

Halbaroth commented 1 month ago

@bclement-ocp ping?

Halbaroth commented 1 month ago

> Oof, when skimming that code I missed that we were doing the recursion on terms and not on semantic values.
>
> If I am not mistaken, we don't need the cache to be global and could store the table in the environment here? Since it is just a cache, it is OK if it is mutable state inside the environment, and this would avoid breaking Alt-Ergo-Fuzz (in addition to avoiding more global state).

Using a cache per environment seems a bit weird to me. I agree that a global cache is ugly, but at least the environment of theories contains only information about terms of the current problem. The purpose of this cache is to store old relevant terms discovered while exploring some branches with the SAT solver, and these terms aren't necessarily relevant in the current branch.

We currently store global caches in `IntervalCalculus` and we have `reinit_ctx` in the SAT solvers to reinitialize them. If we use local caches, it's not obvious where and when we should reinitialize these caches.

hra687261 commented 1 month ago

(IMO: adding a way to reinitialize the cache, and calling it from the SAT solvers, should also work.)
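A hedged sketch of that alternative, with hypothetical names (the real code would hook into the existing `reinit_ctx` machinery): the cache stays global but exposes a reset function that the SAT solvers call when they reinitialize their context.

```ocaml
(* Hypothetical sketch: a global cache with an explicit reset hook. *)
module Arrays_cache = struct
  let table : (string, string list) Hashtbl.t = Hashtbl.create 1024
  let reinit () = Hashtbl.reset table
end

(* Called from the SAT solvers' context-reinitialization code, next to the
   other cache resets. *)
let reinit_ctx () =
  (* ... reset the other global caches ... *)
  Arrays_cache.reinit ()
```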

bclement-ocp commented 1 month ago

> Using a cache per environment seems a bit weird to me. I agree that a global cache is ugly, but at least the environment of theories contains only information about terms of the current problem. The purpose of this cache is to store old relevant terms discovered while exploring some branches with the SAT solver, and these terms aren't necessarily relevant in the current branch.

As long as we do not ask "is this key in the cache?" (I think we can't do this with the ephemeron API anyway), this is not a problem. As an aside, the fact that "the current problem" is a thing that exists as global state is something we are trying to move away from (see #377).
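A hedged sketch of the access pattern being described, assuming the global cache is (or becomes) an ephemeron-keyed table; the names below are illustrative. Since the GC may drop entries at any time, the only query the code ever makes is "give me the value or let me recompute it", never "is this key present?".

```ocaml
(* Illustrative ephemeron-keyed cache: entries may disappear whenever the
   GC no longer sees the key alive elsewhere, so a miss just means
   "recompute". *)
module Weak_cache = Ephemeron.K1.Make (struct
  type t = string                       (* stand-in for a hash-consed term *)
  let equal = String.equal
  let hash = Hashtbl.hash
end)

let cache : int Weak_cache.t = Weak_cache.create 64

let find_or_compute (key : string) (compute : string -> int) : int =
  match Weak_cache.find_opt cache key with
  | Some v -> v                         (* the entry happened to survive *)
  | None ->
    let v = compute key in              (* first time, or dropped by the GC *)
    Weak_cache.replace cache key v;
    v
```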

The ideal solution would be to only cache the values for the current problem — this would require removing entries from the table as we backtrack, and we don't currently have a mechanism for that.

> We currently store global caches in `IntervalCalculus` and we have `reinit_ctx` in the SAT solvers to reinitialize them. If we use local caches, it's not obvious where and when we should reinitialize these caches.

The point of using local caches is that we do not have to reinitialize them. We need to reinitialize the global caches because they are global state and hence a previous run can affect a subsequent run — but since each run has its own environment, using local caches will not have this issue.

(FTR, I am fine with using context reinitialization hooks, as this is what we currently do elsewhere; storing the info on the local environment is easier to do and simpler to migrate to a backtracking table when we get the ability to do so, hence why I suggested it.)
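A hedged sketch of what the environment-local variant could look like; the field and key types below are placeholders, not the actual `Arrays_rel` interface. The point is that the table is created together with the environment, so a fresh run gets a fresh cache and nothing has to be reinitialized.

```ocaml
(* Hypothetical environment-local cache; names and types are placeholders. *)
type env = {
  (* ... the relation's existing fields would live here ... *)
  gets_sets_cache : (string, string list) Hashtbl.t;
  (* mutable state is acceptable here: it is only a cache, never the
     source of truth, so it cannot make two runs observably different *)
}

let empty_env () = { gets_sets_cache = Hashtbl.create 1024 }

let update_gets_sets env term =
  match Hashtbl.find_opt env.gets_sets_cache term with
  | Some r -> r
  | None ->
    let r = [ term ] (* placeholder for the expensive recursive exploration *) in
    Hashtbl.add env.gets_sets_cache term r;
    r
```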

Halbaroth commented 1 month ago

> As long as we do not ask "is this key in the cache?" (I think we can't do this with the ephemeron API anyway), this is not a problem. As an aside, the fact that "the current problem" is a thing that exists as global state is something we are trying to move away from (see #377).

I know that using global state isn't the best design. I didn't realize that with your solution I don't need to flush the cache.

> The ideal solution would be to only cache the values for the current problem — this would require removing entries from the table as we backtrack, and we don't currently have a mechanism for that.

I'm not sure I understand your point here. By "current problem", I meant the current state of the SAT solver. Let's imagine we have a formula A \/ B, and we have explored A and now explore B after backtracking. Without a cache, or with a cache that follows the backtracking mechanism of the SAT solver, we have to completely explore all the terms of B even if we have already explored some of them in A. Actually, the `new_terms` field of `Arrays_rel` already contains all the relevant terms of the current branch and we can use it as a cache (by replacing it with a functional map). It may be sufficient for the submitted input file; I will test.
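For what it's worth, a hedged sketch of the `new_terms`-as-cache idea (a persistent map stored in the functional environment; names are illustrative). Because the SAT solver backtracks by restoring an older environment value, entries discovered in an abandoned branch are dropped automatically, which also means shared subterms across branches get re-explored, as noted above.

```ocaml
(* Illustrative persistent-map cache living in the (functional) environment. *)
module SMap = Map.Make (String)

type env = { new_terms : string list SMap.t (* ... other fields ... *) }

let explore (env : env) (term : string) : string list * env =
  match SMap.find_opt term env.new_terms with
  | Some r -> r, env                    (* already known in this branch *)
  | None ->
    let r = [ term ] (* placeholder for the real exploration *) in
    r, { new_terms = SMap.add term r env.new_terms }
```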

I thought a bit about this problem last week, and I think an acceptable design consists in attaching relevant terms directly to the semantic values (or terms?). The cache we use for `X.make` would then store relevant terms too. I didn't implement this solution because I'm not sure there isn't a better way, and I prefer a minimal patch before the next release.

Halbaroth commented 1 month ago

I replaced the global cache by a local cache. It still works with the input file of #1123, but should I run the benchmarks again?

Halbaroth commented 1 month ago

+0-0 on ae-format again :)