Fix Verifier memory leak

alefedor commented 1 year ago

Fixed a memory leak between iteration in Lincheck.

The LinCheck Verifier caches scenario results which it already verified to bound computations. However, this cache was kept between different iterations, essentially resulting in a memory leak. The cached values from previous iterations can be useful only if the scenario generated in the current iteration is the same as in one of the previous iterations, which is an unlikely event in practice.

The size of the leak can be large for tests of reasonable size. I managed to get a 500Mb leak for a configuration with 3 threads, 2 operations per thread, 1000 invocations per iterations and 500 iterations.

This PR re-creates linearizability verifier for each iteration to avoid this leak

ndkoval commented 1 year ago

Hi @alefedor, could you please check that this PR does not affect performance and that the source of the memory leak is the cache of the previous results?

In CachedVerifier, a weak hashmap is used to avoid such a leak; once the ExecutionScenario is garbage collected, the corresponding cache is also removed.

    private final Map<ExecutionScenario, Set<ExecutionResult>> previousResults = new WeakHashMap<>();

At the same time, reusing a verifier enables reusing the LTS structure, which improves the performance and produces a memory leak.

alefedor commented 1 year ago

Hi @ndkoval !

You are right. I've seen the correct place of the memory leak but forgot about it. The memory leak is actually not in caching results but here https://github.com/Kotlin/kotlinx-lincheck/blob/1c3fdc0200769ce7b207251a0c299676c185dc21/src/jvm/main/org/jetbrains/kotlinx/lincheck/verifier/LTS.kt#L67

This hashmap strictly increases as the time goes on. As a result, those 500Mb consisted of LTS and LTS$StateInfo.

So, the problem is reusing LTS, which, I suspect, is also not very useful when scenarios differ.

ndkoval commented 1 year ago

It was useful. Could you please check how your change affects the performance?

alefedor commented 1 year ago

@ndkoval Good point. It affects performance.

Baseline: stress (36s), model checking (1m17s) Reset after every iteration: stress (44s), model checking (1m30s) Reset every 5-th iteration: stress (39s), model checking (1m21s)

It is clear that we can not leave this memory leak. Potentially, It can be especially severe when there are non-primitive transformed parameters or values. They can hold the loader, which holds ManagedStateHolder, which holds ManagedStrategy, which holds the whole interleaving tree.

So, do you have good suggestions for how to remove the memory leak without affecting performance much?

I can think of the following heuristics:

Resetting every k-th iteration instead of every iteration. As you can see, resetting verifier every 5-th iteration works.
Removing after every iteration LTS states for large executions as they are unlikely to be reused.
Removing after every iteration a constant fraction of LTS states. It can be, for example, the least popular states. If there were $S$ states and we filtered $(1-p)S$ of them ( $p<1$ ), then the maximum number of states can reach $\frac S p$, which is in a constant number more than the number of states for one iteration

ndkoval commented 1 year ago

We need more thoughtful experiments than one test case to evaluate the performance impact. However, I believe we can create a new verifier once per 100 iterations, and it won't affect most users.

alefedor commented 1 year ago

@ndkoval Please have a look on the new approach (commited). The verifier is:

Renewed every 100 iterations to bound the maximum memory leak (soft anti-leak tactic).
Renewed if less than 10% free memory is remaining to get more free memory (hard anti-leak tactic).

btwilk commented 1 year ago

While I'm happy with a workaround to (https://github.com/Kotlin/kotlinx-lincheck/issues/124), I'm also curious as to whether the root cause for the non-(LTS cache) memory leak involving non-primitive operation parameters can be directly fixed.

ndkoval commented 1 year ago

@alefedor, thanks for investigating the original problem and providing a great-quality fix!

ndkoval commented 1 year ago

Fixes #124

JetBrains / lincheck

Fix Verifier memory leak #128