JetBrains / lincheck

Framework for testing concurrent data structures
Mozilla Public License 2.0
572 stars 33 forks source link

Fix Verifier memory leak #128

Closed alefedor closed 1 year ago

alefedor commented 1 year ago

Fixed a memory leak between iteration in Lincheck.

The LinCheck Verifier caches scenario results which it already verified to bound computations. However, this cache was kept between different iterations, essentially resulting in a memory leak. The cached values from previous iterations can be useful only if the scenario generated in the current iteration is the same as in one of the previous iterations, which is an unlikely event in practice.

The size of the leak can be large for tests of reasonable size. I managed to get a 500Mb leak for a configuration with 3 threads, 2 operations per thread, 1000 invocations per iterations and 500 iterations.

This PR re-creates linearizability verifier for each iteration to avoid this leak

ndkoval commented 1 year ago

Hi @alefedor, could you please check that this PR does not affect performance and that the source of the memory leak is the cache of the previous results?

In CachedVerifier, a weak hashmap is used to avoid such a leak; once the ExecutionScenario is garbage collected, the corresponding cache is also removed.

    private final Map<ExecutionScenario, Set<ExecutionResult>> previousResults = new WeakHashMap<>();

At the same time, reusing a verifier enables reusing the LTS structure, which improves the performance and produces a memory leak.

alefedor commented 1 year ago

Hi @ndkoval !

You are right. I've seen the correct place of the memory leak but forgot about it. The memory leak is actually not in caching results but here https://github.com/Kotlin/kotlinx-lincheck/blob/1c3fdc0200769ce7b207251a0c299676c185dc21/src/jvm/main/org/jetbrains/kotlinx/lincheck/verifier/LTS.kt#L67

This hashmap strictly increases as the time goes on. As a result, those 500Mb consisted of LTS and LTS$StateInfo.

So, the problem is reusing LTS, which, I suspect, is also not very useful when scenarios differ.

ndkoval commented 1 year ago

It was useful. Could you please check how your change affects the performance?

alefedor commented 1 year ago

@ndkoval Good point. It affects performance.

Baseline: stress (36s), model checking (1m17s) Reset after every iteration: stress (44s), model checking (1m30s) Reset every 5-th iteration: stress (39s), model checking (1m21s)

It is clear that we can not leave this memory leak. Potentially, It can be especially severe when there are non-primitive transformed parameters or values. They can hold the loader, which holds ManagedStateHolder, which holds ManagedStrategy, which holds the whole interleaving tree.

So, do you have good suggestions for how to remove the memory leak without affecting performance much?

I can think of the following heuristics:

ndkoval commented 1 year ago

We need more thoughtful experiments than one test case to evaluate the performance impact. However, I believe we can create a new verifier once per 100 iterations, and it won't affect most users.

alefedor commented 1 year ago

@ndkoval Please have a look on the new approach (commited). The verifier is:

  1. Renewed every 100 iterations to bound the maximum memory leak (soft anti-leak tactic).
  2. Renewed if less than 10% free memory is remaining to get more free memory (hard anti-leak tactic).
btwilk commented 1 year ago

While I'm happy with a workaround to (https://github.com/Kotlin/kotlinx-lincheck/issues/124), I'm also curious as to whether the root cause for the non-(LTS cache) memory leak involving non-primitive operation parameters can be directly fixed.

ndkoval commented 1 year ago

@alefedor, thanks for investigating the original problem and providing a great-quality fix!

ndkoval commented 1 year ago

Fixes #124