Z3Prover / z3

The Z3 Theorem Prover

Addressing Context memory leakage in Java #245

Closed arifogel closed 8 years ago

arifogel commented 9 years ago

What is the status of Context cleanup in Java? Batfish (github/arifogel/batfish) spawns parallel threads that each drive Z3 through JNI, for an unbounded number of jobs (75 in my most recent use case), and it looks like these Contexts do not clean up after themselves even after Context.dispose() is called. I even tried -XX:+UseSerialGC, since you seemed to imply in issue #205 that it was necessary for soundness, but it does not affect the garbage that is never collected when each job finishes. The process just ends up crashing once all available memory has been allocated (or earlier, if too much time is spent in GC). As such, the only way to run Batfish's Z3 analysis in a single process for all these jobs is to have memory that scales with the total number of jobs, which is prohibitively expensive and unnecessary. I could rewrite my program to use multiple processes rather than threads so the memory is guaranteed to be freed, but that is a terrible hack and would require IPC I'd rather not implement.
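
For reference, the pattern is roughly the following (a minimal sketch; the job body and constraint are placeholders standing in for the real NoD queries, and the bindings at this point expose Context.dispose()):

import com.microsoft.z3.Context;
import com.microsoft.z3.Solver;
import com.microsoft.z3.Status;

public class JobWorker implements Runnable {
    public void run() {
        // Each job gets its own short-lived Context.
        Context ctx = new Context();
        try {
            Solver solver = ctx.mkSolver();
            solver.add(ctx.mkBoolConst("p"));  // placeholder constraint
            Status result = solver.check();
            // ... record the result ...
        } finally {
            ctx.dispose();  // expected to release native memory, but the memory is never reclaimed
        }
    }
}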

wintersteiger commented 9 years ago

The status is as described in #205. We may need to implement our own finalization queue to keep track of which Contexts are ready for disposal and which ones aren't. You can try adding some output here to see whether your Context objects indeed fail to get cleaned up because Java attempts to finalize them before all dependent objects are finalized. If this doesn't get triggered, then your problem is a different one.

Does -XX:+UseSerialGC only imply a single-thread GC, or does it also imply finalization in reverse order of creation? If it's the latter then you're seeing a different, possibly unrelated issue/leak.

arifogel commented 9 years ago

To answer the question at the end of your last comment, here is the diff of the JVM flags with and without -XX:+UseSerialGC:

arifogel@thedatamatrix:~$ diff java-flags.withoutserialgc.txt java-flags.withserialgc.txt
287c287
< uintx MarkSweepDeadRatio = 1 {product}
---
> uintx MarkSweepDeadRatio = 5 {product}
294c294
< uintx MaxHeapFreeRatio = 100 {manageable}
---
> uintx MaxHeapFreeRatio = 70 {manageable}
316c316
< uintx MinHeapFreeRatio = 0 {manageable}
---
> uintx MinHeapFreeRatio = 40 {manageable}
365c365
< uintx ParallelGCThreads = 10 {product}
---
> uintx ParallelGCThreads = 0 {product}
388c388
< uintx PermMarkSweepDeadRatio = 5 {product}
---
> uintx PermMarkSweepDeadRatio = 20 {product}
632,633c632,633
< bool UseParallelGC := true {product}
< bool UseParallelOldGC = true {product}
---
> bool UseParallelGC = false {product}
> bool UseParallelOldGC = false {product}
640c640
< bool UseSerialGC = false {product}
---
> bool UseSerialGC := true {product}

arifogel commented 9 years ago

OK, I added instrumentation to that function: I print m_refCount before and after the call to dispose, and print m_refCount != 0 in the else branch you indicated. I am no longer calling Context.dispose() manually. For reference, the Context object should be out of scope after a job terminates. Here are my results with 8 jobs executing in parallel with concurrent GC (I terminated the run this time before it ran out of memory):

* EXECUTING NOD JOBS *
Job terminated successfully with result: org.batfish.z3.NodJobResult@73e20f87 after elapsed time: 00:01:1.58 - 1/75 (1.3%) complete
before dispose, m_refCount == 323507
Job terminated successfully with result: org.batfish.z3.NodJobResult@616a37dc after elapsed time: 00:01:2.03 - 2/75 (2.7%) complete
Job terminated successfully with result: org.batfish.z3.NodJobResult@547d573 after elapsed time: 00:01:1.51 - 3/75 (4.0%) complete
after dispose, m_refCount == 323507
m_refCount != 0
before dispose, m_refCount == 120
after dispose, m_refCount == 120
m_refCount != 0
Job terminated successfully with result: org.batfish.z3.NodJobResult@4260904e after elapsed time: 00:01:9.30 - 4/75 (5.3%) complete
before dispose, m_refCount == -491
after dispose, m_refCount == -491
m_refCount != 0
Job terminated successfully with result: org.batfish.z3.NodJobResult@16ecb69e after elapsed time: 00:01:12.83 - 5/75 (6.7%) complete
Job terminated successfully with result: org.batfish.z3.NodJobResult@1cb2d75c after elapsed time: 00:01:13.50 - 6/75 (8.0%) complete
before dispose, m_refCount == 132528
after dispose, m_refCount == 132528
m_refCount != 0
Job terminated successfully with result: org.batfish.z3.NodJobResult@18893021 after elapsed time: 00:01:24.12 - 7/75 (9.3%) complete
before dispose, m_refCount == 192730
after dispose, m_refCount == 192730
m_refCount != 0
before dispose, m_refCount == 82584
after dispose, m_refCount == 82584
m_refCount != 0
before dispose, m_refCount == -105
after dispose, m_refCount == -105
m_refCount != 0
Job terminated successfully with result: org.batfish.z3.NodJobResult@163fa023 after elapsed time: 00:00:52.68 - 8/75 (10.7%) complete
Job terminated successfully with result: org.batfish.z3.NodJobResult@7805cbba after elapsed time: 00:01:4.24 - 9/75 (12.0%) complete
Job terminated successfully with result: org.batfish.z3.NodJobResult@290163e1 after elapsed time: 00:00:51.84 - 10/75 (13.3%) complete
before dispose, m_refCount == 96220
after dispose, m_refCount == 96220
m_refCount != 0
before dispose, m_refCount == 680
after dispose, m_refCount == 680
m_refCount != 0
Job terminated successfully with result: org.batfish.z3.NodJobResult@65409285 after elapsed time: 00:01:14.32 - 11/75 (14.7%) complete
Job terminated successfully with result: org.batfish.z3.NodJobResult@a1193a9 after elapsed time: 00:01:13.34 - 12/75 (16.0%) complete
Job terminated successfully with result: org.batfish.z3.NodJobResult@75baec8e after elapsed time: 00:01:11.30 - 13/75 (17.3%) complete
Job terminated successfully with result: org.batfish.z3.NodJobResult@b9f370 after elapsed time: 00:01:0.26 - 14/75 (18.7%) complete
before dispose, m_refCount == -274
after dispose, m_refCount == -274
m_refCount != 0
Job terminated successfully with result: org.batfish.z3.NodJobResult@2707f992 after elapsed time: 00:01:2.22 - 15/75 (20.0%) complete
before dispose, m_refCount == 697
after dispose, m_refCount == 697
m_refCount != 0
Job terminated successfully with result: org.batfish.z3.NodJobResult@137a9034 after elapsed time: 00:01:9.51 - 16/75 (21.3%) complete
Job terminated successfully with result: org.batfish.z3.NodJobResult@322c2f49 after elapsed time: 00:00:55.44 - 17/75 (22.7%) complete
Job terminated successfully with result: org.batfish.z3.NodJobResult@757b7533 after elapsed time: 00:00:46.89 - 18/75 (24.0%) complete
before dispose, m_refCount == 390
after dispose, m_refCount == 390
m_refCount != 0
Job terminated successfully with result: org.batfish.z3.NodJobResult@65c95a57 after elapsed time: 00:01:17.95 - 19/75 (25.3%) complete
Job terminated successfully with result: org.batfish.z3.NodJobResult@50bbaeee after elapsed time: 00:01:7.80 - 20/75 (26.7%) complete
before dispose, m_refCount == 2027
after dispose, m_refCount == 2027
m_refCount != 0
Job terminated successfully with result: org.batfish.z3.NodJobResult@1943fd4d after elapsed time: 00:03:37.47 - 21/75 (28.0%) complete
before dispose, m_refCount == 564
after dispose, m_refCount == 564
m_refCount != 0
before dispose, m_refCount == -420
after dispose, m_refCount == -420
m_refCount != 0
before dispose, m_refCount == 1448
after dispose, m_refCount == 1448
m_refCount != 0
before dispose, m_refCount == 1182
after dispose, m_refCount == 1182
m_refCount != 0
Job terminated successfully with result: org.batfish.z3.NodJobResult@ce82ec9 after elapsed time: 00:00:58.29 - 22/75 (29.3%) complete
before dispose, m_refCount == 163382
after dispose, m_refCount == 163382
m_refCount != 0
before dispose, m_refCount == 3800
^C^C^C^C^C^C^C^C^C^C^C^C^C^C^C^Cafter dispose, m_refCount == 3800
m_refCount != 0
Job terminated successfully with result: org.batfish.z3.NodJobResult@7c5cc270 after elapsed time: 00:01:2.99 - 23/75 (30.7%) complete
^C^C^CJob terminated successfully with result: org.batfish.z3.NodJobResult@573757ec after elapsed time: 00:01:2.23 - 24/75 (32.0%) complete
Job terminated successfully with result: org.batfish.z3.NodJobResult@74e3391b after elapsed time: 00:00:36.32 - 25/75 (33.3%) complete

wintersteiger commented 9 years ago

Alright, I committed fixes for the non-Interlocked inc/dec problem and potential super.finalize() problems (fixes are in master now).

Also, I (finally!) created a test program that predictably triggers some of the remaining issues, see Gh245.java. It triggers the m_refCount != 0 upon finalization problem, and it also triggers ast_manager memory leaks (rarely, and at least on Windows; we can ignore those for the purposes of this issue). It also reproduces on OS X.

Another "typical" concurrency problem may be that the creation of Context objects is done in parallel; in the test program I serialized those:

// Create Contexts one at a time; m_lock is shared by all worker threads.
synchronized (m_lock) {
    ctx = new Context();
}

At the moment this is definitely required, but I'm thinking about adding a similar lock directly to the Context constructor: there is basically no impact for single-context users, and multi-context users need to do this anyway.
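
A sketch of what such a constructor-level lock could look like (field and native-call names are assumptions, not the committed code):

public class Context {
    // One JVM-wide lock: no two Context constructors run concurrently.
    private static final Object s_creationLock = new Object();
    private final long m_ctx;

    public Context() {
        synchronized (s_creationLock) {
            long cfg = Native.mkConfig();    // assumed binding for Z3_mk_config
            m_ctx = Native.mkContextRc(cfg); // assumed binding for Z3_mk_context_rc
            Native.delConfig(cfg);           // assumed binding for Z3_del_config
        }
    }
}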

wintersteiger commented 9 years ago

Instead of just talking about it, I figured I should write that fix down, so concurrent Context creation should now be safe in Java and .NET.

wintersteiger commented 8 years ago

Is this still an issue or is the memory being managed well enough for now? I remember that the last few changes for this problem were defensive, i.e., safe but potentially slow. Do you see any of those negative effects?

arifogel commented 8 years ago

I'm afraid I haven't been using z3 lately, but to my recollection, performance is fine.

arifogel commented 8 years ago

I'd like to reopen this issue. Here is my use case: I need to run ~5000 jobs in batfish using 12 parallel Z3 worker threads at a time. I have 64GiB of RAM on my machine, and have set the -Xmx60g jvm flag. The output of each job is stored in memory and not garbage collected, but the output is relatively small. I believe the saved output from each job is <1KiB. When I run the whole thing at once as described, batfish crashes with an out-of-memory exception after several hours with full CPU utilization on all 12 cores.

To work around this, I partitioned the jobs into 13 more or less equally sized chunks, to be run separately. Each chunk ran in a separate process, also with 12 parallel Z3 worker threads, and the chunks ran one after the other. All chunks completed after only 20 minutes.

I assume the crashing case took much longer for two reasons: (1) instead of the memory associated with a chunk of several hundred jobs being freed instantly when its process ended, the memory had to be meticulously freed by the garbage collector; (2) old contexts leaving uncollectable garbage progressively reduced the free memory available to new jobs, forcing more and more frequent garbage collection.

We can potentially address (1) by reducing the complexity of garbage collection, e.g. by using PhantomReferences instead of finalizers, as alluded to by cheshire in #442. I believe such a solution, engineered properly, may also alleviate (2). However, I do not know enough about the difference between finalizers and PhantomReferences to say so authoritatively.
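
To illustrate the difference: a finalizer runs on the JVM's finalizer thread, with all the ordering and throughput caveats discussed above, whereas a phantom reference merely signals through a ReferenceQueue that a wrapper has become unreachable and leaves releasing the native pointer to us. A minimal sketch (all names hypothetical, not the bindings' actual code):

import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cleaner for native AST handles.
class NativeCleaner {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps each PhantomReference strongly reachable until processed, and
    // remembers the native pointer that must be released for it.
    private final Map<Reference<?>, Long> pointers = new ConcurrentHashMap<>();

    void register(Object wrapper, long nativePtr) {
        pointers.put(new PhantomReference<>(wrapper, queue), nativePtr);
    }

    // Called periodically (e.g. on each new allocation): release the native
    // pointers of wrappers that the GC has found unreachable.
    void drain() {
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Long ptr = pointers.remove(ref);
            if (ptr != null) {
                releaseNative(ptr);
            }
        }
    }

    private void releaseNative(long ptr) {
        // here: the context's dec-ref call, e.g. Native.decRef(ctx, ptr)
    }
}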

cheshire commented 8 years ago

@arifogel Ironically, in my applications I've found that the best option (by far) is not to do garbage collection at all, as the overhead associated with calling "finalize" is so ENORMOUS that it far outweighs any benefit of garbage collecting the formulas.

If you need better performance now and you don't mind using a wrapper, you can try our JavaSMT library, which requires closing contexts manually and offers a choice between phantom references and no tracking at all for formulas.
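
For reference, a sketch of what solving through JavaSMT looks like (package and method names as in current sosy-lab/java-smt; treat the exact API, and the older package layout from this era, as approximate):

import org.sosy_lab.common.ShutdownManager;
import org.sosy_lab.common.configuration.Configuration;
import org.sosy_lab.common.log.BasicLogManager;
import org.sosy_lab.common.log.LogManager;
import org.sosy_lab.java_smt.SolverContextFactory;
import org.sosy_lab.java_smt.SolverContextFactory.Solvers;
import org.sosy_lab.java_smt.api.BooleanFormula;
import org.sosy_lab.java_smt.api.BooleanFormulaManager;
import org.sosy_lab.java_smt.api.ProverEnvironment;
import org.sosy_lab.java_smt.api.SolverContext;

public class JavaSmtDemo {
    public static void main(String[] args) throws Exception {
        Configuration config = Configuration.defaultConfiguration();
        LogManager logger = BasicLogManager.create(config);
        ShutdownManager shutdown = ShutdownManager.create();

        // Contexts are closed explicitly; no finalizers are involved.
        try (SolverContext context = SolverContextFactory.createSolverContext(
                config, logger, shutdown.getNotifier(), Solvers.Z3)) {
            BooleanFormulaManager bmgr =
                context.getFormulaManager().getBooleanFormulaManager();
            BooleanFormula p = bmgr.makeVariable("p");
            try (ProverEnvironment prover = context.newProverEnvironment()) {
                prover.addConstraint(p);
                System.out.println("unsat: " + prover.isUnsat());
            }
        }
    }
}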

cheshire commented 8 years ago

I would be curious to test this now, after I've changed the memory handling in Java bindings.

arifogel commented 8 years ago

@cheshire: Do you mean you changed it in your JavaSMT library? If so, I'm not sure that will work for my use case. I need to be able to use the Network-Optimized Datalog (NoD) plugin of Z3. Do you support that in your library?

NikolajBjorner commented 8 years ago

It applies to all Z3 functionality, including NoD.

arifogel commented 8 years ago

I see. But to clarify: does this mean I should use Z3 directly, or JavaSMT (sosy-lab/java-smt)?

cheshire commented 8 years ago

https://github.com/Z3Prover/z3/commit/3e96a7972f776f1a74df31a987a244b8fbfb388e aims to address the memory leak. Be aware that the context must now be closed manually (e.g. using try-with-resources).

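Concretely, after that commit the intended pattern is the following (a minimal sketch; Context now implements AutoCloseable, and the constraint is a placeholder):

try (Context ctx = new Context()) {
    Solver solver = ctx.mkSolver();
    solver.add(ctx.mkBoolConst("p"));
    System.out.println(solver.check());
}  // ctx.close() runs here and releases the native context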

arifogel commented 8 years ago

I rebuilt the latest Z3, and batfish no longer compiled because of references to Context.dispose, which no longer exists. So I changed every new Context() to try (Context ctx = new Context()) {...} and removed the calls to dispose. Unfortunately, I see no difference in any of the behavior I have described in this thread: memory usage steadily rises until the system runs out of memory and kills the batfish java process. EDIT: I should note that I tested this both with and without the -Xmx60g java flag, and that I am now using JDK 8 on Ubuntu 16.04, whereas I was previously using JDK 7 on Ubuntu 14.04. For all current runs, it does not appear that the -Xmx60g flag is respected or needed; the program uses well over 60GB of memory (all available memory) and then crashes.

arifogel commented 8 years ago

Perhaps there is something wrong with the way I am invoking Z3? For reference, you can look at: https://github.com/arifogel/batfish/blob/master/projects/batfish/src/org/batfish/z3/NodSatJob.java

cheshire commented 8 years ago

@arifogel =( Do you think you could use JProfiler (https://www.ej-technologies.com/products/jprofiler/overview.html; it has a 30-day evaluation period) or some other profiling tool (VisualVM is free, but the results are often suboptimal) and see where the memory is actually spent? [Of course you won't see what's happening in the native code, but we could get some hint.]

arifogel commented 8 years ago

I used JProfiler. The Java memory use it reports rises a little as time goes on, but stays more or less bounded: it never exceeds about 11.5GB and periodically drops back to about 3GB. However, the memory used by the java process as reported by htop grows at a constant rate until it consumes all 64GB of my memory and crashes. I have a very strong feeling that the native code has a memory leak. What is the next step?
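
One way to corroborate that from the JVM side (a suggestion, not something tried in this thread) is the JVM's Native Memory Tracking: it accounts for the JVM's own native allocations but not for memory malloc'd inside the Z3 shared library, so an NMT total that stays flat while the process's resident size grows points squarely at the native library:

$ java -XX:NativeMemoryTracking=summary -jar batfish.jar ...   # jar name illustrative
$ jcmd <pid> VM.native_memory summary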

arifogel commented 8 years ago

Here is some of the profiling data for reference: profiling.zip

NikolajBjorner commented 8 years ago

So you have about 14M java.lang.ref.PhantomReference objects around. As far as I understand, these are references to Z3 expressions (6M+ of which are Boolean expressions). "Someone" has to tell the GC to collect these, as the context object cannot go away unless the expressions owned by the context go away. To my understanding there are two possible approaches:

1) Batfish keeps track of the expressions it has created and ensures the dispose/finalize methods are called on scope exit.

2) The Java bindings use a double indirection to the context object instead of a single indirection. When the context is disposed, the reference counts of all allocated ast objects are decremented at the time the context is finalized. Dangling expression objects are no longer valid, as the underlying context pointer is now set to null (see the sketch below).

@cheshire what is your take?
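
A sketch of the double-indirection idea in (2) (names hypothetical):

class ContextHandle {
    private long nativeCtx;  // set to 0 once the context is disposed

    ContextHandle(long ptr) { nativeCtx = ptr; }

    synchronized long get() {
        if (nativeCtx == 0) {
            throw new IllegalStateException("context already disposed");
        }
        return nativeCtx;
    }

    synchronized void invalidate() { nativeCtx = 0; }
}

class ExprWrapper {
    private final ContextHandle ctx;  // double indirection to the native context
    private final long nativeAst;

    ExprWrapper(ContextHandle ctx, long ast) { this.ctx = ctx; this.nativeAst = ast; }

    // Every native call goes through ctx.get(), so a dangling wrapper fails
    // fast after disposal instead of dereferencing a dead native pointer.
}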

cheshire commented 8 years ago

@NikolajBjorner I'm sorry I'm not sure I understand your comment fully.

There is indeed a leak in the current Java bindings: when closing a context c we do clean the reference queues belonging to c, yet not all objects in c may have been GC'd by that point (in that case, the reference does not yet appear in the reference queue and is never cleaned). We could call System.gc(), but it is not guaranteed to actually do what we want.

This is easily fixable: when closing the context, instead of going through the reference queue, we can simply remove all objects stored in referenceMap (see the sketch after this comment). I will submit a pull request shortly. However:

1) In that case, the client is responsible for not keeping references to objects belonging to a context beyond that context's lifetime, as any operation on them after the context is closed (e.g. converting to a string or an equality check) would segfault.

2) I think this is a very tiny, local leak, affecting only objects that were already unreachable but not yet GC'd when the context was closed. Unless thousands of contexts are created and closed all the time, I would not expect it to cause any actual problems. This hypothesis is consistent with the 14M references occupying 470MB out of the overall 64GB. So currently I agree with @arifogel that there is probably a leak in the native code.

However, in case I'm wrong about (2), let us try the fixed version first and then investigate other options.
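
A sketch of that fix (assuming a map from each phantom reference to the native pointer it guards, as in the bindings' reference-queue machinery; not the actual patch):

// On Context.close(): release every still-tracked native object directly,
// instead of only those that have already appeared in the reference queue.
void forceClear(Context ctx) {
    for (Long astPtr : referenceMap.values()) {
        Native.decRef(ctx.nCtx(), astPtr);  // drop the native reference count now
    }
    referenceMap.clear();
}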

NikolajBjorner commented 8 years ago

Thanks, this would be rather good to try.

cheshire commented 8 years ago

@arifogel Can you try my Z3 branch (https://github.com/cheshire/z3/tree/fix_java_leak)? Also, when load-testing multiple contexts I've found that Z3 slows down to a crawl, and running the same queries serially on a single thread is often much faster (at least on my benchmarks). What happens to your performance if you force only one thread to execute at a time?

arifogel commented 8 years ago

You want me to run everything serially with a single context, or just avoid having multiple contexts open at once? The former would almost certainly use too much memory; the latter will definitely run more slowly than running in parallel, in my experience. However, I can test this with your branch and with current master if you wish.

cheshire commented 8 years ago

@arifogel My branch is the same as master now, so re-testing with master is sufficient.

arifogel commented 8 years ago

OK, so now it actually terminates. I am attaching profiling information. At about 90 minutes in (marked at the top of ps.out), I started periodically recording the total virtual and shared resident memory of the process. As for the per-class reference/memory counts, I'm not actually sure when that snapshot was taken. profile-new.zip

arifogel commented 8 years ago

I am currently running without profiling, and I estimate it will take about 3 hours and 20 minutes to complete. I should note that in both the profiling and non-profiling cases, memory usage steadily grows well beyond what the Java objects account for. I wonder whether this still constitutes a memory leak and/or is related to the terrible performance relative to running the partitions sequentially, each partition itself running in parallel (about 20 minutes for everything).

NikolajBjorner commented 8 years ago

So your profile indicates that a lot of objects of type org.batfish.z3.node.VarIntExpr are left uncollected. These are of course Batfish objects. The NodJob.java code does not return any such objects as far as I can see, but perhaps some container objects don't get collected eagerly enough. Did you try forcing a GC before returning from NodJob? Alternatively, dispose of objects that may hold references to these objects.

arifogel commented 8 years ago

The VarIntExpr objects' total size is bounded throughout the computation and is negligible compared to the unaccounted-for, non-JVM-allocated memory usage. As I said in an earlier post, I cannot tell when the snapshot you are referencing was taken; it is definitely not the final situation, since the process terminates successfully in this case.

cheshire commented 8 years ago

@arifogel So I assume we have achieved the goal of not crashing, as now 64GB is sufficient for your application?

As for the leak, I would do the following sanity check: after all context objects are closed and disposed of and GC has run, [based on my understanding of Z3] there should be almost no memory occupied by native code. That should be relatively easy to check. If a significant amount of memory is still in use after all contexts are closed, I would either use Z3's diagnostic tools (I don't know whether it provides any functions to see where the memory is going?) or generate a coredump and inspect it manually.

arifogel commented 8 years ago

Based on the latest results I've provided, we see that the JRE does not allocate more than about 18GB at any given time (including both used and free heap), yet the total memory used by the java process running batfish now peaks at about 43GB. This means that we have 25GB of memory unaccounted for. If we are generous and say that 2GB of that 25GB is JRE process overhead, that still leaves 23GB of native memory that is unaccounted for. So @cheshire, can you guide me through creating and inspecting a coredump? I know how to do this when a z3 process is started separately, but I'm not sure what to do when it is called through JNI.

arifogel commented 8 years ago

As an aside, I realize that I brought up issues of space and performance in this thread. Let us just address the memory issue. When we are finished, if there is still a significant slowdown compared to running the same jobs partitioned into multiple executions, I will open a new bug.

cheshire commented 8 years ago

@arifogel Why are those 25GB unaccounted for? After all, Z3 has to use some memory. That's why I asked about closing all contexts first.

Creating a coredump is easy: just force Z3 to segfault. I think creating a formula object, then closing its parent context, then trying to print that formula (or performing any operation on the closed context, in general) will do.
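
That recipe as a sketch (enable core dumps first, e.g. with ulimit -c unlimited):

Context ctx = new Context();
BoolExpr p = ctx.mkBoolConst("p");
ctx.close();            // frees the native context
System.out.println(p);  // use-after-close: printing touches freed native memory and should segfault, leaving a core dump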

arifogel commented 8 years ago

Update: the reason partitioning the jobs improved performance is that I was accidentally sending each parallel Z3 job an amount of information (in terms of rules) that scaled with the total number of jobs. After changing it so that each job receives only the information it needs, performance improved by about 1000x and memory usage is now insubstantial.

While I still believe there is a memory leak on the native side, I'm afraid I don't have a compelling test case anymore. For now I think we can close this. I'll let you know if I run into this bug again as I do different analyses.

wintersteiger commented 8 years ago

That's fantastic, thanks for reporting back @arifogel. I'll close this issue for now, but feel free to open a new issue or ask us to reopen this one if you run into new problems.