Open paulcheeseman opened 3 years ago
@keithc-ca
I see five instances of
JVMDUMP013I Processed dump event "systhrow", detail "java/lang/OutOfMemoryError".
in the console output in the description, which suggests there are five (or more) threads involved. The results are consistent with each thread independently processing dumps in (dump) priority order, with each agent limited by its range (`range=1..1`, `range=1..4`).
Dumps can be configured for a larger number of java and snap dumps, say `range=1..10`, which would allow for up to ten threads encountering OutOfMemoryError (nearly) simultaneously without loss of trace buffers.
One might argue that dump priorities should be global instead of per thread; however, that would require some thought about whether we can avoid deadlock in all cases.
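To make the range arithmetic concrete, here is a minimal Java sketch of the counting only. It is a toy model, not OpenJ9 code: the `Agent` class, the `simulate` helper and the priority values 4 to 1 are all made up for illustration, and events are handled one after another rather than on concurrent threads. Each agent counts the events it sees and fires only while that count is inside its range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class DumpRangeModel {
    /** Hypothetical stand-in for one dump agent; not an OpenJ9 type. */
    static final class Agent {
        final String name;
        final int priority;  // higher runs first; the values used below are placeholders
        final int rangeFrom; // first event occurrence that triggers the agent
        final int rangeTo;   // last event occurrence that triggers the agent
        int seen = 0;        // event occurrences counted so far by this agent

        Agent(String name, int priority, int rangeFrom, int rangeTo) {
            this.name = name;
            this.priority = priority;
            this.rangeFrom = rangeFrom;
            this.rangeTo = rangeTo;
        }

        /** Counts one more occurrence and reports whether it falls inside the range. */
        boolean shouldFire() {
            seen++;
            return seen >= rangeFrom && seen <= rangeTo;
        }
    }

    /** Handles the given number of events sequentially; returns the dumps taken, in order. */
    static List<String> simulate(List<Agent> agents, int events) {
        List<Agent> ordered = new ArrayList<>(agents);
        ordered.sort(Comparator.comparingInt((Agent a) -> a.priority).reversed());
        List<String> dumps = new ArrayList<>();
        for (int e = 0; e < events; e++) {
            for (Agent a : ordered) {
                if (a.shouldFire()) {
                    dumps.add(a.name);
                }
            }
        }
        return dumps;
    }

    /** Builds four agents; only the upper range of java and snap is varied. */
    static List<Agent> agents(int javaAndSnapMax) {
        return Arrays.asList(
            new Agent("system", 4, 1, 1),
            new Agent("heap", 3, 1, 4),
            new Agent("java", 2, 1, javaAndSnapMax),
            new Agent("snap", 1, 1, javaAndSnapMax));
    }

    public static void main(String[] args) {
        System.out.println(simulate(agents(4), 5));  // java and snap with range 1..4
        System.out.println(simulate(agents(10), 5)); // java and snap with range 1..10
    }
}
```

With `range=1..4` on the java and snap agents, five events yield 13 dumps in this model (one system dump plus four each of heap, java and snap), the same totals as in the description. Raising the java and snap range to `1..10` lets the fifth event produce a javacore and a snap dump as well.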
@keithc-ca
Do you have any thoughts about the suggestion of not flushing the trace buffers when taking a snap dump?
From my (perhaps limited) understanding of how tracing is managed, the per-thread trace buffers need to be merged before the information can be written to a file. That 'merging' is like 'flushing'. It's not clear there's an alternative that wouldn't have significant performance implications.
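As an illustration of the merge-then-flush behaviour discussed here, the following Java sketch merges per-thread buffers by timestamp and optionally empties them afterwards. It is an assumption-laden toy: `TraceRecord`, `merge` and the sample records are hypothetical, and it does not claim to mirror how the real trace engine stores or writes its buffers. It only shows why a consumer that runs after a flushing merge finds nothing left to report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TraceBufferModel {
    /** Hypothetical trace record; not an OpenJ9 type. */
    static final class TraceRecord {
        final long timestamp;
        final String thread;
        final String text;

        TraceRecord(long timestamp, String thread, String text) {
            this.timestamp = timestamp;
            this.thread = thread;
            this.text = text;
        }
    }

    /**
     * K-way merge of per-thread buffers, each already in timestamp order.
     * When flush is true the source buffers are emptied after merging.
     */
    static List<TraceRecord> merge(List<List<TraceRecord>> buffers, boolean flush) {
        // Each queue entry is {buffer index, position within that buffer}.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
            Comparator.comparingLong((int[] h) -> buffers.get(h[0]).get(h[1]).timestamp));
        for (int i = 0; i < buffers.size(); i++) {
            if (!buffers.get(i).isEmpty()) {
                heads.add(new int[] {i, 0});
            }
        }
        List<TraceRecord> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] h = heads.poll();
            List<TraceRecord> buf = buffers.get(h[0]);
            merged.add(buf.get(h[1]));
            if (h[1] + 1 < buf.size()) {
                heads.add(new int[] {h[0], h[1] + 1});
            }
        }
        if (flush) {
            for (List<TraceRecord> b : buffers) {
                b.clear();
            }
        }
        return merged;
    }

    /** Two made-up per-thread buffers with interleaved timestamps. */
    static List<List<TraceRecord>> sampleBuffers() {
        List<TraceRecord> t1 = new ArrayList<>(Arrays.asList(
            new TraceRecord(1, "T1", "allocation failed"),
            new TraceRecord(4, "T1", "gc end")));
        List<TraceRecord> t2 = new ArrayList<>(Arrays.asList(
            new TraceRecord(2, "T2", "gc start"),
            new TraceRecord(3, "T2", "compact")));
        return new ArrayList<>(Arrays.asList(t1, t2));
    }

    public static void main(String[] args) {
        List<List<TraceRecord>> buffers = sampleBuffers();
        // First consumer (think: snap dump) merges and flushes.
        System.out.println("first consumer sees " + merge(buffers, true).size() + " records");
        // Second consumer (think: javacore history section) runs afterwards.
        System.out.println("second consumer sees " + merge(buffers, false).size() + " records");
    }
}
```

Passing `flush = false` to the first call leaves the buffers intact for a later consumer; whether the real trace engine could do the equivalent cheaply is the open question in this thread.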
When a snap dump is generated, the trace buffers are flushed. This appears to be intentional, but there is an undesirable side effect that occurs under certain circumstances.
When a multithreaded process runs out of Java heap space, several threads can throw OutOfMemoryErrors at almost exactly the same time, and dumps are simultaneously triggered for all these threads. The default Xdump agents for OOMs are configured with priority values that should lead to dumps being triggered in this order:
system dump -> heapdump -> javacore -> snap dump
However, this ordering isn't always respected, presumably because the default agents are not all specified with the `serial` option. For example, here's some stderr output from a real multithreaded app hitting an OOM on the IBM JRE:
As you can see, in this case the dumps were produced in this order:
system dump -> heapdump -> heapdump -> heapdump -> heapdump -> snap dump -> javacore -> snap dump -> javacore -> snap dump -> javacore -> snap dump -> javacore
The problem here is that the first snap dump happens before the first javacore. This means that the "GC History" and "Current thread history" sections in the javacore are empty, because the trace buffers have been flushed before the javacore is generated:
This is not a desirable result. Yes, the snap dump should contain the missing trace data, but I suspect that most users and support teams are much more accustomed to viewing javacore files than processing snap dumps.
I'm not sure why the trace buffers are flushed when snap dumps are generated (or what downsides are associated with not flushing them), but perhaps the current behaviour needs to be reconsidered?