Generate flamegraphs for other event types available in a JFR recording

leogomes commented 7 years ago

With this PR, one can now also generate flamegraphs for the following event types:

Allocations in a new TLAB (allocation-tlab)
Allocations outside a TLAB (allocation-outside-tlab)
Exceptions (exceptions)
Java Monitor Blocked (monitor-blocked)

In allocation-related flamegraphs, the width of a flame represents the total amount of memory allocated, in bytes.

For exceptions, the width represents the number of times exceptions were created.

For monitor blocked, the width represents the number of times a Java thread was blocked because that monitor was already held by another thread.
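The three weighting rules above can be illustrated with a minimal Java sketch that folds events into collapsed stacks, similar to the "frame;frame;frame weight" lines consumed by flamegraph.pl. The Event class, the EventType enum and the weight method are hypothetical names chosen for illustration; this is not the code from this PR.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: fold JFR-style events into collapsed stacks ("a;b;c weight"),
// weighting each event according to its event type.
final class CollapsedStacks {

    enum EventType { ALLOCATION_TLAB, ALLOCATION_OUTSIDE_TLAB, EXCEPTIONS, MONITOR_BLOCKED }

    // Hypothetical stand-in for a parsed JFR event: frames ordered root-first,
    // plus the allocation size in bytes (ignored for non-allocation events).
    static final class Event {
        final List<String> frames;
        final long allocationSize;

        Event(List<String> frames, long allocationSize) {
            this.frames = frames;
            this.allocationSize = allocationSize;
        }
    }

    // Allocation flames are weighted by bytes; exception and monitor flames count occurrences.
    static long weight(EventType type, Event e) {
        switch (type) {
            case ALLOCATION_TLAB:
            case ALLOCATION_OUTSIDE_TLAB:
                return e.allocationSize;
            default:
                return 1;
        }
    }

    // Identical stacks are merged and their weights summed; TreeMap keeps the output sorted.
    static Map<String, Long> fold(EventType type, List<Event> events) {
        Map<String, Long> folded = new TreeMap<>();
        for (Event e : events) {
            folded.merge(String.join(";", e.frames), weight(type, e), Long::sum);
        }
        return folded;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event(List.of("main", "load", "parse"), 512),
            new Event(List.of("main", "load", "parse"), 256),
            new Event(List.of("main", "render"), 1024));
        fold(EventType.ALLOCATION_TLAB, events)
            .forEach((stack, w) -> System.out.println(stack + " " + w));
    }
}
```

With allocation-tlab weighting, main prints "main;load;parse 768" and "main;render 1024": the two events on the same stack merge into one flame whose width is their combined size. Folding the same events as EXCEPTIONS would give widths of 2 and 1 instead.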

The create_flamegraph.sh script has been modified to add the -e option, which lets you specify the event type. For example,

./create_flamegraph.sh -e allocation-tlab -d -f my_recording.jfr > allocation_flames.svg

would create a flamegraph for allocations in a new TLAB.

When the user doesn't specify an event type, Method Profiling Sample events are used by default. So, for example:

./create_flamegraph.sh -f my_recording.jfr > flames.svg

would generate a flamegraph based on Method Profiling Sample events, as it does today.

chrishantha commented 7 years ago

Hi @leogomes, this is really great, and thanks a lot for this PR. I was actually planning to work on allocation flame graphs and was so happy to see this PR.

leogomes commented 7 years ago

Awesome! Thanks for merging it :)

plokhotnyuk commented 7 years ago

It seems that the allocation options represent only the size allocated in the recorded events (number of events * average size) instead of the estimated allocation size reported by JMC. So it can be quite misleading when two threads (or pools) are used and they have different average allocation sizes.

cykl commented 7 years ago

@plokhotnyuk Could you elaborate?

I just tried allocation-tlab and am puzzled by its output. JMC reports 33 GB of TLAB allocations, while the flamegraph only reports 20567136 samples (i.e., the sum of all EVENT_ALLOCATION_SIZE values). Shouldn't the sample count match "Total Memory Allocated for TLABs" in JMC?

plokhotnyuk commented 7 years ago

As far as I understand, for TLAB allocations JFR only tracks events where the allocation occurs in a new TLAB, so the probability (and count) of these events depends on the size of the allocated block.

Please see in the screenshot below that Est. TLAB allocation is not just Count * Average TLAB Allocation:

leogomes commented 7 years ago

Hello @cykl and @plokhotnyuk,

I think the best way to understand what the flamegraph is showing is to look at the Events tab. I'll paste a screenshot of JMC 5.5 here; I'm still getting used to JMC 6 :)

For allocations in a new TLAB, JFR records the size of the object that was the first allocation in a new TLAB and the size of that new TLAB. What you get is a sampling of TLAB allocations where you just recorded the first allocations. In my example above, I have a total of 86.27 GB allocated in new TLABs over the entire recording, but only ~180 MB of samples, because only the first allocations were recorded. In the flamegraphs I used the size of that first allocation. If the same stack trace appears again in another event, the amounts of memory allocated are aggregated. In the end, the width of a frame is the total amount of memory allocated by that frame and all of its children (keep in mind that we only look at samples, so take that with a grain of salt).

For allocations outside a TLAB, on the other hand, JFR records every allocation, so the number of samples you see at the bottom of your flamegraph should match what you see here in "Total memory allocated for Objects":

Hope that makes sense. I will ping Marcus Hirt on Twitter to see if he wants to have a look at this post :)

I was also thinking about contributing a different flamegraph where we would look at the number of allocations for a given code path, rather than the amount of memory. I'm just not sure how useful that would be. I usually look at allocations to try to reduce the amount of memory allocated (at least for TLAB allocations) rather than their frequency. Allocations in a TLAB should be pretty cheap, since they are basically a pointer bump.
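The difference between the two views can be sketched in Java by folding the same samples once by bytes and once by allocation count. The stack names and sizes are made up for illustration, and the AllocationViews class is hypothetical; for allocation-tlab events, counts would still be subject to the TLAB sampling described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToLongFunction;

// Hypothetical sketch: the same allocation samples folded by bytes or by count.
final class AllocationViews {

    static final class Sample {
        final String stack; // collapsed stack, e.g. "main;log"
        final long bytes;   // allocation size recorded for the event

        Sample(String stack, long bytes) {
            this.stack = stack;
            this.bytes = bytes;
        }
    }

    // Sums weight(sample) for each unique stack.
    static Map<String, Long> fold(List<Sample> samples, ToLongFunction<Sample> weight) {
        Map<String, Long> folded = new TreeMap<>();
        for (Sample s : samples) {
            folded.merge(s.stack, weight.applyAsLong(s), Long::sum);
        }
        return folded;
    }

    static Map<String, Long> byBytes(List<Sample> samples) {
        return fold(samples, s -> s.bytes);
    }

    static Map<String, Long> byCount(List<Sample> samples) {
        return fold(samples, s -> 1L);
    }

    public static void main(String[] args) {
        List<Sample> samples = new ArrayList<>();
        // Many small allocations on one path, a few large ones on another.
        for (int i = 0; i < 1000; i++) {
            samples.add(new Sample("main;log", 24));
        }
        samples.add(new Sample("main;loadCache", 1_000_000));
        samples.add(new Sample("main;loadCache", 1_000_000));
        System.out.println("by bytes: " + byBytes(samples));
        System.out.println("by count: " + byCount(samples));
    }
}
```

By bytes, main;loadCache dominates (2,000,000 vs 24,000); by count, main;log dominates (1,000 vs 2). Which view is useful depends on whether you are trying to cut memory volume or allocation frequency.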

Let me know what you think.

cykl commented 7 years ago

Thanks for the explanation! I reached a similar conclusion by reading http://hirt.se/blog/?p=381 and playing with both JMC 5.5 & 6.0.

"What you get is a sampling of TLAB allocations where you just recorded the first allocations"

My (probably wrong) understanding of hotspot/src/share/vm/memory/threadLocalAllocBuffer.cpp is that TLABs are dynamically sized per thread. TLAB size is computed from a refill target, meaning that allocation-heavy threads get larger TLABs. Wouldn't that mean that allocations from threads with small TLABs are over-represented and allocations from threads with large TLABs are under-represented? If so, how would you use such a flamegraph?
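The bias described here can be checked with a deliberately simplified Java simulation. It assumes a fixed object size no larger than the TLAB, a fixed TLAB size per thread, and that leftover TLAB space is simply discarded on refill; real HotSpot sizing and refill policies are more involved. An event is emitted only for the allocation that triggers a refill, recording both the object size and the TLAB size. Weighting each event by its TLAB size is shown as one plausible correction for the sampling rate; whether JMC's "Est. TLAB allocation" uses exactly this estimator is an assumption, not something confirmed in this thread.

```java
// Hypothetical, simplified simulation of "Allocation in new TLAB" sampling.
// Assumptions: fixed object size <= TLAB size, fixed TLAB size per thread,
// and any space left in a TLAB is discarded when a new one is taken.
final class TlabSampling {

    static final class Result {
        long actualBytes;  // every byte the thread allocated
        long sampledBytes; // sum of object sizes in "new TLAB" events
        long tlabWeighted; // sum of TLAB sizes in "new TLAB" events
        int events;        // number of "new TLAB" events
    }

    static Result simulate(int count, int objectSize, int tlabSize) {
        if (objectSize <= 0 || objectSize > tlabSize) {
            throw new IllegalArgumentException("need 0 < objectSize <= tlabSize");
        }
        Result r = new Result();
        long remaining = 0; // free bytes in the current TLAB (none before the first refill)
        for (int i = 0; i < count; i++) {
            if (objectSize > remaining) {
                // Object does not fit: take a new TLAB and emit an event for this allocation.
                remaining = tlabSize;
                r.events++;
                r.sampledBytes += objectSize;
                r.tlabWeighted += tlabSize;
            }
            remaining -= objectSize;
            r.actualBytes += objectSize;
        }
        return r;
    }

    public static void main(String[] args) {
        // Two threads allocate the same bytes but get differently sized TLABs.
        Result smallTlab = simulate(10_000, 100, 1_000);
        Result largeTlab = simulate(10_000, 100, 100_000);
        for (Result r : new Result[] {smallTlab, largeTlab}) {
            System.out.printf("events=%d actual=%d sampled=%d tlabWeighted=%d%n",
                r.events, r.actualBytes, r.sampledBytes, r.tlabWeighted);
        }
    }
}
```

With 10,000 allocations of 100 bytes each, the thread with 1,000-byte TLABs produces 1,000 events (100,000 sampled bytes), while the thread with 100,000-byte TLABs produces only 10 events (1,000 sampled bytes), although both allocated 1,000,000 bytes. In this idealized model, summing TLAB sizes recovers 1,000,000 bytes for both threads.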

leogomes commented 7 years ago

I think the article you cited kind of answers your question :)

Limitations

Since the Allocation in new TLAB events only represents a sample of the total thread local allocations, having only one event will say very little about the actual distribution. The more events, the more accurate the picture. Also, if the allocation behaviour vary wildly during the recording, it may be hard to establish a representative picture of the thread local allocations.

JFR is good because you only need a JDK to use it, but to have a better view of the allocations, you'd probably be better off using something like this: https://epickrram.blogspot.fr/2017/09/heap-allocation-flamegraphs.html

leogomes commented 7 years ago

One way to work around this limitation is to run with -XX:-UseTLAB, which completely disables TLABs while you're looking at allocations. Of course, this has a significant impact on your application's performance, but it seems to allow JFR to record all allocations. In any case, instrumenting code to record all allocations will also come with a big performance hit.
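When trying -XX:-UseTLAB, it can be worth confirming from inside the application that the flag actually took effect. A minimal sketch using the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean; this assumes a HotSpot-based JDK, and other JVMs may not expose a UseTLAB option. The TlabCheck class name is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: reads the effective value of the HotSpot UseTLAB flag.
final class TlabCheck {

    // Returns the raw option value, "true" or "false" on HotSpot.
    // getVMOption throws IllegalArgumentException if the VM has no such option.
    static String useTlabValue() {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption("UseTLAB").getValue();
    }

    static boolean tlabsEnabled() {
        return Boolean.parseBoolean(useTlabValue());
    }

    public static void main(String[] args) {
        System.out.println("UseTLAB = " + tlabsEnabled());
    }
}
```

Launched with java -XX:-UseTLAB TlabCheck, this should print UseTLAB = false; without the flag, HotSpot's default is true.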

chrishantha/jfr-flame-graph, pull request #7