mpirvu opened 5 years ago
FYI @ashu-mehra
Analysis so far:
Running the websphere-liberty docker image with -Xjit:verbose={compilePerformance}, I see a lot of methods with a bytecode size of 2 whose memory consumption is reported as mem=[region=704 system=16384]KB.
Picked up one such method, java/util/Collections$EmptyList.size()I, for further analysis. Its bytecodes are:
    public int size();
      Code:
         0: iconst_0
         1: ireturn
In the JIT verbose logs, the value reported as region corresponds to the memory allocated by SystemSegmentProvider. The 704 KB reported in the JIT verbose logs can be broken down as follows:
- before comp/opt: 64 KB
- comp/ilgen: 448 KB
- comp/opt: 192 KB
Trying to get a more granular breakdown of memory usage in the comp/ilgen phase, using a debug build and forcing compilation of only EmptyList.size() with the following option: -Xjit:limit={java/util/Collections*size*}(optLevel=cold),verbose={compilePerformance}
So far I have found that it allocates six TR::MemorySegments of 64 KB each.
What is comp/opt? I am guessing it's the optimizer, and then my next question is: what is the proper ordering of the 3 snapshots above: before comp/opt, comp/opt and comp/ilgen?
> it's the optimizer
Yes
> my next question is what is the proper ordering of the 3 snapshots above: before comp/opt, comp/opt and comp/ilgen
Order of these phases is as I listed above.
Aah! There is a typo in that comment: comp/opt should be comp/ilgen. Let me correct it.
Summarising the issue here:
During compilation there are places where we end up using the cs2 heap_allocator, which in turn uses TR::Region as the base allocator. The heap_allocator works by creating multiple lists of segments, where each list is used to satisfy requests for memory within a size range. E.g., the first list satisfies memory requests of up to 8 bytes, the next list handles requests from 8 to 16 bytes, and so on. More importantly, each segment in a list is 64 KB in size, so even if there is only one memory request of 8 bytes, we end up allocating a 64 KB segment.
There are multiple places during compilation where we end up using heap_allocator, which results in allocating 4-5 segments of 64 KB, each of which is heavily under-used because of the low allocation count.
As part of the fix, I am trying to bypass the cs2 allocators and use TR::Region directly to satisfy all allocation requirements during compilation.
Even the smallest cold compilation consumes more than 700 KB of scratch memory, and the smallest no-opt compilation consumes more than 500 KB. We would like to understand whether this amount of memory is warranted and, if possible, reduce it. Note: the amount of JIT scratch memory is reported by the verbose logs when using -Xjit:verbose={compilePerformance}.