Excessive JIT Scratch memory for cheap cold/noopt compilations

mpirvu commented 5 years ago

Even the smallest cold compilation consumes more than 700 KB of scratch memory, and the smallest no-opt compilation consumes more than 500 KB. We would like to understand whether this amount of memory is warranted and reduce it if possible. Note: the amount of JIT scratch memory is reported in the verbose logs when using -Xjit:verbose={compilePerformance}.

mpirvu commented 5 years ago

FYI @ashu-mehra

ashu-mehra commented 5 years ago

Analysis so far:

Running the websphere-liberty docker image with -Xjit:verbose={compilePerformance}, I see a lot of methods with a bytecode size of 2 whose memory consumption is reported as mem=[region=704 system=16384]KB. I picked one such method, java/util/Collections$EmptyList.size()I, for further analysis. Its bytecodes are:

  public int size();
    Code:
       0: iconst_0
       1: ireturn

In the JIT verbose logs, the value reported as region corresponds to the memory allocated by SystemSegmentProvider. The 704 KB reported in the JIT verbose logs can be broken down as follows:

before comp/opt - usage is 64 KB
~~comp/opt~~ comp/ilgen - 448 KB
~~comp/ilgen~~ comp/opt - 192 KB
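As a quick sanity check on the breakdown above, the three figures can be summed and converted to 64 KB segment counts. This is only an arithmetic sketch: the constant and function names are made up for illustration, and expressing each phase as a whole number of 64 KB segments is an assumption based on the segment size mentioned in this thread.

```cpp
#include <cstddef>

// Figures copied from the breakdown above (all in KB).
constexpr std::size_t kBeforeKB = 64;   // usage before the compilation phases
constexpr std::size_t kIlgenKB  = 448;  // comp/ilgen
constexpr std::size_t kOptKB    = 192;  // comp/opt
constexpr std::size_t kTotalKB  = kBeforeKB + kIlgenKB + kOptKB;

// The three parts add up to the region value in the verbose log.
static_assert(kTotalKB == 704, "breakdown should match region=704");

// Assumption: memory is handed out in 64 KB segments.
constexpr std::size_t kSegmentKB = 64;

// Number of 64 KB segments needed to cover `kb` kilobytes (rounded up).
constexpr std::size_t segmentsFor(std::size_t kb) {
    return (kb + kSegmentKB - 1) / kSegmentKB;
}
```

Under that assumption, `segmentsFor(kTotalKB)` gives the number of 64 KB segments behind the reported total.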

Trying to get a more granular breakdown of memory usage in the ~~comp/opt~~ comp/ilgen phase using a debug build and forcing compilation of only EmptyList.size() with the following option: -Xjit:limit={java/util/Collections*size*}(optLevel=cold),verbose={compilePerformance}. So far I have figured out that it allocates 6 TR::MemorySegments of 64 KB each.

mpirvu commented 5 years ago

What is comp/opt? I am guessing it's the optimizer, and then my next question is what is the proper ordering of the 3 snapshots above: before comp/opt, comp/opt and comp/ilgen?

ashu-mehra commented 5 years ago

> it's the optimizer

Yes

> my next question is what is the proper ordering of the 3 snapshots above: before comp/opt, comp/opt and comp/ilgen

The order of these phases is as I listed above.

ashu-mehra commented 5 years ago

aah! there is a typo in that comment. comp/opt should be comp/ilgen. Let me correct it.

ashu-mehra commented 5 years ago

Summarising the issue here:

During compilation there are places where we end up using the cs2 heap_allocator, which in turn uses TR::Region as the base allocator. The heap_allocator works by creating multiple lists of segments, where each list is used to satisfy requests for memory within a given size range. E.g., the first list can satisfy memory requests of up to 8 bytes, the next list requests from 8 to 16 bytes, and so on. More importantly, each segment in a list is 64 KB in size. So even if there is only one memory request of 8 bytes, we end up allocating a 64 KB segment. There are multiple places during compilation where we end up using heap_allocator, which results in allocating 4-5 segments of 64 KB, each of them heavily under-used because of the low allocation count.
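To make the effect concrete, here is a small model of the behaviour described above. It is a hypothetical sketch, not the cs2 heap_allocator source: the class name `SizeClassModel`, the doubling of size classes beyond 16 bytes, and the bookkeeping are all assumptions made for illustration. Only the 64 KB segment size and the first two size ranges come from the description.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical model of a size-classed allocator: one list of fixed-size
// segments per size class, with a fresh segment taken whenever a class
// has no room left. It only counts bytes; it does not hand out memory.
class SizeClassModel {
public:
    static constexpr std::size_t kSegmentBytes  = 64 * 1024; // from the description
    static constexpr std::size_t kMinClassBytes = 8;          // first class: up to 8 bytes

    // Record a request of `bytes` and return the index of the class used.
    std::size_t allocate(std::size_t bytes) {
        if (bytes == 0 || bytes > kSegmentBytes)
            throw std::invalid_argument("request size not modelled");
        std::size_t cls = 0;
        std::size_t limit = kMinClassBytes;
        while (limit < bytes) { limit *= 2; ++cls; } // assumed classes: 8, 16, 32, ...
        if (cls >= classes_.size()) classes_.resize(cls + 1);
        ClassState &state = classes_[cls];
        if (state.segments == 0 || state.freeBytes < limit) {
            ++state.segments;                // grab a whole new 64 KB segment
            state.freeBytes = kSegmentBytes;
        }
        state.freeBytes -= limit;            // each slot occupies the class size
        requested_ += bytes;
        return cls;
    }

    // Total bytes held in segments, regardless of how much is actually used.
    std::size_t reservedBytes() const {
        std::size_t segments = 0;
        for (const ClassState &s : classes_) segments += s.segments;
        return segments * kSegmentBytes;
    }

    // Total bytes the callers actually asked for.
    std::size_t requestedBytes() const { return requested_; }

private:
    struct ClassState { std::size_t segments = 0; std::size_t freeBytes = 0; };
    std::vector<ClassState> classes_;
    std::size_t requested_ = 0;
};
```

In this model, five requests of 8, 12, 100, 1000 and 5000 bytes land in five different size classes, so `reservedBytes()` reports five 64 KB segments for a few kilobytes of requested memory, which mirrors the under-use described above.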

As part of the fix, I am trying to bypass the cs2 allocators and use TR::Region directly to satisfy all allocation requirements during compilation.
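For contrast, the sketch below shows why allocating straight from a region tends to need fewer segments for the same requests. `BumpRegion` is a hypothetical stand-in written for this comment, not the TR::Region API; it assumes a simple bump-pointer scheme in which requests of any size share the current 64 KB segment.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical bump-pointer region: every request is carved out of the
// current segment, and a new segment is taken only when it is full.
class BumpRegion {
public:
    static constexpr std::size_t kSegmentBytes = 64 * 1024;

    // Returns storage for `bytes`, or nullptr for requests larger than a
    // segment (oversized requests are not modelled in this sketch).
    void *allocate(std::size_t bytes) {
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t rounded = (bytes + align - 1) / align * align;
        if (rounded > kSegmentBytes) return nullptr;
        if (segments_.empty() || used_ + rounded > kSegmentBytes) {
            segments_.emplace_back(new unsigned char[kSegmentBytes]);
            used_ = 0;                       // start filling the new segment
        }
        void *result = segments_.back().get() + used_;
        used_ += rounded;
        return result;
    }

    // Number of 64 KB segments taken so far.
    std::size_t segmentCount() const { return segments_.size(); }

private:
    std::vector<std::unique_ptr<unsigned char[]>> segments_;
    std::size_t used_ = 0;                   // bytes used in the newest segment
};
```

With this scheme, requests of 8, 12, 100, 1000 and 5000 bytes all fit in a single segment, and a second segment is taken only once the first one cannot hold the next request.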

eclipse-openj9 / openj9

Excessive JIT Scratch memory for cheap cold/noopt compilations #7543