Loom: Move unmounted continuation stacks out of the low memory area

eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.


Loom: Move unmounted continuation stacks out of the low memory area #15781

Open gacholio opened 2 years ago

gacholio commented 2 years ago

When running with compressed references, an unmounted continuation stack should be copied into new memory that is not allocated in the <4GB area. When the continuation is remounted, the stack is copied back into the low memory area.

The complication is stack-allocated objects which point to other stack-allocated objects. Because the reference slots in the objects are only 32 bits wide, they can't simply be relocated.

gacholio commented 2 years ago

The basic relocation will be performed by generalizing the code which handles stack growing to separate the allocation and relocation of the stack.
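To illustrate the split between allocation and relocation, here is a minimal sketch (not the OpenJ9 code). The caller owns the destination buffer, and a hypothetical helper `relocate_stack` copies the slots and rebases any slot that points into the old stack. The `is_stack_pointer` array is an assumed stand-in for what the stack walker knows about each slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the destination is allocated by the caller, so the
 * same routine could serve both stack growth and continuation unmounting.
 * is_stack_pointer[i] != 0 means slot i is known to hold a stack address
 * (an assumed stand-in for the stack walker's knowledge). */
static void relocate_stack(uintptr_t *new_stack, const uintptr_t *old_stack,
                           size_t slot_count,
                           const unsigned char *is_stack_pointer)
{
    assert(new_stack != NULL && old_stack != NULL);

    uintptr_t old_base = (uintptr_t)old_stack;
    uintptr_t old_end = old_base + slot_count * sizeof(uintptr_t);
    uintptr_t new_base = (uintptr_t)new_stack;

    /* Step 1: raw copy of the stack contents. */
    memcpy(new_stack, old_stack, slot_count * sizeof(uintptr_t));

    /* Step 2: rebase every known pointer that targets the old stack. */
    for (size_t i = 0; i < slot_count; i++) {
        uintptr_t value = new_stack[i];
        if (is_stack_pointer[i] && value >= old_base && value < old_end) {
            new_stack[i] = new_base + (value - old_base);
        }
    }
}
```

With full-width slots this is all that is needed, which matches the non-compressed case; the difficulty only appears when the slot is too narrow to hold the rebased address.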

Stack-allocated objects are walked by the stack walker using JIT metadata which details where each SA object begins in the stack frame. The walker then uses the GC object iterator to walk the individual slots of the SA object. None of the stack slots of SA objects is marked as an object slot for the walker.

For non-compressed refs, the existing relocation code will work unmodified.

For compressed refs, object slots in SA objects are only 32 bits wide, so they cannot simply be relocated. With compressed refs shift, every bit in the slot may be used, so a slot tagging solution is not appropriate.
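A small sketch of why neither relocation nor tagging works for these slots, assuming the usual shift-based encoding (address >> shift stored in 32 bits). The function names and the shift value used in the example are illustrative, not OpenJ9 identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* An address is representable if it is aligned to (1 << shift) and the
 * shifted value fits in 32 bits. */
static int can_compress(uint64_t address, unsigned shift)
{
    uint64_t alignment_mask = ((uint64_t)1 << shift) - 1;
    return (address & alignment_mask) == 0
        && (address >> shift) <= UINT32_MAX;
}

/* Encode an address into a 32-bit slot value. */
static uint32_t compress(uint64_t address, unsigned shift)
{
    assert(can_compress(address, shift));
    return (uint32_t)(address >> shift);
}

/* Decode a 32-bit slot value back into an address. */
static uint64_t decompress(uint32_t token, unsigned shift)
{
    return (uint64_t)token << shift;
}
```

With a shift of 3, the address 0x7FFFFFFF8 encodes to 0xFFFFFFFF, so all 32 bits are in use and none is free for a tag. The next aligned address, 0x800000000, is no longer representable, which is the situation a slot would be in if it had to point into a stack moved above the compressible range.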

gacholio commented 2 years ago

During the stack relocation, I suggest we convert the compressed object pointer slots to uncompressed stack offsets (which will always fit in 32 bits). walkContinuationStackFrames will need to pass a new flag into the stack walker instructing it that the SA slots are offsets.
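A rough sketch of the conversion in both directions, under the assumption that a slot pointing inside the stack can be recognized by a range check against the mounted stack. The helper names `slot_to_offset` and `offset_to_slot` are hypothetical, and the real code would get its slot locations from the stack walker rather than from the caller.

```c
#include <assert.h>
#include <stdint.h>

/* On unmount: if the compressed slot points into the mounted stack,
 * replace it with its byte offset from the stack base and return 1.
 * Otherwise (e.g. a heap object pointer) leave it alone and return 0. */
static int slot_to_offset(uint32_t *slot, uint64_t stack_base,
                          uint64_t stack_size, unsigned shift)
{
    assert(stack_size <= UINT32_MAX); /* offsets must fit in the slot */

    uint64_t address = (uint64_t)*slot << shift;
    if (address < stack_base || address - stack_base >= stack_size) {
        return 0;
    }
    *slot = (uint32_t)(address - stack_base);
    return 1;
}

/* On remount: turn an offset back into a compressed pointer relative to
 * the new low-memory stack base. */
static uint32_t offset_to_slot(uint32_t offset, uint64_t new_stack_base,
                               unsigned shift)
{
    uint64_t address = new_stack_base + offset;
    assert((address & (((uint64_t)1 << shift) - 1)) == 0);
    assert((address >> shift) <= UINT32_MAX);
    return (uint32_t)(address >> shift);
}
```

Note that after conversion the slot value alone no longer says whether it is an offset or a compressed heap pointer, so the return value of `slot_to_offset` has to be recorded somewhere.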

gacholio commented 2 years ago

There is an issue with the stack offsets - how to distinguish between slots which contain offsets (used to point to an SA object) and slots which contain heap object pointers. With no tag bits available, we may need to keep a separate bitmap to indicate which slots contain offsets.

gacholio commented 2 years ago

The bitmap will be placed after the end of the copied stack (in the same allocation). As an optimization, the copied stack and bitmap should exclude the unused portion of the stack.
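One possible layout for this, as a sketch only: a single zeroed allocation holding the used part of the stack followed by a bitmap with one bit per 32-bit slot. The struct, the function names, and the one-bit-per-4-bytes granularity are assumptions for illustration, and `calloc` stands in for whatever allocator the VM would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SLOT_BYTES 4u /* assumed granularity: one bit per 32-bit slot */

typedef struct UnmountedStack {
    uint8_t *memory;     /* [used stack bytes][bitmap bytes] */
    size_t used_bytes;   /* only the in-use portion of the stack is copied */
    size_t bitmap_bytes;
} UnmountedStack;

/* Number of bitmap bytes needed to cover used_bytes of stack. */
static size_t bitmap_bytes_for(size_t used_bytes)
{
    size_t slots = used_bytes / SLOT_BYTES;
    return (slots + 7) / 8;
}

/* Allocate stack copy and trailing bitmap together; returns 0 on failure. */
static int unmounted_stack_init(UnmountedStack *stack, size_t used_bytes)
{
    assert(used_bytes % SLOT_BYTES == 0);
    stack->used_bytes = used_bytes;
    stack->bitmap_bytes = bitmap_bytes_for(used_bytes);
    stack->memory = calloc(1, used_bytes + stack->bitmap_bytes);
    return stack->memory != NULL;
}

/* Record that the given slot now holds a stack offset. */
static void mark_offset_slot(UnmountedStack *stack, size_t slot_index)
{
    assert(slot_index < stack->used_bytes / SLOT_BYTES);
    uint8_t *bitmap = stack->memory + stack->used_bytes;
    bitmap[slot_index / 8] |= (uint8_t)(1u << (slot_index % 8));
}

/* Query whether the given slot holds a stack offset. */
static int is_offset_slot(const UnmountedStack *stack, size_t slot_index)
{
    assert(slot_index < stack->used_bytes / SLOT_BYTES);
    const uint8_t *bitmap = stack->memory + stack->used_bytes;
    return (bitmap[slot_index / 8] >> (slot_index % 8)) & 1;
}
```

Because the bitmap lives in the same allocation, freeing the unmounted stack frees the bitmap too, and the overhead is one bit per 32 bits of copied stack.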

gacholio commented 2 years ago

Continuing on with the code, I notice the stack grower relocates arraylet leaves found in stack-allocated objects. I'm not sure yet that the offset strategy will work with these.

@0xdaryl Are there ever in fact stack-allocated arrays in arraylet GC policies?

0xdaryl commented 2 years ago

Are there ever in fact stack-allocated arrays in arraylet GC policies?

There might be. It isn't clear to me from a simple inspection of the code. x86, for example, sets a codegen flag [1] to permit stack allocation of arraylets and this is checked in the EA candidate-finding loop so there may be cases where contiguous arrays are stack allocated. Large, discontiguous arraylets are never stack allocated, however.

@hzongaro : can you fill in any more details here?

[1] https://github.com/eclipse-openj9/openj9/blob/b4bf9a8f4c95aec9c79e65ea896e409afa8a8c84/runtime/compiler/x/codegen/J9CodeGenerator.cpp#L125

gacholio commented 2 years ago

So-called contiguous arrays do not contain a spine (where the arraylet pointers are). You say above that large discontiguous arrays aren't stack allocated - what is considered large? The easiest thing to do here would be to simply disallow stack allocation of arrays that require a spine, regardless of the total size.

My concern is that j9mm_iterate_object_slots (used to walk the slots of stack-allocated objects) will not be able to walk the spine of an array that has been relocated to the high memory area.

hzongaro commented 2 years ago

Yes, Escape Analysis will stack allocate contiguous arrays under the GC policies that allow for arraylets. It relies on J9::Compilation::canAllocateInline to make that determination. That method imposes a limit on the number of elements of 0xFFFFF, but it also checks whether the total size of the array would result in its being discontiguous.

gacholio commented 2 years ago

Thanks - this means I can remove the walking of the spine completely from the stack grow code since it can never occur and not worry about it for this design.

gacholio commented 2 years ago

@fengxue-IS Here's the first cut of the code: https://github.com/gacholio/openj9/tree/loom

A few things to note:

In JDK19 you've added a field to the stack header. You could change this to a bit field and store a flag in there indicating that the stack is for an unmounted continuation. This would simplify some of the code paths.
Stacks allocated for unmounted continuations must currently be freed using the port library directly. If the flag above is added, this would no longer be true.
I'm zeroing the stack for unmounted continuations when it's allocated - really only the trailing bitmap needs to be zeroed.
Added a new flag for the stack walker which would also not be needed if the stack is tagged with the new flag.