Open tajila opened 2 years ago
The stack can be relocated using a modified version of the stack-grow routine. The real issue with copying a stack out of the <4GB area is stack-allocated (SA) objects, specifically SA objects that point to other SA objects.
Consider the following:
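class Pair {
    Object a;
    Object b;
    // Constructor needed by the two-argument `new Pair(...)` calls below.
    Pair(Object a, Object b) {
        this.a = a;
        this.b = b;
    }
}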
Pair sa1 = new Pair(null, null);
Pair sa2 = new Pair("heap object", sa1);
sa1:
uint32_t Class
uint32_t null
uint32_t null
sa2:
uint32_t Class
uint32_t String on heap
uint32_t sa1
When the stack is relocated to a >4GB location, the String reference remains unchanged, but the sa1 reference in sa2 needs to be relocated. Because the slot is a uint32_t, it cannot hold an address above 4GB, so the relocation cannot be performed.
The obvious solution is to convert the reference to an offset (relative to the Java stack) and convert it back to a pointer once the stack is copied back to a <4GB location (which is almost certainly not going to be the same location it was copied from).
While the stack is unmounted (i.e. in the >4GB location), the GC needs to update heap references such as the String above, but ignore the slots which have been converted to offsets. There is no metadata available to indicate which slots may point to SA objects, so the slots themselves will need to be tagged. All objects are at least uintptr_t aligned, so the low bits of an offset are always zero and can be used as tag bits.
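A minimal sketch of the tagged-offset idea, not the OpenJ9 implementation. The names `SA_OFFSET_TAG`, `sa_slot_to_offset`, `sa_slot_is_offset` and `sa_offset_to_slot` are hypothetical, and the sketch assumes the stack base is aligned and that ordinary (untagged) references never have the low bit set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag: low bit set means "this slot holds a stack offset". */
#define SA_OFFSET_TAG ((uint32_t)1)

/* Convert a 32-bit pointer into the Java stack into a tagged offset.
 * Assumes stack_base is aligned, so the offset's low bit is free. */
uint32_t sa_slot_to_offset(uint32_t slot, uint32_t stack_base)
{
    assert(slot >= stack_base);
    uint32_t offset = slot - stack_base;
    assert((offset & SA_OFFSET_TAG) == 0);
    return offset | SA_OFFSET_TAG;
}

/* Nonzero if the slot was converted to an offset and must be ignored by the GC. */
int sa_slot_is_offset(uint32_t slot)
{
    return (slot & SA_OFFSET_TAG) != 0;
}

/* Convert a tagged offset back into a pointer relative to the new <4GB stack. */
uint32_t sa_offset_to_slot(uint32_t tagged, uint32_t new_stack_base)
{
    assert(sa_slot_is_offset(tagged));
    return new_stack_base + (tagged & ~SA_OFFSET_TAG);
}
```

For example, a reference at 0x1040 in a stack based at 0x1000 would be stored as 0x41 while unmounted, and would become 0x8040 if the stack is later remounted at 0x8000.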
I believe stackAllocatedObjectSlotWalkFunction needs to be modified to detect the tagged slots and not pass them to the GC.
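The filtering could look roughly like the sketch below. This is not the real stackAllocatedObjectSlotWalkFunction, whose signature is not shown in this issue. The names `walk_sa_object_slots`, `slot_visitor`, `count_slot` and `SA_OFFSET_TAG` are all hypothetical, and the visitor stands in for whatever callback hands a slot to the GC.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag bit marking a slot that holds a stack offset. */
#define SA_OFFSET_TAG ((uint32_t)1)

/* Stand-in for the callback that passes a reference slot to the GC. */
typedef void (*slot_visitor)(uint32_t *slot, void *user_data);

/* Walk the reference slots of one SA object, skipping tagged offsets.
 * Returns the number of slots that were passed to the visitor. */
size_t walk_sa_object_slots(uint32_t *slots, size_t count,
                            slot_visitor visit, void *user_data)
{
    size_t visited = 0;
    for (size_t i = 0; i < count; i++) {
        if (slots[i] & SA_OFFSET_TAG) {
            continue; /* tagged offset: not a pointer, hide it from the GC */
        }
        visit(&slots[i], user_data);
        visited++;
    }
    return visited;
}

/* Example visitor that only counts the slots it is given. */
void count_slot(uint32_t *slot, void *user_data)
{
    (void)slot;
    (*(size_t *)user_data)++;
}
```

With the sa2 layout above, the String slot would reach the visitor while the tagged sa1 slot would be skipped.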
@0xdaryl ^^
Added test excluded label due to https://github.com/eclipse-openj9/openj9/issues/16729#issuecomment-1476325957 and https://github.com/adoptium/aqa-tests/pull/4452
@fengxue-IS Are this issue and #15781 duplicates? If so, we should try to merge them. This issue has tests excluded against it; #15781 will need to be closed if they are merged.
Depends on https://github.com/eclipse-openj9/openj9/issues/15177
In the basic implementation, all Continuation stacks are allocated in <4GB memory. This imposes a restriction on the number of active Continuations that we can have. The following solutions can address this.
Potential solution: When creating a continuation, nothing changes, simply allocate the <4gb stack.
Upon yield, allocate a new >4GB memory region equal in size to the continuation stack (this can be a new allocation or come from a freelist). Copy the continuation stack to the newly allocated region and free the original <4GB stack.
"copy" actually involves doing a stackwalk, similar to stack growing, to update all slots and account for stack allocated objects.
Upon entry to a continuation, get a new <4GB stack (this will be from the freelist) and copy the unmounted (>4GB) continuation stack contents to the new stack.
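The yield and entry copies described above can be sketched as follows. This is a toy model under loud assumptions: stacks are plain arrays of 32-bit slots, addresses are simulated as uint32_t values, every slot is treated as a reference slot (a real stack walk would identify reference slots frame by frame), and heap references never have the low bit set. The names `unmount_stack`, `mount_stack` and `SA_OFFSET_TAG` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag bit marking a slot that holds a stack offset. */
#define SA_OFFSET_TAG ((uint32_t)1)

/* Yield: copy the <4GB stack into the unmounted buffer, turning pointers
 * into the source stack into tagged offsets. Other slots are copied as-is. */
void unmount_stack(const uint32_t *src, uint32_t src_base,
                   uint32_t *dst, size_t slots)
{
    uint32_t src_end = src_base + (uint32_t)(slots * sizeof(uint32_t));
    for (size_t i = 0; i < slots; i++) {
        uint32_t value = src[i];
        if (value >= src_base && value < src_end) {
            value = (value - src_base) | SA_OFFSET_TAG;
        }
        dst[i] = value;
    }
}

/* Entry: copy the unmounted buffer into a fresh <4GB stack, turning
 * tagged offsets back into pointers relative to the new base. */
void mount_stack(const uint32_t *src, uint32_t *dst,
                 uint32_t dst_base, size_t slots)
{
    for (size_t i = 0; i < slots; i++) {
        uint32_t value = src[i];
        if (value & SA_OFFSET_TAG) {
            value = dst_base + (value & ~SA_OFFSET_TAG);
        }
        dst[i] = value;
    }
}
```

Because the offsets are relative, the stack used on entry does not need to sit at the address the continuation yielded from.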
With this approach the GC still needs to know about Continuation objects, because it needs to walk their stacks. So essentially, from a GC perspective nothing changes from the basic implementation.
The benefit is that the number of <4GB stacks needed for VirtualThreads equals the number of carrier threads * 2 (one for a mounted VirtualThread, one for the carrier), so we can allocate a large number of VirtualThreads without issues.
The memory allocations on entry and yield can be avoided by keeping a freelist of stacks. In fact we can pre-allocate all the <4gb stacks up front since that is a fixed number.
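A minimal sketch of such a pre-allocated freelist, assuming two stacks per carrier thread as described above. `malloc` stands in for whatever allocator actually returns <4GB memory, and the `stack_pool` type and its functions are hypothetical names, not OpenJ9 APIs. A real pool shared between carrier threads would also need synchronization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Fixed-size pool of pre-allocated stacks (hypothetical type). */
typedef struct {
    void **free_stacks; /* stacks currently available */
    size_t free_count;  /* number of entries in free_stacks */
    size_t capacity;    /* carrier_threads * 2 */
} stack_pool;

/* Free every stack still in the pool and the pool's bookkeeping. */
void stack_pool_destroy(stack_pool *pool)
{
    while (pool->free_count > 0) {
        free(pool->free_stacks[--pool->free_count]);
    }
    free(pool->free_stacks);
    pool->free_stacks = NULL;
    pool->capacity = 0;
}

/* Pre-allocate all stacks up front. Returns 0 on success, -1 on failure. */
int stack_pool_init(stack_pool *pool, size_t carrier_threads, size_t stack_bytes)
{
    pool->capacity = carrier_threads * 2;
    pool->free_count = 0;
    pool->free_stacks = malloc(pool->capacity * sizeof(void *));
    if (pool->free_stacks == NULL) {
        return -1;
    }
    for (size_t i = 0; i < pool->capacity; i++) {
        void *stack = malloc(stack_bytes); /* stand-in for a <4GB allocation */
        if (stack == NULL) {
            stack_pool_destroy(pool);
            return -1;
        }
        pool->free_stacks[pool->free_count++] = stack;
    }
    return 0;
}

/* Take a stack on continuation entry; NULL if none are free. */
void *stack_pool_acquire(stack_pool *pool)
{
    return pool->free_count > 0 ? pool->free_stacks[--pool->free_count] : NULL;
}

/* Return a stack on yield instead of freeing it. */
void stack_pool_release(stack_pool *pool, void *stack)
{
    assert(pool->free_count < pool->capacity);
    pool->free_stacks[pool->free_count++] = stack;
}
```

With this shape, entry and yield only push and pop pointers, so no allocation happens on either path once the pool is initialized.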
We could potentially extend this approach to do a lazy copy when mounting a VirtualThread, so that not all frames are copied at once, just the topmost.