eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 721 forks source link

CRIU: Remove CRIU debug interpreter #19835

Closed tajila closed 3 weeks ago

tajila commented 3 months ago

The CRIU interpreter was added here, https://github.com/eclipse-openj9/openj9/pull/17245, to do 1) Enable support for hooks (needed for xtrace and xdump) 2) Allow one to avoid using the debug interpreter at starup

Now that we have the ability to transition to the debug interpreter on restore, we no longer need the criu interpreter. Instead we can start with the standard interpreter, and transition to the debug interpreter if needed.

Steps

  1. Add a new field in J9CRIUCheckpointState called debugInterpreterRequested
  2. The hooks for criuRestoreInitializeTrace and criuRestoreInitializeDump set the debugInterpreterRequested to true
  3. In checkTransitionToDebugInterpreter check if debugInterpreterRequested is set then attempt to transition to the debug interpreter.
  4. Add a new restore hook (similar to the ones above) and take the code in checkTransitionToDebugInterpreter that looks for the VMOPT_XXDEBUGINTERPRETER and move it to that function. It should set the debugInterpreterRequested if debug interpreter is there.
  5. Move checkTransitionToDebugInterpreter to after runInternalJVMRestoreHooks
singh264 commented 2 months ago

It seems like the next step is to complete the steps for removing the CRIU interpreter with the implementation that was started at https://github.com/singh264/openj9/commit/b35f2b735cba69123da2b23606b027fab8573a9d.

ThanHenderson commented 1 month ago

@tajila Just to clarify, in addition to the steps mentioned in the issue description, I am assuming the higher-level goal is to also remove all code associated with criuBytecodeLoop*.

ThanHenderson commented 1 month ago

We have a -XX:+DebugOnRestore option and an -XX:+DebugInterpreter option.

Currently, transitionToDebugInterpreter is guarded by isDebugOnRestoreEnabled(currentThread). With the new field (I've just added a flag to vm->checkpointState.flags instead), I've added a isTransitionToDebugInterpreterRequested(currentThread) that returns true if -XX:+DebugInterpreter had been set (or if -XTrace or -Xdump was specified).

My question is: what is the relationship now between -XX:+DebugOnRestore and -XX:+DebugInterpreter?

i.e. for the new guard condition, do we have an || relationship: isDebugOnRestoreEnabled(currentThread) || isTransitionToDebugInterpreterRequested(currentThread)

An && relationship: isDebugOnRestoreEnabled(currentThread) && isTransitionToDebugInterpreterRequested(currentThread)

Or no relationship: isTransitionToDebugInterpreterRequested(currentThread)

babsingh commented 1 month ago

@ThanHenderson For 0.48, this issue will need to be resolved by the end of this week. What's the current state of this issue? Based on this issue's impact, do we need it to be fixed in 0.48 or can it be pushed to 0.49?

ThanHenderson commented 1 month ago

Based on my current understanding of the issue, I have a patch ready. Just waiting on some input on my comments above (https://github.com/eclipse-openj9/openj9/issues/19835#issuecomment-2353341845 and https://github.com/eclipse-openj9/openj9/issues/19835#issuecomment-2353480945) to clarify some things.

tajila commented 1 month ago

My question is: what is the relationship now between -XX:+DebugOnRestore and -XX:+DebugInterpreter?

tajila commented 1 month ago

Or no relationship: isTransitionToDebugInterpreterRequested(currentThread)

EDIT: DebugOnRestore doesnt cause us to transition to debug interpreter, but it is a require capability that enables us to transition under the presence of jit frames.

tajila commented 1 month ago

Depending on the timing of when https://github.com/eclipse-openj9/openj9/pull/19754/ is merged either you or @JasonFengJ9 will need to request that checkpointState->debugInterpreterRequested is set if debug capabilites required the debug interpreter are requested on restore. Jason's PR will likely be merged first.

ThanHenderson commented 1 month ago

DebugOnRestore doesnt cause us to transition to debug interpreter, but it is a require capability that enables us to transition under the presence of jit frames.

Isn't that counter to the condition currently guarding the transition?: https://github.com/eclipse-openj9/openj9/blob/0985ff3f82bfdf1944c81f6242466fcbae2308b5/runtime/vm/CRIUHelpers.cpp#L1500-L1501

At this point, we have, effectively:

if -XX+DebugInterpreter && -XX+DebugOnRestore && (NULL == vm->jitConfig):
    transition()

There is a comment there that states that there is some missing JIT support which would change this to something like:

if -XX+DebugInterpreter && -XX+DebugOnRestore:
    transition()

But from your explanation, it sounds like what we'd want instead is:

if -XX+DebugInterpreter:
    if NULL != vm->jitConfig:
        if -XX+DebugOnRestore:
            transition()
    else:
        transition()
tajila commented 1 month ago

Yes, now I remember. The initial goal (in https://github.com/eclipse-openj9/openj9/issues/17642) was to have the ability to force a transition to the debug interpreter immediately. If the JIT is disabled then we can perform the transition instantly with transitionToDebugInterpreter, regardless of -XX+DebugOnRestore, this was true then and is true now. So the current code is incorrect.

DebugOnRestore enables the ability for the JIT to support debugging capabilities, which allows for quicker transitions to the interpreter when debugging agents are enabled. But in its current state, it doesn't let us transition unconditionally.

To achieve the initial goals (force immediate transition to the interpreter unconditionally) we need the ability to set an async event to transition to the interpreter (which is what transitionToDebugInterpreter does), but we also need the ability to force a decompilation of all java stacks unconditionally, which we dont have currently (debugOnRestore is limited).

Now back to this issue. The goal of this issue is to remove the criu interpreter so we only need to address the current cases where we rely on the criu interpreter and instead perform a transition to the debug interpreter. Previously the only case were we would need the transition was when -XX+DebugInterpreter was requested under -Xint. Now those cases also include, -Xtrace, -Xdump and with Jason's PR java debugging (-Xrunjdwp or -agentlib:jdwp).

So it makes sense to decouple the places that request a transition (-Xtrace, etc.) and the place we check and transition (checkTransitionToDebugInterpreter).

In the case of -Xtrace and -Xdump the JIT will just invalidate the code cache and generate new methods that support these capabilities so we dont need to worry about what JIT frames will do, we just need to make sure the interpreter is in the correct mode.

In the case of java debugging (-Xrunjdwp...) its not enough to simply transition to the debug interpreter, we need the JVM to be in debugonrestore mode. However, this check can be done by the hook that will detect JDWP instead of checking in checkTransitionToDebugInterpreter.

In the case -XX+DebugInterpreter, we can transition instantly under -Xint, however, if there are JIT'ed frames the transition is delayed or may never occur. Given that this is for diagnostic purpose Im okay with attempting the transition of the interpreter side anyways (we can improve on this in the future once we have the ability to decompile all java stacks). Note: this will be a change in behaviour.

Given the above, what this means is that checkTransitionToDebugInterpreter should check if transition to the debug interpreter is requested. If it is, it means the due diligence was done by however requested it. So something like:

if (vm->checkpointState.debugInterpreterRequested) { 
    //Transition can be requested by -XX+DebugInterpreter, -Xrunjdwp, -Xtrace, -Xdump, etc.
    transition();
}
ThanHenderson commented 1 month ago

So the current code is incorrect

That is what I was thinking.

Thanks for the detailed explanation. It clarifies a lot.

In the case of java debugging (-Xrunjdwp...) its not enough to simply transition to the debug interpreter, we need the JVM to be in debugonrestore mode. However, this check can be done by the hook that will detect JDWP instead of checking in checkTransitionToDebugInterpreter.

This makes a lot of sense.

Im okay with attempting the transition of the interpreter side anyways

Sounds good.

ThanHenderson commented 3 weeks ago

Closing via https://github.com/eclipse-openj9/openj9/pull/20177

github-actions[bot] commented 3 weeks ago

Issue Number: 19835 Status: Closed Actual Components: comp:vm, criu Actual Assignees: No one :( PR Assignees: ThanHenderson

pshipton commented 3 weeks ago

This is in the 0.48 milestone, are we planning to double deliver the update there?

ThanHenderson commented 3 weeks ago

@pshipton I was going to open a PR for 0.48 shortly for this. It isn't pressing that it gets in, but if we are still delaying, we can deliver there too.

ThanHenderson commented 3 weeks ago

https://github.com/eclipse-openj9/openj9/pull/20191 Needs to merge into 0.48 first since my commit depends on the API changes there.

pshipton commented 3 weeks ago

We'll just keep this open until the PR for 0.48 is either delivered, or we decide not to.

ThanHenderson commented 3 weeks ago

PR for 0.48 port: https://github.com/eclipse-openj9/openj9/pull/20275

github-actions[bot] commented 3 weeks ago

Issue Number: 19835 Status: Closed Actual Components: comp:vm, criu Actual Assignees: No one :( PR Assignees: ThanHenderson