eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.26k stars 717 forks source link

Compiler Support for Debug On Restore #18866

Open dsouzai opened 6 months ago

dsouzai commented 6 months ago

This issue specifically tracks the Compiler support for Debug on Restore.

Parent issue: https://github.com/eclipse-openj9/openj9/issues/17642

Approach

See https://github.com/eclipse-openj9/openj9/issues/17642 for the discussion.

  1. Generate FSD Code pre-checkpoint
  2. Post-restore, compile non-FSD code and recompile all FSD bodies compiled pre-checkpoint

Implementation

In order to balance the constraints of startup/first response, footprint, and throughput, there are several additional pieces that need to be put in place.

  1. AOT validation must be delayed until post-restore. Otherwise, the validation framework will see that the compiler is running in FSD mode but the (already populated) SCC does not hold any FSD compatible code, leading to no AOT code getting loaded even post-restore.
  2. The IProfiler needs to be enabled during startup even if an SCC exists; this is because the SCC is not going to be used and therefore the heuristics of depending on the persistent profiling info available in the SCC is not applicable.
  3. Higher counts need to be used so that the IProfiler can collect sufficient data to preserve throughput. However, some of methods that would have been compiled at a low count are important for startup/first response. Thus, these important methods need to be proactively compiled in the checkpoint hook.
  4. Failed compilations pre-checkpoint should be retried post-restore; this is because the failure may be entirely due to the constraint of having to generate FSD code.
  5. Some IProfiler data pre-checkpoint may need to be dropped to reduce footprint; there may be more IProfiler data collected pre-checkpoint than is useful.

VM Coordination

Given the above compiler implementation details, there are three main pieces of VM coordination that has be implemented in order to ensure functional correctness

  1. If the proactive compilations in the checkpoint hook (point 3 above) are done without FSD (which is probably a good idea to reduce the amount of recompilation that will need to occur post restore), then if debug on restore is enabled post-retore, the VM will have to force a transition to the interpreter in the restore hook even if the debug events aren't enabled yet. Otherwise, application threads will start running non-FSD code which will not support a decompile when the events do get enabled. EDIT: This may not be needed (see https://github.com/eclipse-openj9/openj9/issues/18866#issuecomment-1974057119).
  2. If debug is not required, but normal redefinition is allowed (Fast HCR), then the VM needs to ensure that FSD bodies are decompiled when a redefinition event occurs, and not just calling jitConfig->jitClassesRedefined. This is because the FSD bodies do not have OSR guards (i.e. Voluntary OSR); they depend on the VM triggering the transition (i.e. Involuntary OSR). GAC is aware of this consideration in https://github.com/eclipse-openj9/openj9/issues/17642.
  3. Events J9HookDisable needs to be reset post-restore. The way the JIT checks if an event is enabled is by invoking J9HookDisable; however, this also has the side-effect of disabling the event if it has not already been enabled. This is to ensure that an agent later on cannot enable events that JIT generated code cannot support. Post-restore, the JIT does a similar set of calls to check if FSD is enabled, and so these events should be reset. https://github.com/eclipse-openj9/openj9/issues/16959 tracks this requirement.

PRs

dsouzai commented 6 months ago

fyi @tajila @vijaysun-omr

vijaysun-omr commented 6 months ago

"The IProfiler needs to be enabled during startup even if an SCC exists..." ... I assume you mean before checkpoint ? i.e. I assume we can use the SCC after restore and therefore won't want to pay the cost of IProfiler in the post-restore period until first response (or whatever stage in startup process iprofiling is off for) is done.

dsouzai commented 6 months ago

Yeah that's right, that's why part of the heuristic work is to "Turn off IProfiler post-restore so it can be naturally enabled later".

dsouzai commented 5 months ago

If the proactive compilations in the checkpoint hook (point 3 above) are done without FSD (which is probably a good idea to reduce the amount of recompilation that will need to occur post restore), then if debug on restore is enabled post-retore, the VM will have to force a transition to the interpreter in the restore hook even if the debug events aren't enabled yet. Otherwise, application threads will start running non-FSD code which will not support a decompile when the events do get enabled

I think we may be able to avoid having to worry about this. We can still do all the proactive compilation, but we don't have to update the extra field of the J9Method; instead we can just keep a list of these methods along with the new body. On restore, if Debug is not specified, we can update the extra field then; if it is, then we ignore the list. We could even queue them for code cache reclamation.

dsouzai commented 5 months ago

The previous comment has the caveat that the cost of delaying the updating of the extra field should not impact startup/first response; if doing so is too expensive then we will have to default to the original design.

dsouzai commented 5 months ago

After speaking with @mpirvu and @vijaysun-omr there is another option that really only became apparent after all of the work we did to get the compiler support to the point it is now.

The main idea is to run interpreted pre-checkpoint, but use all the heuristics described in this issue to ensure that the environment post-restore in terms of JIT'd code, IProfiler, SCC Validation, etc. are all the same. Furthermore, depending on the evaluation of https://github.com/eclipse-openj9/openj9/issues/18866#issuecomment-1974057119, it may even be possible to delay updating the extra field until post-restore.

The benefits of doing so are

  1. Low Footprint (since there won't be a need for any FSD code generation)
  2. Simpler VM coordination post-restore (since there won't be a mix of Voluntary and Involuntary code)
  3. No need for the VM to decompile in the restore hook

The drawbacks are

  1. Longer runtime to checkpoint
  2. Potential impact to startup/first response because of more interpreted methods on application thread stacks

Because of all the infrastructure that I already have in place in my branches, I don't think it should be all that much work to evaluate this idea, so I'll update once I have more information.

dsouzai commented 5 months ago

fyi @gacholio

dsouzai commented 5 months ago

Some observations from my first set of experiments, with the latest changes I have based on open PRs:

The current build involves generating FSD code pre-checkpoint, doing a set of (non-FSD) proactive compilation in the checkpoint hook, and then triggering recompilation of FSD methods post-restore. This seems to have minimal impact on startup and firstreponse (~1-2%) but a very large increase in footprint. Part of this is due to the increase in code cache and data cache, but also an increase in the SCC RSS.

I need to do a bit more investigation on this, but it's very likely due to the recompilations post-restore. Essentially, the process of triggering a recompilation will result in either the requesting thread or the comp thread touching the J9ROMClass at some point, which will bring the page back into the RSS. The reason I think this is likely is because when I moved the recompilations to occur pre-checkpoint, the RSS of the SCC went down. However, I haven't validated if this is consistent, so I don't want to confirm this just yet. Additionally, the code/data cache and internal memory is higher because now we have both FSD bodies and the recompiled non-FSD bodies.

I also tried running interpreted pre-checkpoint, but performing (non-FSD) compilations in the background, and then updating the relevant j9method->extra fields in the checkpoint hook. This does impact startup and footprint a bit more (4-5%), but the footprint is much lower than the previous builds; it is still higher than baseline, but a cursory investigation showed that the majority of it comes from the code/data cache and jit persistent memory. This is because we actually compile more methods than baseline; this is likely due to the fact that in baseline, just because many methods have their count set to 20 does not mean they actually decrement that count because they may have been inlined into other methods. In my prototype build however, in the pre-checkpoint hook, I compile everything that had their count set to 20; this is likely unnecessary.

Finally, it became apparent that delaying updating the extra field until restore is not feasible. jitMethodTranslated touches the J9ROMClass of the method, and so we will bring all of those pages back into the RSS[^1].


Given the following two observations:

Generating FSD code pre-checkpoint only to then also generate the recompiled non-FSD version pre-checkpoint seems wasteful; it increases the code/data cache and jit persistent memory usage. However, the benefit is that the pre-checkpoint run-time of the application is minimally impacted.

Only running interpreted pre-checkpoint prevents the complication of having both FSD and non-FSD code, and facilitates the means by which footprint should match baseline. However, it has the issue that the pre-checkpoint run-time is significantly impacted.

I should also note that I have yet to evaluate the throughput implications of all of this.

[^1]: Theoretically jitMethodTranslated could be modified to take in two extra parameters that answer the questions J9ROMMETHOD_HAS_VTABLE(J9_ROM_METHOD_FROM_RAM_METHOD(method)) and J9ROMCLASS_IS_INTERFACE(currentClass->romClass) to prevent the need to actually touch those pages.

vijaysun-omr commented 5 months ago

Thanks for the summary. A few factors that you bring up, but as a matter of opinion, I don't feel are very significant, i.e. we could choose to lower their importance to get to the final design point.

  1. The complexity of dealing with FSD and non-FSD code pre-checkpoint. In a sense, you have already dealt with this complexity, and so I don't know if this is a major issue from a development viewpoint (it could be from a maintenance viewpoint).
  2. The memory wastefulness of generating all the FSD and non-FSD code pre-checkpoint, since (correct me if I am wrong), you could essentially reclaim (in the code cache reclamation sense) all the FSD bodies that are not on some executing thread stack, right ? This, coupled with a plan to disclaim (from RSS) all the FSD code generated pre-checkpoint ought to reduce most of the waste associated with this scheme ?
  3. The startup time performance of the case when debug is enabled on the restore side, i.e. the cost of always decompiling in that case

I guess I am arguing in favor of having FSD and non-FSD bodies before checkpoint, rather than run interpreted until then.

gacholio commented 5 months ago

All FSD bodies can be reclaimed as we will be decompiling any that are on stack at restore. You could even do it aggressively on restore after the decompilations have been added to the appropriate stack frames.