Open dsouzai opened 1 year ago
@vijaysun-omr @ymanton @mpirvu @tajila
You may want to check with @mpirvu or @klangman that the following is okay with them (maybe the answer is different for the initial GA related code completion date in a month vs longer term than that).
> -Xjit / -Xaot count options | Only apply for new loaded methods; applying this for existing interpreted J9Methods from the checkpoint is a nice to have for GA.
For disableAsyncCompilation, are you planning to solve using 1 or 6? The table lists it could be either.
Have you finalized what will be done for -Xshareclasses? My question is first to see if we can write down what exactly we are planning to do/support clearly, rather than necessarily push for any change in that plan for the initial GA.
@tajila what is the -Xshareclasses plan in terms of specifying it post-restore, at least for 0.38? Are we going to support modifying it in any way (e.g., allowing a user to specify -Xshareclasses:none post restore)?
Similar question for -Xint; I think if a user specifies -Xint pre-checkpoint, then it's probably reasonable to not bother trying to configure the JIT post restore, but what about the situation where a user specifies -Xint post-restore?
> For disableAsyncCompilation, are you planning to solve using 1 or 6? The table lists it could be either.
Spoke to @vijaysun-omr offline, but for posterity:
The issue comes from the fact that the preprologue shape is different based on sync or async compilation. A lot of code just checks the global state for the type of compilation that generated the body. If post restore, the mode changes, then in order to not have crashes, we at the very least need to have some information per body to inform the patching code.
It may be possible to have the shape of the preprologue always be prepared to switch modes, but this is not something we can achieve for 0.38, so for now we're just going to ignore the option.
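The per-body bookkeeping described above can be sketched roughly as follows. This is only an illustration of the idea: `CompilationMode`, `CompiledBody`, and `modeForPatching` are hypothetical names, not OpenJ9 types, and the actual preprologue layout is not modeled.

```cpp
#include <string>

// Hypothetical: the compilation mode that produced a method body.
enum class CompilationMode { Sync, Async };

// Hypothetical per-body record: remembers the mode in effect when the
// body was compiled, independent of any later global setting.
struct CompiledBody {
    std::string methodName;
    CompilationMode compiledIn;
};

// Patching code consults the body's own record rather than the global
// mode, so a mode change after restore cannot misdescribe a body that
// was compiled before the checkpoint.
inline CompilationMode modeForPatching(const CompiledBody &body,
                                       CompilationMode /*currentGlobalMode*/) {
    return body.compiledIn;
}
```

With such a record, a body compiled asynchronously before the checkpoint would still be patched as an async body even if the global mode were switched to synchronous after restore.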
> Similar question for -Xint; I think if a user specifies -Xint pre-checkpoint, then it's probably reasonable to not bother trying to configure the JIT post restore, but what about the situation where a user specifies -Xint post-restore?
I'm not sure we can fully support this in time for GA, but it's certainly doable. One approach is to add an option -XintOnRestore that essentially prevents any further JIT transitions, but this would simply be a stopgap until -Xint can be supported, so it's not a useful feature long term.
> @tajila what is the -Xshareclasses plan in terms of specifying it post-restore, at least for 0.38? Are we going to support modifying it in any way (e.g., allowing a user to specify -Xshareclasses:none post restore)?
I feel like this is really an AOT question. From a VM perspective, what comes out of the SCC will be identical to loading from a jar, so long as things are in sync, which we have checks for.
> I'm not sure we can fully support this in time for GA, but it's certainly doable. One approach is to add an option -XintOnRestore that essentially prevents any further JIT transitions, but this would simply be a stopgap until -Xint can be supported, so it's not a useful feature long term.
Yeah I agree. So I suppose for 0.38 then, if someone specifies -Xint post restore, it just gets ignored.
> From a VM perspective, what comes out of the SCC will be identical to loading from a jar, so long as things are in sync, which we have checks for.
Well I suppose what we need to do for AOT depends on what is allowed to change. For example, if the user changes the SCC, I will need to double check, but I don't think it affects how compilation occurs because we always access the SCC via the SCC APIs. However, if the user specifies -Xshareclasses:none then we may need to reflect that so that we don't try to use the SCC APIs.
Using -Xint and -Xshareclasses:none after restore both fall in the category of someone wanting to work around a problem (with the JIT or SCC respectively). In theory we could get them close to what they would have gotten with -XintOnRestore to prevent interpreter to JIT transitions post restore with -Xjit:exclude={*} and that would mean we don't JIT anything after restore at least. While this exclude option won't prevent interpreter to JIT transitions post restore, at the end of the day -XintOnRestore also won't do a "perfect" job of never running JITed code after restore. For that we will do the "proper" -Xint support after the initial release, including OSR out of JITed code and preventing future transitions to JITed code. Maybe this is the long form argument for "doing nothing" about -Xint in this release, do you agree Irwin and Tobi? I'll add a different comment about -Xshareclasses:none shortly.
> Maybe this is the long form argument for "doing nothing" about -Xint in this release, do you agree Irwin and Tobi?
Yes, given that -Xint has very specific behaviour that we can't achieve post-restore in 0.38, and given that there are ways to minimize the time spent in JIT'd code with things like -Xnojit -Xnoaot (yet to be implemented at the time of this comment) or -Xjit:exclude={*} -Xaot:loadExclude={*} (~implemented at the time of this comment~ not working but I have a fix), having another option like -XintOnRestore as a stopgap would essentially just be an alias of these existing options, and therefore would be unnecessary work both wrt code and documentation.
-Xshareclasses:none after restore may be useful in case there is some customer issue with SCC (as has happened anecdotally and this option was provided as a workaround). In terms of loading classes after the restore from jar file as opposed to SCC, this is still a change in logic that would need to be implemented, right? i.e. someone would need to check after restore that we have switched the SCC off in the VM and decide to go to the jar file for future loads instead. So, in that sense, it isn't zero work?
From an AOT perspective, I am thinking we will just support this to the extent of a) not loading AOT code and b) not doing any AOT compilations after the restore in 0.38. I believe we won't use SCC apis to do any compiles, unless an AOT compilation has been queued (which is what we need to prevent from happening at the control layer). Again, this is not zero work but maybe this is what you are thinking of supporting, Irwin.
Apart from -Xshareclasses:none there is the question of other -Xshareclasses options. I assume we don't plan on supporting any of those other options for 0.38, but in theory, a) changing the SCC location and b) changing the SCC and/or AOT code size could be the options I can see some questions about in the future. It would be good to get a brief description of our long term position on this, but confirming what we are going to support in 0.38 is the primary goal that I am after.
> Maybe this is the long form argument for "doing nothing" about -Xint in this release, do you agree Irwin and Tobi?
I agree, I think we can leave out -Xint for the initial release.
> -Xshareclasses:none after restore may be useful in case there is some customer issue with SCC (as has happened anecdotally and this option was provided as a workaround). In terms of loading classes after the restore from jar file as opposed to SCC, this is still a change in logic that would need to be implemented, right? i.e. someone would need to check after restore that we have switched the SCC off in the VM and decide to go to the jar file for future loads instead. So, in that sense, it isn't zero work?
There are two ways of approaching this. The brute force way is to NULL the javaVM->sharedClassConfig; this will result in no ROM classes, AOT, SCC-related MXBean metrics, or SCC API functioning after restore. It's probably as close to -Xshareclasses:none as we will get on restore. The downside is that we will need to update the JVM in all the places that may expect the config to exist if it did on startup. Also, if a user had frames on the stack that were using the SCC API, there may be unexpected behaviour.
The second approach is to just not return anything from the SCC. So on restore, during class load, the VM would just not query the SCC, and the JIT would have to do the same when looking for AOT code. SCC metrics and the SCC API would continue to work as normal though. I would argue this is more like -Xshareclasses:disableLoadingOnRestore, as it doesn't quite behave the same as -Xshareclasses:none.
The second approach seems more feasible, and something we can likely get done within the 0.38 timeframe.
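A minimal sketch of the second approach, under stated assumptions: `SharedCacheSketch` and its members are hypothetical stand-ins and not the OpenJ9 SCC API. Lookups report nothing once the cache is disabled on restore, while the config object and its metrics stay intact. Skipping stores as well is an assumption of this sketch.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a shared classes cache; not the OpenJ9 SCC API.
class SharedCacheSketch {
public:
    // Stores data unless the cache was disabled on restore.
    // Returns true if the entry was stored.
    bool store(const std::string &key, const std::string &data) {
        if (disabledOnRestore_) return false;
        entries_[key] = data;
        return true;
    }

    // Returns the number of elements found (0 or 1). When disabled on
    // restore, behaves as if the data were not in the cache.
    std::size_t find(const std::string &key, std::string *out) const {
        if (disabledOnRestore_) return 0;
        auto it = entries_.find(key);
        if (it == entries_.end()) return 0;
        if (out != nullptr) *out = it->second;
        return 1;
    }

    // Metrics keep working regardless of the restore-time flag.
    std::size_t entryCount() const { return entries_.size(); }

    // Called once the restore-time option has been seen.
    void disableOnRestore() { disabledOnRestore_ = true; }

private:
    std::map<std::string, std::string> entries_;
    bool disabledOnRestore_ = false;
};
```

Because the object itself is never torn down, callers that cached a pointer to it before the checkpoint keep working; they simply stop finding anything.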
> I assume we don't plan on supporting any of those other options for 0.38, but in theory, a) changing the SCC location and b) changing the SCC and/or AOT code size could be the options I can see some questions about in the future. It would be good to get a brief description of our long term position on this, but confirming what we are going to support in 0.38 is the primary goal that I am after.
I think it is possible to support some version of -Xshareclasses:none for 0.38.
> a) changing the SCC location
After speaking with @hangshao0, he feels this won't be feasible on restore.
> b) changing the SCC and/or AOT code size could be the options
This seems possible, but not a small task.
Thanks Tobi, I agree with the "second approach" that you mentioned for -Xshareclasses:none for this (0.38) release since it does feel cleaner.
> The second approach is to just not return anything from the SCC. So on restore, during classload the VM would just not query the SCC, the JIT would have to do the same when looking for AOT code. SCC metrics and SCC API would continue to work as normal though. I would argue this is more like -Xshareclasses:disableLoadingOnRestore as it doesn't quite behave the same as -Xshareclasses:none
I assume some new -XX option(s) need to be added to tell the JVM not to load things like classes/AOT from SCC post restore? I believe we will disable storing these new data into SCC as well.
@ehrenjulzert, can you look at implementing the second approach mentioned here https://github.com/eclipse-openj9/openj9/issues/16714#issuecomment-1452248303 ?
> I assume some new -XX option(s) need to be added to tell the JVM not to load things like classes/AOT from SCC post restore? I believe we will disable storing these new data into SCC as well.
I'm in favour of using a new name, since this behaviour is not identical to -Xshareclasses:none. Thoughts @vijaysun-omr @dsouzai?
We have a mechanism for supplying options on restore, so it's just a matter of doing something like FIND_ARG_IN_ARGS(vm->checkpointState.restoreArgsList, ...) to find the option on restore.
I'm ok with using a new name as I've already set up some infra to get the post-restore options, i.e.: https://github.com/eclipse-openj9/openj9/blob/58fb3c25e7c02616411b2bb6e9d18be2af3b519a/runtime/compiler/control/OptionsPostRestore.cpp#L43-L44
That said, in normal startup we don't check the options directly, but rather: https://github.com/eclipse-openj9/openj9/blob/58fb3c25e7c02616411b2bb6e9d18be2af3b519a/runtime/compiler/control/DLLMain.cpp#L504-L525
Is that something that we can do again, or is it that because of the J9Hook mechanism it is possible that the J9HOOK_VM_PREPARING_FOR_RESTORE JIT hook gets called before the SCC hook?
Also, to @hangshao0's point, will there also be a -Xshareclasses:disableStoringOnRestore or do we just want to support something like -Xshareclasses:disableOnRestore?
I prefer a single option that disables both SCC finding/storing on restore, which is closer to the behaviour of -Xshareclasses:none.
> Also, to @hangshao0's point, will there also be a -Xshareclasses:disableStoringOnRestore or do we just want to support something like -Xshareclasses:disableOnRestore?
I prefer -Xshareclasses:disableOnRestore which does everything.
What is the expected behaviour of using the new option where CRIU is not supported? If we silently ignore it, then it looks like it should be a -XX option (there are existing SCC -XX options like -XX:+PortableSharedCache, -XX:SharedCacheHardLimit=, etc). Currently for all the -Xshareclasses: sub-options, we exit the JVM with an error message if an unsupported option is found.
Also we don't consume multiple -Xshareclasses: options. The last one wins. All previous ones are ignored. The only exception is -Xshareclasses:none, which is never ignored.
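The "last one wins, except none" rule can be sketched as below. This is an illustrative reading of the rule as stated in this comment, not the JVM's actual option parser; `effectiveShareClasses` is a hypothetical helper, and it assumes `none` appears as the entire sub-option string.

```cpp
#include <string>
#include <vector>

// Returns the sub-options of the effective -Xshareclasses: argument,
// or an empty string if no such argument was given.
// Rule sketched: the last occurrence wins and earlier ones are ignored,
// except that "none" is never ignored wherever it appears.
inline std::string effectiveShareClasses(const std::vector<std::string> &args) {
    const std::string prefix = "-Xshareclasses:";
    std::string effective;
    for (const std::string &arg : args) {
        if (arg.compare(0, prefix.size(), prefix) != 0) continue; // not an -Xshareclasses: option
        const std::string sub = arg.substr(prefix.size());
        if (sub == "none") return "none"; // overrides anything before or after it
        effective = sub;                  // later occurrences replace earlier ones
    }
    return effective;
}
```

For example, given `-Xshareclasses:name=a -Xshareclasses:name=b`, the helper yields `name=b`; if `-Xshareclasses:none` appears anywhere in the list, it yields `none`.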
> Is that something that we can do again, or is it that because of the J9Hook mechanism it is possible that the J9HOOK_VM_PREPARING_FOR_RESTORE JIT hook gets called before the SCC hook?
The struct sharedClassConfig is prepared and set to vm->sharedClassConfig during phase JIT_INITIALIZED. After this phase, all the fields and exposed APIs in sharedClassConfig can be checked and used by other components. After the new option to disable SCC is parsed, we will turn off the exposed APIs (related to finding/storing operations) to return directly. vm->sharedClassConfig won't be set to NULL and J9SHR_RUNTIMEFLAG_CACHE_INITIALIZATION_COMPLETE won't be removed. But before restore options are consumed, I am not sure if there could be a timing window in which these APIs can be used. I am not familiar with the restore process.
The VM will parse the new option. We can expose a flag/API about whether the new option is found or not to JIT, if JIT does not parse the option itself.
> The VM will parse the new option. We can expose a flag/API about whether the new option is found or not to JIT, if JIT does not parse the option itself.
Yeah I guess it depends on the order of operations during restore; if the JIT hook gets invoked after the flag has been set then we don't have to parse it, otherwise we will. I'm ok either way, but it's worth knowing for sure.
Also on a related note, I don't think (for 0.38 anyway) that I can guarantee that some SCC API won't get invoked by a non-compilation compiler thread. @hangshao0, what are the consequences of something like that? For example, if -Xshareclasses:disableOnRestore is specified post restore but some thread calls vm->sharedClassConfig->findSharedData.
> what are the consequences of something like that? For example, if -Xshareclasses:disableOnRestore is specified, post restore but some thread calls vm->sharedClassConfig->findShared
It will behave as if the data is not in the SCC. findShared() returns 0 (0 data element found in the SCC) and nothing will be returned via J9SharedDataDescriptor.
> Yeah I guess it depends on the order of operations during restore; if the JIT hook gets invoked after the flag has been set then we don't have to parse it, otherwise we will. I'm ok either way, but it's worth knowing for sure.
Talked to @JasonFengJ9; the restore options are parsed here: https://github.com/eclipse-openj9/openj9/blob/5a4ab53779ef5aa6e623d0ef4f34bcec4530683e/runtime/criusupport/criusupport.cpp#L696-L707
which happens before jvmRestoreHooks(), which triggers J9HOOK_VM_PREPARING_FOR_RESTORE.
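The ordering described here (options parsed first, restore hooks triggered afterwards) means a hook listener can read a flag set by the VM instead of parsing the option itself. A rough sketch, where `RestoreState`, `parseRestoreOptions`, and `runRestoreHooks` are hypothetical names rather than OpenJ9 functions:

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical restore-time state shared between the VM and hook listeners.
struct RestoreState {
    bool sccDisabledOnRestore = false;
};

// Step 1: the VM parses the restore options and records what it found.
inline void parseRestoreOptions(const std::vector<std::string> &restoreArgs,
                                RestoreState &state) {
    for (const std::string &arg : restoreArgs) {
        if (arg == "-Xshareclasses:disableOnRestore") {
            state.sccDisabledOnRestore = true;
        }
    }
}

// Step 2: restore hooks run afterwards, so every listener (for example
// a JIT listener) observes the already-populated state.
inline void runRestoreHooks(
        const std::vector<std::function<void(const RestoreState &)>> &hooks,
        const RestoreState &state) {
    for (const auto &hook : hooks) {
        hook(state);
    }
}
```

If the two steps ran in the opposite order, a listener would see the default value of the flag and would need to search the restore options on its own.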
> Also we don't consume multiple -Xshareclasses: options. The last one wins. All previous ones are ignored. The only exception is -Xshareclasses:none, which is never ignored.
If -Xshareclasses:disableOnRestore is specified on startup, the JVM should fail to start with:
JVMJ9VM007E Command-line option unrecognised: -Xshareclasses:disableOnRestore
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
This option should be limited to restore.
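The startup-versus-restore distinction could be sketched as follows. `OptionPhase` and `checkDisableOnRestore` are hypothetical names used only for illustration; the error text is the one quoted above.

```cpp
#include <string>

// Hypothetical: when an option is being processed.
enum class OptionPhase { Startup, Restore };

// Returns an empty string if the option is acceptable in the given phase,
// otherwise the error message the JVM would be expected to print.
inline std::string checkDisableOnRestore(const std::string &option, OptionPhase phase) {
    const std::string restoreOnly = "-Xshareclasses:disableOnRestore";
    if (option == restoreOnly && phase == OptionPhase::Startup) {
        return "JVMJ9VM007E Command-line option unrecognised: " + option;
    }
    return "";
}
```

Treating the option as unrecognised at startup keeps it consistent with how other unsupported -Xshareclasses: sub-options are handled.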
@hangshao0 To check if the SCC is disabled post-restore, do I just check if J9SHR_RUNTIMEFLAG_CACHE_INITIALIZATION_COMPLETE is not set in vm->sharedClassConfig->runtimeFlags? Or do you recommend I just search for the -Xshareclasses:disableOnRestore option?
> do I just check if J9SHR_RUNTIMEFLAG_CACHE_INITIALIZATION_COMPLETE is not set in vm->sharedClassConfig->runtimeFlags? Or do you recommend I just search for the -Xshareclasses:disableOnRestore option?
J9SHR_RUNTIMEFLAG_CACHE_INITIALIZATION_COMPLETE will always be set, so checking this flag won't work. @ehrenjulzert is adding a variable to indicate whether -Xshareclasses:disableOnRestore is present; you can check that.
Hm, well because we need to get the code in by Friday and the JIT changes will have to wait until the SCC changes get merged in, I'll just explicitly search the option for now, and add an item in https://github.com/eclipse-openj9/openj9/issues/16853 to use the variable you're mentioning.
@dsouzai : this is targeted to 0.40. Is there outstanding work to be delivered in the next few days or should this move out?
No we should move this out; nothing on this front is going to get delivered to 0.40.
OK, moving to 0.41.
Service Requirements
-Xtrace:trigger=tpnid{j9criu.7-8,javadump}
Policies
There are several ways in which options provided post restore can be implemented. The rest of this section goes over the various policies that can be associated with the options. Note, there’s no reason that only one of these policies needs to hold for all options; different options can have different policies.
1. Options only apply from the restore point onward.
The options do not impact anything that was compiled before the restore; the options only affect the environment/compiles post restore.
2. Options apply in a manner similar to AOT loads to the extent possible
For the most part options do not impact anything that was compiled before restore; however, there are ways to control execution of code that was compiled pre checkpoint, though it isn’t guaranteed that the code can never execute.
3. Run the checkpoint as if the option was set
In the restore if it’s not set, then all future compilations do not need to be restricted; if it is set then existing compiled code does not need to be invalidated. Depending on the option, this policy can either mean the option is in effect, or that the option isn’t in effect but initialization contingent on the option occurs.
4. If an option cannot be applied retroactively, fail the checkpoint and start a new JVM in default mode
To completely guarantee that all options are applied post restore, abandon the restore and start the JVM in normal mode.
5. If an option cannot be applied retroactively, OSR out and recompile
To completely guarantee that all options are applied post restore, OSR out of code that is invalidated by the (post restore) options.
6. Option does not apply
The option does not apply in the restore run. The option can be ignored or the restore can fail.
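The six policies above could be encoded per option along these lines. The enum names, the table, and `policyFor` are hypothetical. The three entries are illustrative, loosely following what the discussion in this issue settled on for 0.38, and the fallback for unlisted options is an assumption of this sketch.

```cpp
#include <map>
#include <string>

// The six policies listed above, numbered as in the list.
enum class RestorePolicy {
    ApplyFromRestoreOnward = 1,
    ApplyLikeAotLoads = 2,
    RunCheckpointAsIfSet = 3,
    FailAndRestartJvm = 4,
    OsrOutAndRecompile = 5,
    DoesNotApply = 6
};

// Illustrative per-option assignments; different options may use
// different policies.
inline const std::map<std::string, RestorePolicy> &examplePolicyTable() {
    static const std::map<std::string, RestorePolicy> table = {
        {"-Xjit:count=", RestorePolicy::ApplyFromRestoreOnward},
        {"-Xjit:disableAsyncCompilation", RestorePolicy::DoesNotApply},
        {"-Xint", RestorePolicy::DoesNotApply},
    };
    return table;
}

// Looks up the policy for an option. Falling back to DoesNotApply for
// unlisted options is an assumption made for this sketch.
inline RestorePolicy policyFor(const std::string &option) {
    const auto &table = examplePolicyTable();
    const auto it = table.find(option);
    return it == table.end() ? RestorePolicy::DoesNotApply : it->second;
}
```

A table like this makes the "different options can have different policies" point explicit and gives one place to record each decision.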
Post Restore Options Semantics
-Xaot
-Xnoaot is specified pre-checkpoint, -Xaot is ignored post-restore. For -Xint, see -Xjit below.
-Xnoaot
-Xjit
-Xnojit
-XCompilationThreads
-XlockReservation
-Xquickstart
-XsamplingExpirationTime
-Xlp
-Xlp:codecache
-XtlhPrefetch
-Xcodecache
-Xcodecachetotal
-XX:codecachetotal
-XX:[+|-]MergeCompilerOptions
-XX:[+|-]RuntimeInstrumentation
-XX:[+|-]UseJITServer
-XX:JITServerAddress
-XX:[+|-]JITServerLocalSyncCompiles
-XX:[+|-]JITServerLogConnections
-XX:JITServerPort
-XX:JITServerSSLRootCerts
-XX:JITServerTimeout
-Xjit / -Xaot count options
-Xjit / -Xaot exclude & limit options
-Xjit:verbose= / -Xjit:vlog= / -Xjit:rtLog=
-Xjit:disableAsyncCompilation
-Xjit / -Xaot
-Xrs & -Xtrace
-Xaggressive
-Xtune:virtualized
-Xshareclasses
-Xjit:enableGPU
-XX:[+|-]PerfTool / -Xjit:perfTool
-Xjit:optLevel=
optLevel=, inhibit recomp, etc.
Related Considerations
OMR::Options::_logFile, verbose log, rtLog, etc)
TR::Compiler->target.numberOfProcessors
Implementation
Open Questions
-Xint: The only ways to handle -Xint post restore completely are with policies 4 or 5. The reason is that policy 2 does not prevent execution of compiled methods on the stack. Pre checkpoint, -Xint should probably still result in the JIT being enabled to ensure that the JIT can be enabled post restore.
-Xshareclasses: What we need to do depends on the extent to which we want to support changes to the -Xshareclasses option.
-Xlockword: If the VM supports -Xlockword changes then the only ways to handle it are with policies 4 or 5 (for the same reason as -Xint).
Notes
Relevant PRs
OMR
OpenJ9