eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Using full command line as GC hints key causes key proliferation #19308

Open gjdeval opened 6 months ago

gjdeval commented 6 months ago

The Java command line may have unique elements which are irrelevant to GC hints, causing multiple keys to be created when the GC hints are going to be the same. Using the entire command line can also make the keys very long and complex, which increases the work to search for a matching key when a new JVM instance is launched. This extra unnecessary work slows down JVM startup.

pshipton commented 6 months ago

@dmitripivkine @amicic @hangshao0

hangshao0 commented 6 months ago

There is an ongoing PR that avoids finding/storing GC hints in the SCC in certain cases: https://github.com/eclipse-openj9/openj9/pull/19305.

amicic commented 6 months ago

There is a huge list of 'elements' that are irrelevant to GC hints. The complement list is probably much shorter: mostly GC command line options (for example, the tenure age threshold, which may affect heap expansion dynamics). Either way, it's not easy to identify it precisely...

As far as reducing find/store operations on a given JVM run goes, besides the fully expanded heap scenario that we are addressing, we could perhaps avoid storing new hints if they are close to the existing value.
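A minimal sketch of the "skip the store if the new hint is close to the existing one" idea. The class name, method name, and the relative tolerance are assumptions for illustration only; none of this is OpenJ9 code, and the thread does not say what "close" should mean.

```java
// Hypothetical sketch: names and the tolerance semantics are assumptions, not OpenJ9 code.
final class HintStorePolicy {
    /**
     * Returns true when the new hint differs from the stored one by more than
     * the given relative tolerance, so that a store to the shared cache is worthwhile.
     */
    static boolean shouldStore(long storedBytes, long newBytes, double tolerance) {
        if (storedBytes <= 0) {
            return true; // no usable hint stored yet
        }
        double relativeChange = Math.abs((double) (newBytes - storedBytes)) / storedBytes;
        return relativeChange > tolerance;
    }
}
```

With a tolerance of 0.1, for example, a stored hint of 1000 would not be overwritten by 1050 but would be by 1200.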

hangshao0 commented 6 months ago

The original discussions for the key of the GC hints are here https://github.com/eclipse-openj9/openj9/issues/3743. As there could be many options (combinations), the decision at that time was to use the whole command line.

dmitripivkine commented 6 months ago

This is an enhancement. The idea is to extract the significant parameters from the Java command line and ignore the insignificant ones when searching for a hint. For example, a customer may pass a unique ID for each run, which prevents finding the hint by an exact command line match. It may also make sense to limit the number of stored hints to prevent long searches. I guess these improvements might help in general, not only for GC-related items.
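As a rough illustration of extracting only the significant parameters, the sketch below builds a key from the options that match a small prefix list. The class name, the `--run-id` argument, and the prefix list itself are assumptions; the list is deliberately incomplete, and deciding which options really matter is the hard part of this approach.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: not OpenJ9 code.
final class GcHintKey {
    // Assumed, incomplete list of option prefixes that influence heap sizing.
    private static final List<String> SIGNIFICANT_PREFIXES =
            Arrays.asList("-Xmx", "-Xms", "-Xmn", "-Xgc");

    /** Builds a key from the significant options only, preserving their order. */
    static String fromCommandLine(List<String> args) {
        return args.stream()
                .filter(arg -> SIGNIFICANT_PREFIXES.stream().anyMatch(arg::startsWith))
                .collect(Collectors.joining(" "));
    }
}
```

Under these assumptions, two runs that differ only in `--run-id=123` versus `--run-id=456` would map to the same key.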

hangshao0 commented 6 months ago

It is also possible to store the GC hints in the JVM exit phase rather than at JVM startup. That way, users won't experience the possible performance impact during startup.

We have existing options like -Xscminaot<size> and -Xscmaxjitdata<size> to limit data stored by AOT and JIT to the shared cache. We could have a similar option for GC hints if it is not easy to identify the set of relevant options in the command line. In this case, we need to determine the default max number of GC hints allowed.

hangshao0 commented 6 months ago

Talked to @amicic and @dmitripivkine: we still want to store the GC hints in the startup phase so that other JVMs starting up together can benefit from them. We could add a limit on the number of GC hints that can be added to the SCC, so that the SCC will not gradually fill up with hints.

hangshao0 commented 6 months ago

It is worth mentioning that, in this particular case, the unique ID of each run is in the Java command line arguments (the application arguments), not in the JVM command line options.

tajila commented 6 months ago

Here is my understanding: there are two issues.

1) GC may be querying the SCC for hints more than it needs to. This is being addressed in changes like https://github.com/eclipse-openj9/openj9/pull/19305, so I don't think we need to be concerned with that in this issue.

2) The SCC uses the cmdline to track and associate GC hints with different applications. Using the entire cmdline is important because the behaviour of the application may change when a single parameter is changed (JVM param or application param).

For 2) Peter suggested adding a new capability (-Xshareclass:appConfig=[configName] or something like that). This option will notify SCC (for the purposes of GC hints) that all applications with the specified config can be treated as the same. So if a config is specified one does not need to store the cmdline.

What do you think of this approach @gjdeval ?

gjdeval commented 6 months ago

How would this new capability (-Xshareclass:appConfig=[configName] or something like that) be configured?

If this setting must be manually configured by the system operator, I wonder how useful it will be ... even with my long GC experience, I would not know how to decide which applications should be grouped together for GC hints identification.

pshipton commented 6 months ago

You don't need to group any applications together, you can use a separate config for each application. The VM can't separate applications from each other if the command line for an application is changing from run to run.

tajila commented 5 months ago

Here is the latest on this issue.

1) If a user specifies a non-default named SCC, we will interpret that as the user specifying a config (described above) for an application. So we will not save cmdlines, and we will assume that every invocation is part of the same config. We will also provide an option to disable this behaviour.

2) If a user uses the default SCC, we will keep the current behaviour. However, we will only store N cmdlines (we will provide an option to set N; we can use 16 as a default). When the JVM starts up, if the cmdline is not found in the SCC, that invocation will not use any GC hints.
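The two rules above can be sketched roughly as follows. This is an in-memory illustration only: the class and method names are hypothetical, a plain `HashMap` stands in for the shared cache, and what happens to a new cmdline once the limit is reached (here, the store is rejected) is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: not OpenJ9 code; a HashMap stands in for the SCC.
final class BoundedHintTable {
    static final int DEFAULT_MAX_KEYS = 16; // default suggested in the thread

    private final Map<String, Long> hints = new HashMap<>();
    private final boolean namedCache; // true when a non-default named SCC is in use
    private final int maxKeys;

    BoundedHintTable(boolean namedCache, int maxKeys) {
        this.namedCache = namedCache;
        this.maxKeys = maxKeys;
    }

    // Rule 1: a named cache is treated as one config, so the cmdline is ignored.
    private String keyFor(String commandLine) {
        return namedCache ? "" : commandLine;
    }

    /** Returns the stored hint, or empty if this cmdline has none (no hint is used). */
    Optional<Long> find(String commandLine) {
        return Optional.ofNullable(hints.get(keyFor(commandLine)));
    }

    /** Rule 2: updates are always allowed, but new keys are rejected once the table is full. */
    boolean store(String commandLine, long hint) {
        String key = keyFor(commandLine);
        if (!hints.containsKey(key) && hints.size() >= maxKeys) {
            return false;
        }
        hints.put(key, hint);
        return true;
    }
}
```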

Any questions on the feasibility of this approach @hangshao0 ?

hangshao0 commented 5 months ago

I agree that we should keep a maximum number of GC hints that can be stored into the SCC so that it will not gradually fill up the cache, like what is happening in this case.

One more thing I want to mention is that we are not doing a linear search to find the GC hints. It is a hashtable lookup where the command line string is the key. In this case, as the Java command line argument changes on every run, no existing GC hint will be found and one more new hint will be stored on each run. This creates additional store contention if there are multiple JVMs starting up together. The perf impact can be measured by comparing against a run using -XX:-UseGCStartupHints.
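A small simulation of the effect described here, assuming a plain `HashMap` keyed by the full command line string. The command line and the `--run-id` argument are made up for illustration; this is not the SCC implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: simulates hint keys accumulating across JVM runs.
final class KeyProliferationDemo {
    /** Returns how many hint entries exist after the given number of runs. */
    static int entriesAfterRuns(int runs, boolean uniqueArgPerRun) {
        Map<String, Long> hintsByCommandLine = new HashMap<>();
        for (int run = 0; run < runs; run++) {
            String commandLine = "java -Xmx2g -jar app.jar"
                    + (uniqueArgPerRun ? " --run-id=" + run : "");
            // The lookup itself is cheap, but a miss means one more store per run.
            if (!hintsByCommandLine.containsKey(commandLine)) {
                hintsByCommandLine.put(commandLine, 0L);
            }
        }
        return hintsByCommandLine.size();
    }
}
```

With a unique argument per run, every run misses and adds an entry, so the number of entries grows with the number of runs. With a stable command line, only the first run stores anything.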

hangshao0 commented 5 months ago

For multiple JVMs starting up together and sharing the same GC hint (under the same config), there could also be additional store contention if they all go to update the hint.

dmitripivkine commented 1 month ago

Removing the comp:c tag; the implementation is on the VM side.