eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 721 forks source link

Define Portable SCC #7710

Open dsouzai opened 4 years ago

dsouzai commented 4 years ago

For the majority of the time the SCC feature has existed, there was no real reason for the SCC to ever be moved from one machine to another. However, with the emergence of Docker containers it is much more likely that a SCC generated on one machine could be used on another. In fact, multi-layered SCCs (introduced in 0.17) facilitates such a workflow. However, because the SCC was never moved from one machine to another, it is not completely portable1. This high level issue exists to track the discussion and follow up work to define and implement a completely portable SCC.

The following is a list of ideas that worth discussing in the OpenJ9 Community Call; feel free to comment other ideas and I'll add it to the list.

  1. Define baseline target processor features / use target processor features specified in SCC (https://github.com/eclipse/openj9/issues/7635).
  2. Make Target Environment local to compilation (https://github.com/eclipse/omr/issues/4518).
  3. Add option to -Xshareclasses to denote "portable" mode.
  4. Save options string to SCC, have ability to set up JVM according to stored options in SCC.
  5. Force compressed shift amount in "portable" mode.
  6. Standardize write barrier (write barrier sequence can vary even in the same GC policy depending on the heap (e.g. fixed heap))

Sub Issues (Discussion):

Overall Principles:


1 - The lack of portability comes mostly from the stored AOT code; however, this issue tags other components because defining a portable SCC still requires coordination with the other components.

hangshao0 commented 4 years ago

I guess the Portable SCC applies to persistent cache only.

dsouzai commented 4 years ago

I guess the Portable SCC applies to persistent cache only.

Is there a non-persistent SCC?

hangshao0 commented 4 years ago

Is there a non-persistent SCC ?

Yes. But the default cache type is persistent on all platforms except z/OS.

dsouzai commented 4 years ago

Ah right; then yeah, this is only relevant for the persistent cache. More specifically, only if the user wishes to make it portable. IMO I don't think we should enforce that the SCC is always portable since portability limits the amount of optimization that can be done.

DanHeidinga commented 4 years ago

Notes from OpenJ9 community call:

  • Define baseline target processor features / use target processor features specified in SCC (#7635).
  • Add option to -Xshareclasses to denote "portable" mode.

For base layers of SCC, we want more broadly applicable AOT code.

  • Save options string to SCC, have ability to set up JVM according to stored options in SCC.
  • Force compressed shift amount in "portable" mode.
  • Standardize write barrier (write barrier sequence can vary even in the same GC policy depending on the heap (e.g. fixed heap))

Check:

Make compressedrefs independent of shift. Use a base register rather than a shift.

dsouzai commented 4 years ago

@andrewcraik @mstoodle @vijaysun-omr @fjeremic could you guys take a look at https://github.com/eclipse/openj9/issues/7710#issuecomment-553480522 ; what are your thoughts?

fjeremic commented 4 years ago
  • Force compressed shift amount in "portable" mode.
    • Would require / want to ensure the heap layout so that the heap is in the right location for the required shift.
    • Make AOT load the shift from the J9JavaVM at the start of the compiled body

Can you please elaborate on what this means? What does forcing compressed shift amount mean?

andrewcraik commented 4 years ago

Make AOT load the shift from the J9JavaVM at the start of the compiled body

that is going to be really bad for performance - variable width shifts on x86 are massively slower than fixed shifts. Worse still, most of the fixed shifts are folded into memrefs / leas which wouldn't be possible with the variable variant. Fixing the shift would make a lot more sense - could force 3 shift or 4 shift. Could just tell the GC to use the shift amount from the SCC instead.

Maybe compile in the heaviest barrier and patch parts on load?

Would result in way more barriers in general because some GC modes need a lot more barriers than the standard gencon. While you might be able to patch them out you will still have a loss of icache and footprint growth from all of this stuff.

Probably don't want to support changing gc policies

agreed - for reasonable performance we would want the same policy with the same shift amount to allow the compiler to make reasonable assumptions. If there are good diagnostics from the cache when the user overrides the config on the command line it would make sense to just take the config from the cache unless the user specifies overrides on the command line which could then be vetted for compatibility IMO.

Make compressedrefs independent of shift. Use a base register rather than a shift.

This is a lot of work for the x codegen and likely going to have a very bad impact on perf - I don't necessarily agree that this is the right way to go. I think fixing the shift and forcing shifts will give better performance. We only have to worry about 3 vs 4 shift then since that affects object alignment.

Fixing -Xmx / -Xms results in a simpler barrier for gencon, maybe don't use this when doing AOT

Not baking in that assumption seems reasonable to allow flexibility under AOT

dsouzai commented 4 years ago

Can you please elaborate on what this means? What does forcing compressed shift amount mean?

The idea is that regardless of the heap size, we set the shift to be a certain value (if in compressedrefs mode)

andrewcraik commented 4 years ago

There are also the other "big" JIT mode decisions like HCR support that will have to be decided. They can be enabled, but they will have a big impact on performance (esp since OSR isn't support with AOT yet). Similarly, field watch, FSD, etc.

We may also want to make different choices in the virtual guards on methods to reduce the likelihood of guard patching? Something to think about anyway...

fjeremic commented 4 years ago

The idea is that regardless of the heap size, we set the shift to be a certain value (if in compressedrefs mode)

Ok I suppose the question is under the portable SCC mode what do we care about more:

If we fix the compressedrefs shift under the portable SCC mode then AOT code performance will be good (in comparison) however JIT code performance will not be as good since we have an artificially large shift value. I'm also not sure of the performance delta of the GC.

I do agree that fixing the shift value under portable SCC is definitely the easiest thing we can do to get started.

dsouzai commented 4 years ago

There are also the other "big" JIT mode decisions like HCR support that will have to be decided. They can be enabled, but they will have a big impact on performance (esp since OSR isn't support with AOT yet). Similarly, field watch, FSD, etc.

If we don't go down the path described by [1] then what I'm envisioning we could do is what I described in [2]. Until features like OSR, FSD, etc. are enabled in AOT, then I think we should maintain the current approach of not loading AOT in those modes.

[1] https://github.com/eclipse/omr/issues/4518 [2] https://github.com/eclipse/openj9/issues/7635#issuecomment-548056432

dsouzai commented 4 years ago

We had the first portable SCC workgroup meeting today (as decided in [1]). I've updated this PR's description with links to sub issues based on the points brought up where relevant discussions can happen.

[1] https://openj9.slack.com/archives/C8312LCV9/p1574881291118000

DanHeidinga commented 4 years ago

The other issue that was mentioned during the workgroup discussion was that the SCC does timestamp checking for the JARs / modules file on first access.

In Docker, we expect those checks to continue to pass (file timestamps aren't changed on image run) but are more likely to fail if the cache is manually copied.

There are existing options to disable timestamp checking to work around the manual case.

pshipton commented 4 years ago

There are existing options to disable timestamp checking to work around the manual case.

-Xshareclasses: suboption noTimestampChecks.

Using it is not a good idea, if any jar file changes, the changes will be ignored. Better to add support to update the cache with the new timestamps on a machine migration.