harryyu1994 commented 4 years ago

We need to disable Portable AOT by default in containers if the user has explicitly specified a different gc policy. The reason for this is that there is potential overhead for the JVM to be running in Portable AOT mode. When the gc policy is not the default, it's more likely that the Portable AOT will be invalidated so we are going to avoid the overhead by disabling it by default. The user would need to explicitly specify +XX:+PortableSharedCache to enable Portable AOT when a non-default gc policy is specified.

Inside containers:

Portable AOT enabled by default with default gc policy
Portable AOT disabled by default with non-default gc policy

Outside of containers:

Portable AOT disabled by default

harryyu1994 commented 4 years ago

@mpirvu @dmitripivkine FYI

dmitripivkine commented 4 years ago

I believe the current approach make early guess base on options provided it is way to nowhere and pure hack. Now because we don't know gc policy very early (and gc policy even not we want to guess barriers type) we are facing need to add extra parsing for gc policy (and variation of Concurrent Scavenger) etc. I guess would it be better to try different strategy:

add "signature" to Portable AOT including barriers type, compressed/full, CR shift, CPU type etc.
there is no need for this "signature" to be available so early in jvminit but it should be on first use in JIT or GC (for example place where CR shift is selected)
this "signature" can be added to shared classes file but I as understand it handled much later so we need to introduce the way to read "signature" earlier. It might be stored somewhere at the beginning of shared classes file or even it different file (to open-read-close). This way we would base on actual information about context of Portable AOT whatever was used to generate it. @mpirvu Does this suggestion make sense? I afraid we are going deeper and deeper to rabbit hole and there is no good solution around.

mpirvu commented 4 years ago

Even this solution is brittle and can lead to errors down the line. All this stems from the fact that the SCC dll is not loaded early enough, so at the time we need to take a decision about compressedrefs shift value we cannot access any SCC related functions.

In this new solution we need to use VM code (not code from SCC dll) to compute some AOT signature. This new code is decoupled from the code that implements AOT header validation and we need to remember to always keep them in sync: when we add some new code for AOT header validation, this new signature code needs to be updated as well, otherwise we will not be able to use AOT and customers will encounter a silent performance issue. Moreover, this code that computes the signature needs to be called from GC module, after GC determines the types of barriers and what not, and before the compressedref shift value is determined. Let's say that today that requirement is satisfied. In the future we may add another check for a particular GC feature/parameter. The code that determines the value of that new GC feature again needs to stay before the code that determines the value of compressedrefs shift. This creates a dependency that's easy to forget.

Now coming to the place where to store this signature, there are two choices: 1) Somewhere at the beginning of the SCC 2) In a separate file that resides alongside the SCC file

Choice (1) has the same encapsulation problem: code outside SCC needs to know about SCC structure. Choice (2) has a potential consistency problem because the two files need to be kept in sync: both created and destroyed together, same access rights, etc. Some customers that are used to destroying the SCC by deleting the SCC file may be caught off-guard. I'll let @hangshao0 voice his opinion on what is preferable here, but if anything, I would go for (1).

The right thing to do is re-organizing the code such that the value of compressedrefs shift value (and maybe others) is decided in the next init stage, when all dll libraries are loaded. I understand it may not be easy, but it's the cleanest solution of all.

dmitripivkine commented 4 years ago

I agree the right solution is extract part of SCC initialization and execute it earlier. It requires refactoring of SCC code and I am not sure how much work it is. Some fundamental values like compressed refs shift can not be postponed in initialization. It is obvious that it should be initialized before first object allocation. However it is even more strict for CR/Full combo build where there are a lot of classes incapsulate CR shift at initialization time for performance reason.

hangshao0 commented 4 years ago

I'll let @hangshao0 voice his opinion on what is preferable here, but if anything, I would go for (1).

Compared with (2), (1) looks slightly better to me. Both for us and for the users, it is very likely to cause problems to have 2 files with some dependency. One file is always easier to manage than 2 files.

pshipton commented 4 years ago

It should be possible to split up opening the shared cache and reading the entire content, allowing header information to be read earlier, and the rest of the content to be read later.

All the dlls need to be loaded before objects are allocated or classes are created. Looking at the stages, the SCC library is already loaded quite early, I wouldn't expect any object or classes before ABOUT_TO_BOOTSTRAP. I'm not sure when the compressed refs shift is calculated (HEAP_STRUCTURES_INITIALIZED?). It's worth describing when the relevant initialization occurs, and work out how it can be reordered.

The big restriction in changing init order is for the object heap to be allocated as early as possible, before the shared cache and other things are allocated.

If it comes down to it, rather than extracting some part of the SCC library into the VM, I'd fold the entire library into the VM. I believe that is the long term strategy anyway, to only have a single library. However, I'm not convinced this is necessary to solve this.

Looking at -verbose:init I see the following:

Loading libraries at load stage EARLY_LOAD: Running initialization stage PORT_LIBRARY_GUARANTEED Running initialization stage ALL_DEFAULT_LIBRARIES_LOADED Loading libraries at load stage LOAD_BY_DEFAULT:

Loaded library j9jit29
Loaded library j9gc29
Loaded library j9shr29. Init sharedCacheAPI and parse options.

Running initialization stage ALL_LIBRARIES_LOADED Running initialization stage DLL_LOAD_TABLE_FINALIZED

parse for -Xshareclasses:none and unload dll if found

Running initialization stage VM_THREADING_INITIALIZED Running initialization stage HEAP_STRUCTURES_INITIALIZED Running initialization stage ALL_VM_ARGS_CONSUMED Running initialization stage BYTECODE_TABLE_SET Running initialization stage SYSTEM_CLASSLOADER_SET Running initialization stage DEBUG_SERVER_INITIALIZED Running initialization stage TRACE_ENGINE_INITIALIZED Running initialization stage JIT_INITIALIZED

Shared cache init currently occurs here, when heap structures and system classloader are set

Running initialization stage AGENTS_STARTED Initializing Xrun libraries: Running initialization stage ABOUT_TO_BOOTSTRAP

harryyu1994 commented 4 years ago

I believe compressed refs shift is calculated in gcInitializeDefaults(vm) at ALL_LIBRARIES_LOADED.

IDATA 
J9VMDllMain(J9JavaVM* vm, IDATA stage, void* reserved) 
{
    J9VMDllLoadInfo* loadInfo = FIND_DLL_TABLE_ENTRY( THIS_DLL_NAME );
    IDATA rc = J9VMDLLMAIN_OK;

    switch (stage) {
        case PORT_LIBRARY_GUARANTEED:
        case ALL_DEFAULT_LIBRARIES_LOADED:
            break;

        case ALL_LIBRARIES_LOADED:
            /* Note: this must happen now, otherwise -verbose:sizes will not work, as verbose 
             * support is initialized during this stage.
             */
            rc = gcInitializeDefaults(vm);
            break;

        case DLL_LOAD_TABLE_FINALIZED:
        case VM_THREADING_INITIALIZED:
            break;

        case HEAP_STRUCTURES_INITIALIZED:
            rc = gcInitializeHeapStructures(vm);
            break;

harryyu1994 commented 4 years ago

Loading libraries at load stage LOAD_BY_DEFAULT:
- Loaded library j9jit29
- Loaded library j9gc29
- Loaded library j9shr29. Init sharedCacheAPI and parse options.
<<<<<<<<<<<<< we need access to Shared Class Cache here.
Running initialization stage ALL_LIBRARIES_LOADED

hangshao0 commented 4 years ago

Not sure if this has been thought about, info can be stored inside the cache file or in its name. We are already storing info in the cache file name, like JCL version, address mode (64 bit vs 32 bit), compressrefs or non-compressrefs.

mpirvu commented 4 years ago

info can be stored inside the cache file or in its name

That's an interesting idea. So, we need a function/hash that transforms a few VM/GC properties into a (small) string of ASCII characters and a function that parses the cache name to extract these characters. These functions need to stay outside the SCC library.

hangshao0 commented 4 years ago

See shchelp.h and j9shchelp.c

harryyu1994 commented 4 years ago

Not sure if this has been thought about, info can be stored inside the cache file or in its name. We are already storing info in the cache file name, like JCL version, address mode (64 bit vs 32 bit), compressrefs or non-compressrefs.

Correct me if I'm wrong, if we do it this way then we are basically giving each compressed shift a separate Shared Class Cache. (like how we have two separate caches for compressrefs and non-compressrefs.). And I'm guessing other than AOT code the other information like classes information will be duplicated? And if we add gc policy into the equation, then it's a separate SCC for each gc policy.

hangshao0 commented 4 years ago

Yes, it will result in duplication. If we plan to add more info (like GC policy) in the file name in the future, we will end up with more files. It may not be an optimal solution.

harryyu1994 commented 4 years ago

Running initialization stage ALL_LIBRARIES_LOADED
    for library j9jit29...
        completed with rc=0 in 15047 nano sec.
    for library j9bcv29...
        completed with rc=0 in 444 nano sec.
    for library j9gc29...
***********************Compressed Reference Shift is set here********************************
        completed with rc=0 in 1272829 nano sec.
    for library j9dyn29...
        completed with rc=0 in 318 nano sec.
    for library j9vrb29...
        completed with rc=0 in 9598 nano sec.
    for library j9dmp29...
        completed with rc=0 in 953 nano sec.
    for library j9trc29...
        completed with rc=0 in 569 nano sec.
    for library j9shr29...
****************************We are late even if we can somehow initialize the shared cache APIs here*************************
        completed with rc=0 in 63569 nano sec.
    for library j9jvmti29...
        completed with rc=0 in 16258 nano sec.
    for library jclse29...
        completed with rc=0 in 25695 nano sec.
    for library zeroInitStages...
        completed with rc=0 in 336 nano sec.
    for library VMInitStages...

I highly doubt this is going to work but is there a possibility to move j9shr29 before j9gc29 during ALL_LIBRARIES_LOADED and then initialize the shared cache APIs there? (I think we need j9shr_findSharedData) @hangshao0

Just some thoughts, since class data is common to different GC policies and compressed reference shifts but AOT data isn't, should we separate out AOT data to have its own cache? Then different compressed shifts and GC policies can have different AOT data caches and we can use filename hash to figure out whether a certain configuration is available. Also I think if we are allowed to have multiple AOT data caches we won't ever run into compatibility issues. One Class Data Cache + Multiple AOT Data Caches. @mpirvu @pshipton @hangshao0

harryyu1994 commented 4 years ago

If a user starts with a Shared Class Cache but in the next run switch to a different configuration, then AOT code in the Shared Class Cache will be incompatible and the JVM will not do anything to replace the AOT code right? (User needs to either destroy the existing SCC or switch to a different SCC). This won't be a problem if we can have multiple AOT caches for different configurations.

pshipton commented 4 years ago

should we separate out AOT data to have its own cache?

It's possible with a lot of work, but probably not something we want to invest in atm. It's not typical for a user to switch configurations. Perhaps during development in order to find the optimal configuration, but not in production. If the user is running multiple things with different configurations, they should give each one a unique cache name.

If it's really important for some reason to support multiple gc policies and shifts with the same cache name, we can make a separate cache for each combination as Hang suggested.

The shared cache code is loaded in LOAD_BY_DEFAULT, I don't see why it can't open the cache and read the header at this point. We don't want to allocate the entire cache and break up the address space before the object heap is allocated. Can we fit the necessary data in the cache header into a few fields? There is available space in the existing header without changing the structure of the cache. If all we need is the gc policy and compressed reference shift, that fits into 32 bits.

U_32 unused32[5];
U_64 unused64[4];

harryyu1994 commented 4 years ago

@pshipton @hangshao0 Interesting! So the header can be accessed much earlier than the actual content of the Shared Class Cache, I guess they are two separate things? I think the only two pieces of information we need to store right now are gc policy and compressed reference shift. Could you point me to some of the APIs used for accessing the header so I can understand more about it?

pshipton commented 4 years ago

There aren't any APIs to access the cache header, we'll have to add something. Internally to the impl getMmapHeaderFieldOffsetForGen() is an example of reading a field from the header. To access the header earlier requires adding special code to read it without mapping the entire cache into memory. This is feasible because the header is small. Basically we'd just read the first part of the cache file into some allocated memory.

harryyu1994 commented 4 years ago

After looking into it more, I'm again a bit confused about how to make it work. @pshipton So LOAD_BY_DEFAULT will load all the libraries. Then at ALL_LIBRARIES_LOADED we will start to initialize things for each library. and it was in this stage, j9gc sets the compressed reference shift. Yes at this point j9shr library is loaded but it's yet to be initialized (j9shr is always initialized after j9gc at ALL_LIBRARIES_LOADED). Do you have any suggestions to work this around? My original plan was to read the cache header at ALL_LIBRARIES_LOADED and then set a flag bit in vm so that it can be communicated to gc.

Loading libraries at load stage LOAD_BY_DEFAULT:
    Loaded library j9jit29
        completed with rc=1 in 1028812 nano sec.
    Loaded library j9gc29
        completed with rc=1 in 278873 nano sec.
    Loaded library j9vrb29
        completed with rc=1 in 131771 nano sec.
    Loaded library j9shr29
        completed with rc=1 in 154382 nano sec.
    Loaded library j9jvmti29
        completed with rc=1 in 105807 nano sec.
    Loaded library jclse29
        completed with rc=1 in 143477 nano sec.

Checking results for stage LOAD_STAGE

Running initialization stage ALL_LIBRARIES_LOADED
    for library j9jit29...
        completed with rc=0 in 17299 nano sec.
    for library j9bcv29...
        completed with rc=0 in 428 nano sec.
    for library j9gc29...
        completed with rc=0 in 1256727 nano sec.
    for library j9dyn29...
        completed with rc=0 in 319 nano sec.
    for library j9vrb29...
        completed with rc=0 in 11010 nano sec.
    for library j9dmp29...
        completed with rc=0 in 1011 nano sec.
    for library j9trc29...
        completed with rc=0 in 588 nano sec.
    for library j9shr29...
        completed with rc=0 in 27831 nano sec.
    for library j9jvmti29...
        completed with rc=0 in 9767 nano sec.
    for library jclse29...
        completed with rc=0 in 141870 nano sec.
    for library zeroInitStages...
        completed with rc=0 in 124 nano sec.
    for library VMInitStages...

pshipton commented 4 years ago

It requires some shared classes code to be rewritten, new APIs added. Are you asking how that should be done?

harryyu1994 commented 4 years ago

Yes. And in which library we should be adding the code. If we are still adding it to the j9shr library then I failed to see where we can use the API. Can you call the shr API from the gc module? I'm assuming we can't.

harryyu1994 commented 4 years ago

If we were to use the API here it is already too late. j9shr's ALL_LIBRARIES_LOADED is called after j9gc's ALL_LIBRARIES_LOADED.

IDATA J9VMDllMain(J9JavaVM* vm, IDATA stage, void* reserved) 
{
    IDATA returnVal = J9VMDLLMAIN_OK;
    UDATA rc = 0;

    PORT_ACCESS_FROM_JAVAVM(vm);

    if (vm->sharedCacheAPI == NULL) {
        printf ("what stage is this? %ld\n", stage);
        IDATA index = FIND_ARG_IN_VMARGS( OPTIONAL_LIST_MATCH, OPT_XSHARECLASSES, NULL);

        U_64 runtimeFlags = getDefaultRuntimeFlags();

        runtimeFlags |= ((j9shr_isPlatformDefaultPersistent(vm) == TRUE) ? J9SHR_RUNTIMEFLAG_ENABLE_PERSISTENT_CACHE : 0);

        vm->sharedCacheAPI = (J9SharedCacheAPI*)j9mem_allocate_memory(sizeof(J9SharedCacheAPI), J9MEM_CATEGORY_CLASSES);
        if (vm->sharedCacheAPI == NULL) {
            return J9VMDLLMAIN_FAILED;
        }
        memset(vm->sharedCacheAPI, 0, sizeof(J9SharedCacheAPI));
        vm->sharedCacheAPI->softMaxBytes = (U_32)-1;
        vm->sharedCacheAPI->minAOT = -1;
        vm->sharedCacheAPI->maxAOT = -1;

pshipton commented 4 years ago

Basically, shr_init() which is currently called here https://github.com/eclipse/openj9/blob/master/runtime/shared/shrclssup.c#L329 and the startup code initially called here https://github.com/eclipse/openj9/blob/master/runtime/shared_common/shrinit.cpp#L3440 needs to be split into multiple parts.

What happens at startup depends on the shared classes command line options. If a utility like printstats is used (i.e. (RESULT_DO_PRINTSTATS == parseResult), I think the current startup sequence doesn't need to change since the JVM never boots into Java and doesn't need the GC or JIT.

Part 1 runs the startup code up until the internalAttach() https://github.com/eclipse/openj9/blob/master/runtime/shared_common/OSCachemmap.cpp#L298 Running part 1 startup can occur earlier, at this point https://github.com/eclipse/openj9/blob/master/runtime/shared/shrclssup.c#L279 which is during LOAD_BY_DEFAULT. J9VMDllMain() is called during LOAD_BY_DEFAULT when vm->sharedCacheAPI == NULL and then again for each phase of startup.

Instead of doing the mmap of the entire cache https://github.com/eclipse/openj9/blob/master/runtime/shared_common/OSCachemmap.cpp#L992 it needs to just read the header from the file and store it for later use.

We can't mmap the cache during LOAD_BY_DEFAULT because this takes a large chunk of virtual memory and consumes address space before the gc has a chance to allocate the heap. We don't want the shared cache to force the heap into a higher address space and increase the shift, or have the heap fail to allocate because the address space is fragmented.

The vm->sharedCacheAPI structure can be expanded with additional APIs. Since J9JavaVM can be used from anywhere, it can be used from the gc code during ALL_LIBRARIES_LOADED.

Part 2 picks up where Part 1 left off, doing the mmap and the rest of the shared cache init.

harryyu1994 commented 4 years ago

Part 1 runs the startup code up until the internalAttach() https://github.com/eclipse/openj9/blob/master/runtime/shared_common/OSCachemmap.cpp#L298 Running part 1 startup can occur earlier, at this point https://github.com/eclipse/openj9/blob/master/runtime/shared/shrclssup.c#L279 which is during LOAD_BY_DEFAULT. J9VMDllMain() is called during LOAD_BY_DEFAULT when vm->sharedCacheAPI == NULL and then again for each phase of startup.

This was where I originally wanted to place the API but I found out that J9VMDllMain() is not called at LOAD_BY_DEFAULT. When we get into vm->sharedCacheAPI == NULL it's already at ALL_LIBRARIES_LOADED.

harryyu1994 commented 4 years ago

In fact none of the J9VMDllMain() has a LOAD_BY_DEFAULT stage in there so I don't think we ever call J9VMDllMain() for the LOAD_BY_DEFAULT stage.

pshipton commented 4 years ago

Perhaps jvminit.c can be modified to call J9VMDllMain() on the shared classes code during LOAD_BY_DEFAULT.

pshipton commented 4 years ago

Another scarier possibility is to change the order so j9shr is processed before j9gc.

harryyu1994 commented 4 years ago

I'm assuming it's done in static UDATA protectedInitializeJavaVM(J9PortLibrary* portLibrary, void * userData)

saw this code

if (JNI_OK != runLoadStage(vm, LOAD_BY_DEFAULT)) {
    goto error;
}

we can follow this by something like this

if (JNI_OK != (stageRC = runInitializationStage(vm, LOAD_BY_DEFAULT))) {
    goto error;
}

runInitializationStage() calls J9VMDllMain() and runLoadStage() doesn't. That's why we don't handle LOAD_BY_DEFAULT in J9VMDllMain()
so we are turning LOAD_BY_DEFAULT into both a load stage and an init stage. Would this work?

harryyu1994 commented 4 years ago

Or do you think changing the order is the better solution? That will create a dependency which I don't think anyone wants.

harryyu1994 commented 4 years ago

Attempting to introduce a new stage called LOAD_BY_DEFAULT in INIT_STAGE and see what happens

/* Do not have more than 32 stages, and remember to put an entry in
 * jvminit.c:getNameForStage()*/
enum INIT_STAGE {
    PORT_LIBRARY_GUARANTEED,
    ALL_DEFAULT_LIBRARIES_LOADED,
    LOAD_BY_DEFAULT,
    ALL_LIBRARIES_LOADED,
    DLL_LOAD_TABLE_FINALIZED,
    VM_THREADING_INITIALIZED,
    HEAP_STRUCTURES_INITIALIZED,
    ALL_VM_ARGS_CONSUMED,
    BYTECODE_TABLE_SET,
    SYSTEM_CLASSLOADER_SET,
    DEBUG_SERVER_INITIALIZED,
    TRACE_ENGINE_INITIALIZED,
    JIT_INITIALIZED,
    AGENTS_STARTED,
    ABOUT_TO_BOOTSTRAP,
    JCL_INITIALIZED, 
    VM_INITIALIZATION_COMPLETE,
    INTERPRETER_SHUTDOWN,
    LIBRARIES_ONUNLOAD,
    HEAP_STRUCTURES_FREED,
    GC_SHUTDOWN_COMPLETE,
    /* this stage will only be invoked for the jcl shared library when it is being run remotely */ 
    OFFLOAD_JCL_PRECONFIGURE
};

I may want to change the name

pshipton commented 4 years ago

We're in uncharted territory, I'm not sure what will work best we'll just have to experiment. I suspect adding a LOAD_BY_DEFAULT stage might cause a number of problems since the code isn't expecting that. I was thinking you could add some custom code to do a LOAD_BY_DEFAULT for j9shr only.

pshipton commented 4 years ago

do you think changing the order is the better solution?

IDK, what happens if you try it?

That will create a dependency which I don't think anyone wants.

Which is what?

harryyu1994 commented 4 years ago

do you think changing the order is the better solution?

IDK, what happens if you try it?

That will create a dependency which I don't think anyone wants.

Which is what?

Okay I will try it, it will be a cleaner solution if adding a new stage is not an option. I was saying that this makes the j9gc ALL_LIBRARIES_LOADED stage depend on the j9shr ALL_LIBRARIES_LOADED stage, and the same stage should not depend on each other?

pshipton commented 4 years ago

You can try adding LOAD_BY_DEFAULT or some other new stage. I'm sure it can be made to work, just not sure how many changes will be required before it does. It does seem less risky than changing the load order, but I didn't want to rule it out yet before trying anything.

harryyu1994 commented 4 years ago

You can try adding LOAD_BY_DEFAULT or some other new stage. I'm sure it can be made to work, just not sure how many changes will be required before it does. It does seem less risky than changing the load order, but I didn't want to rule it out yet before trying anything.

It seems like something as simple as this made it work. https://github.com/harryyu1994/openj9/commit/50c95e13aa1eb0362572608f5af12ad3cb4dbbef I don't think the other libraries were affected because calling their J9VMDllMain() don't do anything if we don't explicitly add a handler for that stage. (All the library J9VMDllMain() seems to know how to deal with stages that it doesn't have a handler for) We can also do this stage just for j9shr but it makes the code messier and I don't think it's necessary. From my observation not every J9VMDllMain() contains handler for every stage anyways so my vote is with the above change I proposed. I ran sanity test and didn't see any problem with it.

I'm probably going to name the stage POST_LOAD_INIT or INIT_AFTER_LOAD.

pshipton commented 4 years ago

I don't think the other libraries were affected because calling their J9VMDllMain() don't do anything if we don't explicitly add a handler for that stage.

We don't really know what the side affects are, we'll have to look at all the J9VMDllMain() impls, but it's great it just appears to work. The code may not check the stage, but do something else like the shared classes code does (if (vm->sharedCacheAPI == NULL).

harryyu1994 commented 4 years ago

I have linked the code for each component. I'm going to run the code to verify if I'm matching them right. So far no other J9VMDLLMAIN does what j9shr29 does, which is having initialization code that doesn't belong to a stage. Struggled to find the dllmain for j9trc29.

31722 + 473 + 14156 + 311 + 38713 + 426 + 420 + 44696 + 13255 + 322 + 215 = 144709 nano seconds
0.15 ms overhead.

pshipton commented 4 years ago

j9dyn https://github.com/eclipse/openj9/blob/master/runtime/bcutil/bcutil.c#L119

j9trc https://github.com/eclipse/openj9/blob/master/runtime/rastrace/trcengine.c#L165

harryyu1994 commented 4 years ago

Summary of SH_OSCachemmap::startUp() routine

commonStartUp() : initializes things such as cacheName, cacheDir
openCacheFile() : opens the file based on cacheName, cacheDir. Initializes fileHandler
Option 1. Load file into memory (internalAttach) Option 2. Create new file and initialize cache header, also load file into memory (internalAttach)
internalDetach()

Trying to create a routine which avoids loading the entire file like internalAttach() does, but maybe I'm not understanding something correctly. My question is doesn't internalDetach() unmap the file from the memory? It seems that internalDetach() is always called whether we succeed or fail. To me the startup() sequence is really just 1. load file into memory and then 2. unload it from memory. @pshipton

pshipton commented 4 years ago

SH_CompositeCacheImpl::startup() does an _oscache->startup() and then _oscache->attach() to re-map the cache into memory.

https://github.ibm.com/runtimes/openj9/blob/ibm_sdk/runtime/shared_common/CompositeCache.cpp#L918 https://github.ibm.com/runtimes/openj9/blob/ibm_sdk/runtime/shared_common/CompositeCache.cpp#L1047 https://github.ibm.com/runtimes/openj9/blob/ibm_sdk/runtime/shared_common/CompositeCache.cpp#L1129

harryyu1994 commented 3 years ago

Just want to make sure I'm on the right path before I start the implementation
Here's a simplified call graph of the shr_init() routine and my guess on how to split it into 2 parts.
```
Before:
```
shr_init()
- SH_CacheMap::startup()
  - SH_CompositeCacheImpl::startup()
    - SH_OSCachemmap::startup()
      - internalAttach()
      - internalDetach()
    - SH_OSCachemmap::attach()
      - internalAttach()

After:

shr_early_init()
- SH_CacheMap::earlystartup()
  - SH_CompositeCacheImpl::earlystartup()
    - SH_OSCachemmap::earlystartup()
      - internalAttachHeaderOnly() -> needs a header only version, grabs cache header content in this function
      - internalDetachHeaderOnly()
        
        my guess is we can end shr_early_init() right around here
    - SH_OSCachemmap::earlyattach() -> maybe this won't be needed
      - internalAttachHeaderOnly()
shr_late_init()
- SH_CacheMap::startup()
  - SH_CompositeCacheImpl::startup()
    - SH_OSCachemmap::startup() -> this time we try to load the entire file using internalAttach()?
      - internalAttach()
      - internalDetach()
    - SH_OSCachemmap::attach()
      - internalAttach()
I wish there were more documentation on the Shared Class Cache implementation. There seems to be a lot going on with the startup routine.
Do we need to worry about NonPersistent Cache? (I have no idea when we will be using that option. Also is NonPersistent Cache not stored in a file?)
If we have an existing cache, we can read it in shr_early_init(). And if we don't have an existing cache we should not really do anything because we don't have compressedshift and gcpolicy information yet at that point. So I guess earlystartup() will only try to read from existing cache and update certain variables in SharedCacheAPI.
Do you agree that shr_init(), SH_CacheMap::startup(), SH_CompositeCacheImpl::startup(), SH_OSCachemmap::startup() and internalAttach() all need a new version? The new versions are basically going to deal with everything before internalAttach(). @pshipton

pshipton commented 3 years ago

Do we need to worry about NonPersistent Cache?

Yes, shr_init(), SH_CacheMap::startup(), SH_CompositeCacheImpl::startup() is all shared code, then it uses SH_OSCachesysv instead of SH_OSCachemmap. The cache is stored in SysV shared memory, nothing on disk. Except for Windows where it's also a mmaped file but still uses SH_OSCachesysv and a different native implementation. Non-persistent is the default and only option for z/OS, but works on all platforms by using the command line option. For non-persistent to work correctly I expect earlystartup() should open and close the cache, reading the header while it's open. ATM startup() opens the cache and keeps it open.

Do you agree that shr_init(), SH_CacheMap::startup(), SH_CompositeCacheImpl::startup(), SH_OSCachemmap::startup() and internalAttach() all need a new version?

Agreed for everything up to internalAttach(). There are 2 possibilities, I'm not sure if there is an advantage of one over the other. 1) continue to internalAttach() and internalDetach(). While the cache is attached, read the necessary data from the header and store it elsewhere. In the case of an existing cache, internalAttach() / internalDetach() are no longer required during (late)startup() since they've already been done during earlystartup(). 2) introduce internalAttachHeaderOnly() that uses read(fd, ...) to read the header. Not sure internalDetachHeaderOnly() is needed for anything so a better name is along the lines of readHeader().

harryyu1994 commented 3 years ago

SH_CompositeCacheImpl::startup() does an _oscache->startup() and then _oscache->attach() to re-map the cache into memory.

https://github.ibm.com/runtimes/openj9/blob/ibm_sdk/runtime/shared_common/CompositeCache.cpp#L918 https://github.ibm.com/runtimes/openj9/blob/ibm_sdk/runtime/shared_common/CompositeCache.cpp#L1047 https://github.ibm.com/runtimes/openj9/blob/ibm_sdk/runtime/shared_common/CompositeCache.cpp#L1129

Coming back to this previous comment that I still have some confusions with: so the workflow is basically 1. load into memory (_oscache->startup() internalAttach), 2. unload (_oscache->startup() internalDetach), 3. load into memory (oscache->attach()). If this is the case, the first load seems unnecessary to me, why was it needed? Is the purpose of _oscache->startup() trying to see if we can successfully load the SCC into memory?

continue to internalAttach() and internalDetach(). While the cache is attached, read the necessary data from the header and store it elsewhere. In the case of an existing cache, internalAttach() / internalDetach() are no longer required during (late)startup() since they've already been done during earlystartup().

So we can still load everything into memory (call internalAttach()) even in earlystartup()? because we are going to detach right after attach. I think if I can understand the purpose of _oscache->startup() it would clear up a lot of things for me.

pshipton commented 3 years ago

the first load seems unnecessary to me, why was it needed? Is the purpose of _oscache->startup() trying to see if we can successfully load the SCC into memory?

Looking at the code it's reading _dataLength and _dataStart. There is a simple check the cache is valid. It also checks a read-only cache has completed initialization and waits if it is not. I don't see any other obvious reasons, that may be the extent of it.

So we can still load everything into memory (call internalAttach()) even in earlystartup()?

It doesn't load anything into memory except whatever data is read from the cache. Basically just the first page to read some header values. It allocates virtual memory space where the data could reside, but the data is paged in on demand from the memory mapped file.

harryyu1994 commented 3 years ago

I'm going to isolate the problem by implementing the "split" first, not going to add additional functionality nor change the location of the call. The first step I took was to split SH_CompositeCacheImpl::startup. I have ran some local tests and it didn't give any errors. I have also noticed that we have some unit tests in place to test out this API. How do you invoke these tests? I have my changes here: https://github.com/eclipse/openj9/pull/11232. Once this is tested, the next step is to split SH_CacheMap::startup. @pshipton FYI @mpirvu

pshipton commented 3 years ago

FYI https://github.com/eclipse/openj9/blob/master/test/README.md

You need the shrtest native binary, build it with make all, or make test-image in addition to building the JVM. Then the test harness assumes shrtest is in </path/to/jvm>/../native-test-libs/shrtest. It's actually built in build/linux-<plat>-normal-server-release/images/test/openj9, so create a symlink in the images directory ln -s test/openj9 native-test-libs.

prepare the test like this (you only need to do this once, even if the shrtest binary changes):

cd openj9/tests
git clone git@github.com:AdoptOpenJDK/TKG.git
cd TKG
export TEST_JDK_HOME=/path/to/jvm
export JDK_VERSION=11 # adjust for the version you are building
export BUILD_LIST=functional/NativeTest
make compile

Run it like this (from the same TKG directory):

make _shrtest_linux_1 # I assume you're building on Linux. use _shrtest_linux_SE80_1 if you are testing Java 8. The _1 is the mode for compressedrefs

harryyu1994 commented 3 years ago

Summary

Background

Portable AOT is essentially just a customized set of compiler features to be used when we want our AOT code to be portable across different machines. This customized set of compiler features include cpu features, gc policy, cr shift, barrier type, compressed/full.

Problem

During warm runs, we should avoid running in Portable AOT mode if the customized set of compiler features is not compatible with the current JVM instance as Portable AOT mode has performance overhead. It's not trivial to verify compatibility for features like gc policy and cr shift because these things get set very early on in the JVM init process. For example, j9gc29 sets the cr shift at stage ALL_LIBARIES_LOADED. We need to figure out a way to do the compatibility check before this stage.

Solution

Introduce a new stage just before ALL_LIBRARIES_LOADED, see https://github.com/eclipse/openj9/issues/10689#issuecomment-721836454 for more details. During this stage we will read from the SCC header the portable AOT customized feature set.
See https://github.com/eclipse/openj9/issues/10689#issuecomment-721433236 for detailed information on why we need to refactor some of the shared class APIs and see https://github.com/eclipse/openj9/issues/10689#issuecomment-727684340 and https://github.com/eclipse/openj9/issues/10689#issuecomment-728326673 on how to refactor.
See https://github.com/eclipse/openj9/issues/10689#issuecomment-730657902 for the initial stage of the implementation.

mpirvu commented 3 years ago

@awsafsakif could you please continue to work on this issue? Thanks

eclipse-openj9 / openj9

Portable SCC: Disable Portable AOT by default in containers for non-default gc policies #10689

Inside containers:

Outside of containers:

Summary of SH_OSCachemmap::startUp() routine

Summary

Background

Problem

Solution