eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Optimize storage of LambdaForm classes in the SCC #19371

Open babsingh opened 3 months ago

babsingh commented 3 months ago

Present

Currently, we do not store LambdaForm (LF) classes in the SCC. The LF class name is converted into a generic name in handleAnonClassName, so multiple LF classes end up with the same class name. Without a unique class name, the SCC has to compare class bytecodes to find a class. This makes looking up an LF class in the SCC slow enough that storing them is not worthwhile.

Proposal

Storing the LF classes in the SCC efficiently will improve startup performance with OJDK MHs. The goal is to close the startup gap between OJDK and OJ9 MHs in JDK8 and JDK11. Startup performance benefits will also be seen in JDK17+.

There are native methods in the LambdaForm generation code that should allow us to generate a unique string/key for storing the LF class in the SCC. With a unique string/key, we should be able to look up an LF class in the SCC in constant time without any class bytecode comparisons:

    // Translates to JVM_RegisterLambdaProxyClassForArchiving (unimplemented in OpenJ9)
    // Should add the corresponding ROMClass to the SCC
    // The input parameters (except the lambdaProxyClass) can be used for the unique string/key
    // The unique string/key needs to be stored in the ROMClass for the SCC to populate its hashtable at startup
    // Two ways to store the unique string/key:
    //   i) Write it in the ROM class's name (consideration: check that there are no dependencies on a hidden/anon ROM class's name)
    //  ii) Add a constant pool entry in the generated LF class to store the unique string/key
    private static native void addToArchive(Class<?> caller,
                                            String interfaceMethodName,
                                            MethodType factoryType,
                                            MethodType interfaceMethodType,
                                            MemberName implementationMember,
                                            MethodType dynamicMethodType,
                                            Class<?> lambdaProxyClass);

    // Translates to JVM_LookupLambdaProxyClassFromArchive (unimplemented in OpenJ9)
    // Create the unique string/key using the input parameters
    // Find the ROMClass from the SCC using the unique string/key, internalCreateRAMClassFromROMClass and return the Class<?> object
    private static native Class<?> findFromArchive(Class<?> caller,
                                                   String interfaceMethodName,
                                                   MethodType factoryType,
                                                   MethodType interfaceMethodType,
                                                   MemberName implementationMember,
                                                   MethodType dynamicMethodType);
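As a rough illustration of how the input parameters could be combined into a unique string/key, here is a minimal Java sketch. It is not OpenJ9 code: the class name, the `buildKey` helper, and the `|` separator are all hypothetical, and `implementationMemberDesc` is a plain string standing in for `MemberName`, which is not accessible outside `java.lang.invoke`. It also assumes the caller's class name is enough to identify the caller, which may not hold across class loaders.

```java
import java.lang.invoke.MethodType;

final class LambdaFormArchiveKeySketch {
    // Hypothetical helper: joins the lookup parameters into one string key.
    // The key depends only on the inputs, so it can be computed before any
    // class has been generated or found.
    static String buildKey(Class<?> caller,
                           String interfaceMethodName,
                           MethodType factoryType,
                           MethodType interfaceMethodType,
                           String implementationMemberDesc, // stand-in for MemberName
                           MethodType dynamicMethodType) {
        return String.join("|",
                caller.getName(),
                interfaceMethodName,
                factoryType.toMethodDescriptorString(),
                interfaceMethodType.toMethodDescriptorString(),
                implementationMemberDesc,
                dynamicMethodType.toMethodDescriptorString());
    }

    public static void main(String[] args) {
        MethodType mt = MethodType.methodType(void.class);
        System.out.println(buildKey(Runnable.class, "run", mt, mt, "Foo.bar()V", mt));
    }
}
```

Because the same inputs are passed to both addToArchive and findFromArchive, a key built this way would be available on both the store path and the lookup path.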

Other Considerations

babsingh commented 3 months ago

@ThanHenderson @hangshao0 @fengxue-IS Please let me know your thoughts on the above proposal.

fyi @pshipton @tajila @0xdaryl @vijaysun-omr

hangshao0 commented 3 months ago

the presence of a unique string/key should allow us to search a LF class in the SCC in constant time without any class bytecode comparisons.

The ID string should be unique and consistent for each LF class from run to run, so that we can guarantee a 1-to-1 mapping between the LF romclass and the unique ID string.

We should be able to calculate the ID string given an LF romclass, something like: UTF8* getUniqueID(J9ROMClass *LFClass)
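The two requirements above (an ID recoverable from the stored class, and a 1-to-1 mapping) can be sketched in Java as follows. This is only a model of the idea, not the SCC implementation: `StoredClass`, `getUniqueID`, and `buildIndex` are hypothetical names, and a `HashMap` stands in for the hashtable the SCC would populate at startup.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SccIndexSketch {
    // Hypothetical stand-in for an LF ROM class held in the cache.
    static final class StoredClass {
        final String uniqueId;   // ID recorded inside the stored class
        final byte[] romBytes;   // opaque class data

        StoredClass(String uniqueId, byte[] romBytes) {
            this.uniqueId = uniqueId;
            this.romBytes = romBytes;
        }
    }

    // Mirrors the suggested getUniqueID(J9ROMClass*): the ID is derived
    // from the stored class alone.
    static String getUniqueID(StoredClass c) {
        return c.uniqueId;
    }

    // Rebuilds the ID -> class table by scanning the stored classes,
    // rejecting duplicates so the mapping stays 1-to-1.
    static Map<String, StoredClass> buildIndex(List<StoredClass> stored) {
        Map<String, StoredClass> index = new HashMap<>();
        for (StoredClass c : stored) {
            if (index.putIfAbsent(getUniqueID(c), c) != null) {
                throw new IllegalStateException("duplicate ID: " + getUniqueID(c));
            }
        }
        return index;
    }
}
```

Once the table is built, a lookup by ID is an expected constant-time hash lookup, with no bytecode comparison.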

vijaysun-omr commented 3 months ago

@mpirvu @dsouzai @nbhuiyan @jdmpapin fyi

ThanHenderson commented 3 months ago

The issue outlined in the Present paragraph affects Lambdas as well as LambdaForms. However, we currently cache Lambdas in the SCC. The uniqueID approach should be extended to Lambdas too.

InnerClassLambdaMetafactory.java#L255-L287 and LambdaProxyClassArchive.java#L40-L54 pertain only to Lambdas, not LambdaForms. Instead of implementing the addToArchive and findFromArchive native methods on LambdaProxyClassArchive, I suggest we do most of the work within ROMClassBuilder.cpp.

hangshao0 commented 3 months ago

However, we currently cache Lambdas in the SCC. The uniqueID approach should also be extended to Lambdas as well.

For lambda classes, can we guarantee a uniqueID string that is consistent from run to run? There is an index number in the lambda class name, Abc$$Lambda$<indexNumber>/0x00...., which could differ from run to run.
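To make the concern concrete, here is a small Java sketch that strips the run-dependent parts of a lambda class name, leaving only a stable prefix. It assumes names shaped like the example above (an optional `$<indexNumber>` followed by an optional `/0x...` suffix); the class and method names are hypothetical. Note that the stable prefix alone is not a unique ID, since every lambda in Abc would share it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LambdaNameSketch {
    // Matches "$$Lambda", an optional "$<index>", and an optional
    // "/0x<hex>" suffix at the end of the class name.
    private static final Pattern RUN_DEPENDENT =
            Pattern.compile("\\$\\$Lambda(\\$\\d+)?(/0x[0-9a-fA-F]+)?$");

    // Returns the name with the index and suffix removed; names that do
    // not look like lambda class names are returned unchanged.
    static String stablePrefix(String className) {
        return RUN_DEPENDENT.matcher(className)
                .replaceFirst(Matcher.quoteReplacement("$$Lambda"));
    }
}
```

Two runs that produce `Abc$$Lambda$14/0x...` and `Abc$$Lambda$7/0x...` for the same lambda would both map to `Abc$$Lambda`, so something beyond the name is still needed to tell lambdas of the same host class apart.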

ThanHenderson commented 3 months ago

@hangshao0 I am still looking into what we could use within ROMClassBuilder that is inter-run consistent. Also, in JDK21+ the Lambdas don't have an index number and I've noticed we don't handle our Lambdas properly for JDK21+. I will open an issue and PR fixing that.

AlexeyKhrabrov commented 3 months ago

I've done some work (not yet contributed to OpenJ9) on the lambda naming issue in the context of the JITServer AOT cache, specifically to enable AOT loads of methods that refer to lambda classes in their relocation and validation records. Here is a brief description since it's probably relevant to this issue.

In addition to the non-deterministic index in the name, another problem is that since lambda classes are anonymous, they cannot be looked up by name using existing APIs used by the JIT, e.g. jitGetClassInClassloaderFromUTF8(). This is not specific to JITServer and also affects AOT loads from the local SCC. Any AOT body that refers to a lambda class cannot be loaded in the current implementation.

JITServer AOT cache already identifies classes by SHA256(ROMClass), which gives a unique and consistent ID across JVM instances. To support lambdas, I exclude the non-deterministic part of the class name from the strings in the ROMClass (class name and CP entries) when serializing it to compute its hash. I also maintain a mapping at the client JVM (populated in the class load JIT hook) that allows it to look up the RAMClass for a given deterministic class name prefix and ROMClass hash.
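The hashing idea can be illustrated with a short Java sketch. This is not the JITServer code (which is C++ and hashes the serialized ROMClass); it only models the principle by hashing a list of strings, assumed to be supplied in a deterministic order, after removing the lambda index from each. The class and method names are hypothetical, and the JDK's `MessageDigest` is used here instead of OpenSSL.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

final class RomClassHashSketch {
    // Drops the run-dependent "$<index>" that follows "$$Lambda".
    static String stripIndex(String s) {
        return s.replaceAll("\\$\\$Lambda\\$\\d+", "\\$\\$Lambda");
    }

    // Hashes the normalised strings; each one is length-prefixed so that
    // different splits of the same bytes cannot collide.
    static String sha256Hex(List<String> romStrings) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        for (String s : romStrings) {
            byte[] utf8 = stripIndex(s).getBytes(StandardCharsets.UTF_8);
            md.update(ByteBuffer.allocate(4).putInt(utf8.length).array());
            md.update(utf8);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

With this normalisation, the same lambda hashed in two runs with different indexes yields the same digest, while any change to the remaining strings changes it.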

This approach also works for other classes generated at runtime, specifically proxy classes (com/sun/proxy/$Proxy<N>) and generated reflection accessors (sun/reflect/Generated<...>Accessor<N>). While these classes are not anonymous, their names being non-deterministic still leads to AOT load failures in the current implementation.

Going forward we should have a common mechanism for dealing with this issue for both local SCC and JITServer. I'm not sure how lambda forms specifically tie into this though. What naming scheme do they use? Are they anonymous?

ThanHenderson commented 3 months ago

@AlexeyKhrabrov Thanks for the input.

JITServer AOT cache already identifies classes by SHA256(ROMClass)

Great idea, and the SHA256 approach should work for us in this case too. And in JDK21+ we don't even need to worry about the non-determinism in Lambda class names when serializing. Since we are planning on storing the unique ID in the constant pool rather than in the ROMClass there should be no problems there.

I'm not sure how lambda forms specifically tie into this though. What naming scheme do they use?

At the time of storing/looking up in SCC, the LambdaForm naming scheme is java/lang/invoke/LambdaForm$<method-handle-type>/0x0000000000000000 where <method-handle-type> is BMH, DMH, or MH.

Are they anonymous?

Yes.

AlexeyKhrabrov commented 3 months ago

the SHA256 approach should work for us in this case too.

Worth noting that the current implementation requires OpenSSL for that.

And in JDK21+ we don't even need to worry about the non-determinism in Lambda class names when serializing.

Still need to support JDKs 8/11/17 for a while, so need both mechanisms.

Since we are planning on storing the unique ID in the constant pool rather than in the ROMClass there should be no problems there.

Just to clarify, the RAM constant pool?

ThanHenderson commented 3 months ago

Worth noting that the current implementation requires OpenSSL for that.

Ah, good to know.

Still need to support JDKs 8/11/17 for a while, so need both mechanisms.

Yep, I was just clarifying for future releases.

Just to clarify, the RAM constant pool?

I was thinking in the ROMClass constant pool.

keithc-ca commented 3 months ago

identifies classes by SHA256(ROMClass)

Does that mean the hash of all bytes that comprise a ROM class?

AlexeyKhrabrov commented 3 months ago

Does that mean the hash of all bytes that comprise a ROM class?

Yes. More specifically, the serialized ROMClass where all the strings are un-interned and stored in a deterministic order. Which includes the ROM constant pool. See https://github.com/eclipse-openj9/openj9/blob/0aac90fe2465d294403bc4bed1f4a261f8ab9b1e/runtime/compiler/control/JITServerHelpers.cpp#L490-L520 and https://github.com/eclipse-openj9/openj9/blob/0aac90fe2465d294403bc4bed1f4a261f8ab9b1e/runtime/compiler/runtime/JITServerROMClassHash.cpp#L27-L43.

keithc-ca commented 3 months ago

That code depends upon OpenSSL: what's the solution for platforms where OpenSSL is not available?

mpirvu commented 3 months ago

Just curious, on what platforms is OpenSSL unavailable?

ThanHenderson commented 3 months ago

@mpirvu I believe anything that is not listed here: https://github.com/ibmruntimes/openj9-openjdk-jdk21/blob/openj9/closed/openssl.gmk#L63-L89

Edit: here is the OpenSSL supported platform page: https://www.openssl.org/policies/general-supplemental/platforms.html

keithc-ca commented 3 months ago

Just curious, on what platforms is OpenSSL unavailable?

I believe z/OS is one such platform.

hangshao0 commented 3 months ago

JITServer AOT cache already identifies classes by SHA256(ROMClass)

Great idea, and the SHA256 approach should work for us in this case too.

The description mentions findFromArchive(), which I believe is going to find the romClass in the shared cache and then use the romClass to define the Class<?>. It is not clear to me how SHA256 would work when we don't yet have the romClass on entry to findFromArchive().

mpirvu commented 3 months ago

@KostasTsiounis What options do we have if we need to compute SHA256 on zOS?

KostasTsiounis commented 3 months ago

I believe zOS bundles OpenJCEPlus with the JDK, which in turn uses OCKC, an IBM library (now open source) built on top of OpenSSL (https://github.com/IBM/OpenCryptographyKitC). The open-source page doesn't mention zOS, but they are producing binaries for it and the signatures are almost identical.

AlexeyKhrabrov commented 2 months ago

FYI here is a draft PR https://github.com/eclipse-openj9/openj9/pull/19549 with the code that implements what I described in https://github.com/eclipse-openj9/openj9/issues/19371#issuecomment-2078318982.