babsingh opened this issue 3 months ago
@ThanHenderson @hangshao0 @fengxue-IS Please let me know your thoughts about the above proposal.
fyi @pshipton @tajila @0xdaryl @vijaysun-omr
> the presence of a unique string/key should allow us to search a LF class in the SCC in constant time without any class bytecode comparisons.

The ID string should be unique and consistent for each LF class from run to run, so that we can guarantee a 1-to-1 mapping between the LF ROMClass and its unique ID string.
We should be able to calculate the ID string given a LF ROMClass, something like:

```cpp
/* Proposed: return a unique, run-to-run consistent ID string for the given LF ROMClass. */
UTF8 *getUniqueID(J9ROMClass *LFClass);
```
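To make the constant-time point concrete, here is a minimal C++ sketch, not OpenJ9 code: `ROMClassBytes`, `LFCache`, `findByBytes`, and `findByID` are hypothetical names, and a plain byte vector stands in for a `J9ROMClass`. With only a shared generic name, a lookup has to compare bytes against every cached LF class; with a unique ID such as the proposed `getUniqueID()` result, a hash-table lookup takes average constant time.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a ROMClass: just its raw bytes.
using ROMClassBytes = std::vector<uint8_t>;

struct LFCache {
    // Today: all LF classes share a generic name, so the cache can only keep
    // a list and compare bytes against each entry (O(n) byte comparisons).
    std::vector<ROMClassBytes> byGenericName;

    // Proposed: a unique, run-to-run stable ID maps directly to the class.
    std::unordered_map<std::string, ROMClassBytes> byUniqueID;

    // Linear scan with a full byte comparison per stored candidate.
    const ROMClassBytes *findByBytes(const ROMClassBytes &candidate) const {
        for (const ROMClassBytes &stored : byGenericName) {
            if (stored == candidate) {
                return &stored;
            }
        }
        return nullptr;
    }

    // Average O(1) hash-table lookup on the ID; no class bytes are compared.
    const ROMClassBytes *findByID(const std::string &id) const {
        auto it = byUniqueID.find(id);
        return (it == byUniqueID.end()) ? nullptr : &it->second;
    }
};
```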
@mpirvu @dsouzai @nbhuiyan @jdmpapin fyi
The issue outlined in the Present paragraph also affects `Lambda`s, in addition to `LambdaForm`s. However, we currently cache `Lambda`s in the SCC. The `uniqueID` approach should also be extended to `Lambda`s as well.
> InnerClassLambdaMetafactory.java#L255-L287 and LambdaProxyClassArchive.java#L40-L54.

These only pertain to `Lambda`s and not `LambdaForm`s. Instead of implementing the `addToArchive` and `findFromArchive` native methods on `LambdaProxyClassArchive`, I suggest we do most of the work within `ROMClassBuilder.cpp`.
> However, we currently cache Lambdas in the SCC. The uniqueID approach should also be extended to Lambdas as well.

For lambda classes, can we guarantee a uniqueID string that is consistent from run to run? There is an index number in the lambda class name, `Abc$$Lambda$<indexNumber>/0x00....`, which could differ from run to run.
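One way to remove the run-dependent part is to cut the name at the `$$Lambda` marker. The helper below is a hypothetical C++ sketch, not existing OpenJ9 code; it assumes pre-JDK21 names look like `Abc$$Lambda$12/0x...` and JDK21+ names like `Abc$$Lambda/0x...` (no index), and the addresses in the examples are made up.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reduce a lambda class name to the part that is the
// same from run to run. Both "Abc$$Lambda$12/0x0000000800c01234" (with an
// index) and "Abc$$Lambda/0x0000000800c01234" (assumed JDK21+ form, no
// index) reduce to "Abc$$Lambda".
std::string deterministicLambdaPrefix(const std::string &name) {
    static const std::string marker = "$$Lambda";
    std::string::size_type pos = name.find(marker);
    if (pos == std::string::npos) {
        return name; // Not a lambda class name; leave it untouched.
    }
    // Drop everything after the marker: the index (if any) and the /0x... suffix.
    return name.substr(0, pos + marker.size());
}
```

The prefix alone is not unique, since every lambda declared in `Abc` shares it, so it would still need to be paired with a content-derived component such as a hash of the ROMClass.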
@hangshao0 I am still looking into what we could use within `ROMClassBuilder` that is consistent across runs. Also, in JDK21+ the `Lambda`s don't have an index number, and I've noticed we don't handle our `Lambda`s properly for JDK21+. I will open an issue and a PR to fix that.
I've done some work (not yet contributed to OpenJ9) on the lambda naming issue in the context of the JITServer AOT cache, specifically to enable AOT loads of methods that refer to lambda classes in their relocation and validation records. Here is a brief description since it's probably relevant to this issue.
In addition to the non-deterministic index in the name, another problem is that since lambda classes are anonymous, they cannot be looked up by name using existing APIs used by the JIT, e.g. `jitGetClassInClassloaderFromUTF8()`. This is not specific to JITServer and also affects AOT loads from the local SCC. Any AOT body that refers to a lambda class cannot be loaded in the current implementation.
JITServer AOT cache already identifies classes by `SHA256(ROMClass)`, which gives a unique and consistent ID across JVM instances. To support lambdas, I exclude the non-deterministic part of the class name from the strings in the ROMClass (class name and CP entries) when serializing it to compute its hash. I also maintain a mapping at the client JVM (populated in the class load JIT hook) that allows it to look up the RAMClass for a given deterministic class name prefix and ROMClass hash.
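The client-side mapping described above can be sketched as a table keyed by the pair (deterministic name prefix, ROMClass hash). This is a hypothetical C++ illustration, not the code from the draft PR: `RAMClass`, `DeterministicClassTable`, `onClassLoad`, and `lookup` are made-up names, and the hash is passed in precomputed as a hex string.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a RAMClass (J9Class) on the client JVM.
struct RAMClass { int id; };

// Hypothetical client-side table, filled from a class-load hook, mapping
// (deterministic name prefix, ROMClass hash) to the loaded RAMClass.
// Computing the hash itself (SHA256 in the JITServer AOT cache) is out of scope.
class DeterministicClassTable {
public:
    // Record a freshly loaded class under its run-independent key.
    void onClassLoad(const std::string &namePrefix, const std::string &romHash, RAMClass *clazz) {
        _table[std::make_pair(namePrefix, romHash)] = clazz;
    }

    // Resolve a class recorded by another JVM instance; nullptr if not loaded here.
    RAMClass *lookup(const std::string &namePrefix, const std::string &romHash) const {
        auto it = _table.find(std::make_pair(namePrefix, romHash));
        return (it == _table.end()) ? nullptr : it->second;
    }

private:
    std::map<std::pair<std::string, std::string>, RAMClass *> _table;
};
```

`std::map` keeps the sketch short; a hash table keyed on the same pair would give average constant-time lookups.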
This approach also works for other classes generated at runtime, specifically proxy classes (`com/sun/proxy/$Proxy<N>`) and generated reflection accessors (`sun/reflect/Generated<...>Accessor<N>`). While these classes are not anonymous, their non-deterministic names still lead to AOT load failures in the current implementation.
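For these generated classes the run-dependent part is the trailing counter `<N>`, so a deterministic prefix can be obtained by dropping trailing decimal digits. A hypothetical C++ sketch (`stripTrailingCounter` is an illustrative name, not an OpenJ9 function):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: drop the trailing counter <N> from generated class
// names such as com/sun/proxy/$Proxy<N> or sun/reflect/Generated<...>Accessor<N>.
std::string stripTrailingCounter(const std::string &name) {
    std::string::size_type end = name.size();
    // Walk backwards over the run of decimal digits at the end of the name.
    while (end > 0 && name[end - 1] >= '0' && name[end - 1] <= '9') {
        --end;
    }
    return name.substr(0, end);
}
```

Because it strips any trailing digits, a helper like this should only be applied to names already recognized as proxies or generated accessors; an ordinary class whose name ends in a digit would otherwise be truncated.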
Going forward, we should have a common mechanism for dealing with this issue for both the local SCC and JITServer. I'm not sure how lambda forms specifically tie into this, though. What naming scheme do they use? Are they anonymous?
@AlexeyKhrabrov Thanks for the input.
> JITServer AOT cache already identifies classes by SHA256(ROMClass)

Great idea, and the SHA256 approach should work for us in this case too. And in JDK21+ we don't even need to worry about the non-determinism in `Lambda` class names when serializing. Since we are planning on storing the unique ID in the constant pool rather than in the ROMClass, there should be no problems there.
> I'm not sure how lambda forms specifically tie into this though. What naming scheme do they use?

At the time of storing/looking up in the SCC, the `LambdaForm` naming scheme is `java/lang/invoke/LambdaForm$<method-handle-type>/0x0000000000000000`, where `<method-handle-type>` is `BMH`, `DMH`, or `MH`.
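A small hypothetical C++ helper that recognizes this scheme shows why the name alone cannot serve as an SCC key: it can only distinguish LF classes by `<method-handle-type>`, so every `DMH` LambdaForm, for example, carries the identical name `java/lang/invoke/LambdaForm$DMH/0x0000000000000000`. `lambdaFormKind` is an illustrative name, not an OpenJ9 function.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: return the <method-handle-type> (BMH, DMH or MH) if
// `name` matches java/lang/invoke/LambdaForm$<method-handle-type>/0x0000000000000000,
// or an empty string otherwise.
std::string lambdaFormKind(const std::string &name) {
    static const std::string prefix = "java/lang/invoke/LambdaForm$";
    static const std::string suffix = "/0x0000000000000000";
    if (name.size() <= prefix.size() + suffix.size()
        || name.compare(0, prefix.size(), prefix) != 0
        || name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0) {
        return "";
    }
    // The text between the prefix and the zeroed-address suffix is the kind.
    std::string kind = name.substr(prefix.size(), name.size() - prefix.size() - suffix.size());
    return (kind == "BMH" || kind == "DMH" || kind == "MH") ? kind : "";
}
```

Since the name splits LF classes into at most three groups, telling two LF classes of the same kind apart still requires something derived from their contents.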
> Are they anonymous?

Yes.
> the SHA256 approach should work for us in this case too.

Worth noting that the current implementation requires OpenSSL for that.
> And in JDK21+ we don't even need to worry about the non-determinism in Lambda class names when serializing.

Still need to support JDKs 8/11/17 for a while, so need both mechanisms.
> Since we are planning on storing the unique ID in the constant pool rather than in the ROMClass there should be no problems there.

Just to clarify, the RAM constant pool?
> Worth noting that the current implementation requires OpenSSL for that.

Ah, good to know.
> Still need to support JDKs 8/11/17 for a while, so need both mechanisms.

Yep, I was just clarifying for future releases.
> Just to clarify, the RAM constant pool?

I was thinking of the ROMClass constant pool.
> identifies classes by SHA256(ROMClass)

Does that mean the hash of all bytes that comprise a ROM class?
> Does that mean the hash of all bytes that comprise a ROM class?

Yes. More specifically, it is the hash of the serialized ROMClass, in which all the strings are un-interned and stored in a deterministic order. This includes the ROM constant pool. See https://github.com/eclipse-openj9/openj9/blob/0aac90fe2465d294403bc4bed1f4a261f8ab9b1e/runtime/compiler/control/JITServerHelpers.cpp#L490-L520 and https://github.com/eclipse-openj9/openj9/blob/0aac90fe2465d294403bc4bed1f4a261f8ab9b1e/runtime/compiler/runtime/JITServerROMClassHash.cpp#L27-L43.
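A simplified model of that idea, as a hypothetical C++ sketch: strings are written out by value (length-prefixed) in a fixed order rather than as references into an interned pool, so identical classes in different JVMs serialize to identical bytes. It substitutes 64-bit FNV-1a for SHA256 only to stay self-contained without OpenSSL; FNV-1a is not collision resistant and would not be a suitable cache key in practice. `SimpleROMClass`, `serialize`, and `fnv1a64` are illustrative names, not the functions in `JITServerHelpers.cpp`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified ROMClass: its non-string bytes plus the strings it
// references (class name, CP entries, ...) listed in a fixed, deterministic order.
struct SimpleROMClass {
    std::vector<uint8_t> fixedPart;
    std::vector<std::string> strings;
};

// Append a 32-bit little-endian length followed by the string bytes, so that
// ("ab", "c") and ("a", "bc") serialize differently.
static void appendString(std::vector<uint8_t> &out, const std::string &s) {
    uint32_t len = static_cast<uint32_t>(s.size());
    for (int i = 0; i < 4; i++) {
        out.push_back(static_cast<uint8_t>(len >> (8 * i)));
    }
    out.insert(out.end(), s.begin(), s.end());
}

// Flatten the class into one self-contained byte buffer (strings copied by value).
std::vector<uint8_t> serialize(const SimpleROMClass &rc) {
    std::vector<uint8_t> out(rc.fixedPart);
    for (const std::string &s : rc.strings) {
        appendString(out, s);
    }
    return out;
}

// 64-bit FNV-1a: a compact stand-in for SHA256 in this sketch only.
uint64_t fnv1a64(const std::vector<uint8_t> &bytes) {
    uint64_t h = 14695981039346656037ULL;
    for (uint8_t b : bytes) {
        h ^= b;
        h *= 1099511628211ULL;
    }
    return h;
}
```

For lambdas, the class name placed in `strings` would already have its non-deterministic part removed before serialization, as described earlier in the thread.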
That code depends upon OpenSSL: what's the solution for platforms where OpenSSL is not available?
Just curious, on what platforms is OpenSSL unavailable?
@mpirvu I believe anything that is not listed here: https://github.com/ibmruntimes/openj9-openjdk-jdk21/blob/openj9/closed/openssl.gmk#L63-L89
Edit: here is the OpenSSL supported platform page: https://www.openssl.org/policies/general-supplemental/platforms.html
> Just curious, on what platforms is OpenSSL unavailable?

I believe z/OS is one such platform.
> JITServer AOT cache already identifies classes by SHA256(ROMClass)

> Great idea, and the SHA256 approach should work for us in this case too.

The description mentions `findFromArchive()`, which I believe is going to find the romClass in the shared cache and then use it to define the `Class<?>`. It's not clear to me how SHA256 would work when we don't have the romClass yet at the entry of `findFromArchive()`.
@KostasTsiounis What options do we have if we need to compute SHA256 on z/OS?
I believe z/OS bundles `OpenJCEPlus` with the JDK, which in turn uses `OCKC`, an IBM library (now open-sourced) built on top of OpenSSL (https://github.com/IBM/OpenCryptographyKitC). The open-source page doesn't mention z/OS, but they are producing binaries for it and the signatures are almost identical.
FYI, here is a draft PR https://github.com/eclipse-openj9/openj9/pull/19549 with code that implements what I described in https://github.com/eclipse-openj9/openj9/issues/19371#issuecomment-2078318982.
Present
Currently, we do not store LambdaForm (LF) classes in the SCC. The LF class name is converted into a generic name in `handleAnonClassName`. This leads to LF classes with duplicate class names. Due to the lack of a unique class name, the SCC needs to perform class bytecode comparisons to find the class. This leads to long query times when finding a LF class in the SCC, making it not worthwhile to store them in the SCC.
Proposal
Storing LF classes optimally in the SCC will improve startup performance with OJDK MHs. The goal is to close the startup gap between OJDK and OJ9 MHs in JDK8 and JDK11. Startup performance benefits will also be seen in JDK17+.
There are native methods in the LambdaForm generation code, which should allow us to generate a unique string/key to store the LF class in the SCC; the presence of a unique string/key should allow us to search a LF class in the SCC in constant time without any class bytecode comparisons:
Other Considerations