Open cjjdespres opened 2 weeks ago
Attn @mpirvu.
Issue Number: 20529 Status: Open Recommended Components: comp:vm, comp:build, comp:test Recommended Assignees: tajila, gacholio, babsingh
@mpirvu I did have some verbose log messages during development in a few places, but I haven't added them back in yet. Things related to offset and method tracking, registering dependencies during compilation, that sort of thing. Is there a good verbose option (or options) to put them under?
Now that I've been able to do some testing of this again, I ran acmeair with -Xaot:forceAOT
and found some issues with inlined method validation. One example is java/net/URLConnection.<init>(Ljava/net/URL;)V
. During relocation, I see:
TR_RelocationRecordValidateDefiningClassFromCP 00007F999CA38CD4
size 14 type 67 flags 2 reloFlags 0
Flag: Wide offsets
isStatic true
classID 40
beholderID 30
cpindex 170
reloLocation: from 00007F999CA38CE4 at 00007F9925CBE560 (offset ce39c620)
...
TR_RelocationRecordValidateStaticClassFromCP 00007F32CCA972D2
size 14 type 68 flags 2 reloFlags 0
Flag: Wide offsets
classID 40
beholderID 30
cpindex 170
reloLocation: from 00007F32CCA972E2 at 00007F32BCDD7D80 (offset 356a9f00)
applyRelocationAtAllOffsets: rc = staticClassFromCPValidationFailure
relocateAOTCodeAndData: return code 47
Both classes (ID 30 and 40) are fully initialized, but the relocation failed because getClassOfStaticFromCP
was used on the cp entry at index 170, and that entry was not yet resolved. There are a few intermediate relocation records with beholderID 30
and cpindex 170
, but none of them called anything that would resolve the entry. Some entry resolution might be happening here, but only at the cp index of the method that got inlined and caused these records to be generated (index 171). The AOT load with dependency tracking happened at t=744
, whereas the initial compilation and the AOT load without dependency tracking both happened at roughly t=1200
, so I'm assuming the earlier load is what is causing this.
One thing I did notice is that calls to addDefiningClassFromCPRecord()
in the SVM seem to be preceded by calls to jitCTResolveStaticFieldRefWithMethod()
, and that function looks like it would be able to resolve the cp entry for the class itself (index 170 in this case) at compile time because of its eventual call to resolveStaticFieldRefInto()
. That function is also used in TR_RelocationRecordDataAddress::applyRelocation()
, so I'd imagine it's fine to use during the validation of one of these records, in case the entry isn't resolved yet - these functions just abort the resolution of the entry if J9_RESOLVE_FLAG_JIT_COMPILE_TIME
is set and they encounter any problems.
I'll have to see if this works elsewhere. The number of load failures with -Xaot:forceAOT
and dependency tracking is still lower than without dependency tracking (~100 versus ~350), but I think a lot of the dependency tracking failures are avoidable. Setting the invocation count higher than 0 when all the dependencies are satisfied might also help if jitCTResolveStaticFieldRefWithMethod()
doesn't work, but for this particular method even a count=5
didn't seem to do anything.
(The AOT load failure total with a read-only SCC closer to what's created during open liberty container builds is roughly 10 with dependency tracking and 100 without, for comparison).
After further investigation, I don't think jitCTResolveStaticFieldRefWithMethod
updates what I want. I might be able to resolveClassRef()
before getClassOfStaticFromCP()
, though.
I said in https://github.com/eclipse-openj9/openj9/issues/20669#issuecomment-2494747074 that it looked like the failure here (in https://github.com/eclipse-openj9/openj9/issues/20529#issuecomment-2484538171) was because of getClassFromSignature()
, but I now see that it's this:
that is not interacting well with some of the other validations. The compilation logs for the method with that relo failure look like this:
ClassByNameRecord
_class=0x00000000000F0D00
className=java/net/URL
_classChainOffset=1289193
_beholder=0x0000000000431900
className=java/net/URLConnection
kind=64
id=30
got fieldsig as Ljava/net/URL;
IlGenerator: Generating compressedRefs anchor [00007FA1CA2AF3F0] for node [00007FA1CA2AF350]
got static signature as Z
VirtualMethodFromCPRecord
_method=0x00000000000F0788 <---- this is java/net/URL.getProtocol()Ljava/lang/String;
_beholder=0x0000000000431900
className=java/net/URLConnection
_cpIndex=36
kind=87
id=31
...
[ 3] O^O INLINER: Inlining qwerty java/net/URL.getProtocol()Ljava/lang/String; into java/net/URLConnection.<init>(Ljava/net/URL;)V with a virtual guard kind=DirectMethodGuard type=NonoverriddenTest partialInline=0
addVirtualGuard 00007FA1CA37AC50, guardkind = 11, virtualGuardTestType 3, bc index 0, callee index 1, callNode 00007FA1CA32A000, guardNode 00007FA1CA32D7A0, currentInlinedSiteIdx 1
addVirtualGuard 00007FA1CA37AF50, guardkind = 8, virtualGuardTestType 3, bc index 0, callee index 1, callNode 00007FA1CA32A000, guardNode 00007FA1CA32D930, currentInlinedSiteIdx 1
create storeNode 00007FA1CA32D980 of tempSymRef #447 (possibly for node uncommoning during opcodeExpansion)
create storeNode 00007FA1CA32DA20 of tempSymRef #447 (possibly for node uncommoning during opcodeExpansion)
...
genIL() returned 1
(Building alias info)
DefiningClassFromCPRecord
_class=0x00000000001B1F00
className=java/util/Locale
_beholder=0x00000000000F0D00
className=java/net/URL
_cpIndex=170
_isStatic=true
kind=67
id=40
ClassFromCPRecord
_class=0x00000000001B1F00
className=java/util/Locale
_beholder=0x00000000000F0D00
className=java/net/URL
_cpIndex=171
kind=66
id=40
...
StaticClassFromCPRecord
_class=0x00000000001B1F00
className=java/util/Locale
_beholder=0x00000000000F0D00
className=java/net/URL
_cpIndex=170
kind=68
id=40
I imagine that we used TR_ResolvedRelocatableJ9Method::getClassFromConstantPool()
on the resolved method for java/net/URL.getProtocol()Ljava/lang/String;
, which would then call TR_ResolvedRelocatableJ9Method::validateClassFromConstantPool
, which resulted in the ClassByNameRecord
, and only that record. The issue is that validating that record is done by validateClassByName
, and that record doesn't touch any constant pool. In contrast, a ClassFromCPRecord
will use:
and that getClassFromCP
ends up calling resolveClassRef. That function definitely looks like it can update the ramClassRefWrapper
in a constant pool at compile time, provided that everything is loaded already and has the correct properties. You can see that one was created for cpIndex=171
, which should be constantPool->romConstantPool[170]->classRefCPIndex
.
The issue with all of this is that at compile time, we would have called TR_ResolvedJ9Method::getClassFromConstantPool
to get the class at cp index 170. Since that method uses resolveClassRef
on the constant pool and index directly, it is capable of setting that entry at compile time. The validate*
functions for ClassByNameRecord
and all of the other records involving cp index 170 in this particular compilation avoid calling resolveClassRef
on exactly index 170 in the constant pool of java/net/URL
, so at relocation time we never tried to update this entry, even though we should have been able to do so.
I think we can just add a ClassFromCPRecord
immediately after that ClassByNameRecord
in addClassFromCPRecord
and that will fix this family relocation failures that I've been seeing.
The current approach to AOT loading involves setting a low initial invocation count (the
scount
, default 20) for methods with stored AOT bodies in the SCC. This invocation count guess cannot be too small, or loads would be attempted before all the classes necessary for relocation had been loaded, and it cannot be too large, or loads would be delayed unnecessarily.An alternative is to induce an AOT load when every class necessary for relocating that AOT code has been loaded. This will both increase the reliability of AOT loads and reduce the time between when a load is possible and when a load actually occurs. (Dependency-based loading will not eliminate load failures entirely, as some runtime properties of the classes in question may have changed, but decreasing the failure rate further would require more substantial changes to the structure of relocation itself).
The plan I am following for an initial design has a few components.
J9::AheadOfTimeCompile::processRelocations()
.jitHookInitializeSendTarget()
), their dependencies will be read from the SCC and the current state of those dependencies will be determined. If their dependencies are all satisfied, their initial count will be set to zero. Otherwise, they and their dependencies will be tracked, and their dependencies will be updated as updates to the map in (2) occur. When their dependencies are satisfied, their current count will be reduced to zero and they will be removed from tracking. This will trigger an AOT load the next time they are invoked.Testing of a rough implementation with acmeair is promising - the AOT failure rate is significantly reduced, and AOT loads generally happen much earlier than the
scount
approach. I'll link PRs here as I open them to implement the different pieces of this design.