Open cjjdespres opened 1 week ago
@pshipton @tajila The earliest I've been able to find these failures is the nightly test run of June 27, which tested openj9 commits from June 26 and earlier, so I think this may be related to https://github.com/eclipse-openj9/openj9/pull/19710 and maybe https://github.com/eclipse-openj9/openj9/issues/19778. I'm not sure if the test options need to be adjusted as in https://github.com/eclipse-openj9/openj9/pull/19792 or if it's something else.
The version info for the early failure in https://github.com/eclipse-openj9/openj9/issues/19813#issuecomment-2209462326 is:
# DETECTED_JAVA_VERSION=openjdk version "11.0.24-internal" 2024-07-16
# OpenJDK Runtime Environment (build 11.0.24-internal+0-adhoc.jenkins.BuildJDK11ppc64lelinuxjitPersonal)
# Eclipse OpenJ9 VM (build master-979ce920c7c, JRE 11 Linux ppc64le-64-Bit Compressed References 20240627_1784 (JIT enabled, AOT enabled)
# OpenJ9 - 979ce920c7c
# OMR - 47a9d248db0
# JCL - 35dab526d05 based on jdk-11.0.24+6)
I'll take a look.
Grinder with option added, https://hyc-runtimes-jenkins.swg-devops.com/job/Grinder/41855/ - passed
If it passes, that will confirm it's related to the HCR changes.
However, the crash is unexpected regardless of options specified so that will need investigation.
Created https://github.com/eclipse-openj9/openj9/pull/19819 to add -XX:+EnableExtendedHCR and https://github.com/eclipse-openj9/openj9/pull/19820 for 0.46
@cjjdespres I still think the crash needs to be investigated. The extended HCR options may have just exposed an issue. Disabling extended HCR prevents redefinition from changing the class shape; any attempt to do so should result in a JVMTI error. The fact that it is crashing means an assumption was broken.
I looked at the stacktrace and I couldn't find anything suspicious, but that's probably because I'm not very familiar with that code. Do you know why it is crashing? I think we still need to answer this question.
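To make the expectation above concrete, here is a minimal toy sketch of a shape check that rejects a redefinition instead of proceeding when extended HCR is disabled. It is not OpenJ9 code: `RedefinitionCheck`, `Shape`, `Result`, and `check` are hypothetical names, the result names only echo the JVMTI `UNSUPPORTED_REDEFINITION_*` error family, and the mapping from each kind of change to a result is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model only; none of these types exist in OpenJ9 or JVMTI.
public class RedefinitionCheck {
    // Hypothetical outcomes, loosely named after JVMTI redefinition errors.
    enum Result { OK, METHOD_ADDED, METHOD_DELETED, SCHEMA_CHANGED }

    // A class "shape" reduced to its method and static field signatures.
    static final class Shape {
        final Set<String> methods;
        final Set<String> staticFields;

        Shape(Set<String> methods, Set<String> staticFields) {
            this.methods = methods;
            this.staticFields = staticFields;
        }
    }

    // With extended HCR disabled, any shape difference is reported as an error.
    // With it enabled, this simplified model accepts the change.
    static Result check(Shape oldShape, Shape newShape, boolean extendedHcrEnabled) {
        if (extendedHcrEnabled) {
            return Result.OK;
        }
        Set<String> added = new HashSet<>(newShape.methods);
        added.removeAll(oldShape.methods);
        if (!added.isEmpty()) {
            return Result.METHOD_ADDED;
        }
        Set<String> deleted = new HashSet<>(oldShape.methods);
        deleted.removeAll(newShape.methods);
        if (!deleted.isEmpty()) {
            return Result.METHOD_DELETED;
        }
        if (!oldShape.staticFields.equals(newShape.staticFields)) {
            return Result.SCHEMA_CHANGED;
        }
        return Result.OK;
    }

    public static void main(String[] args) {
        Shape v1 = new Shape(new HashSet<>(Arrays.asList("run()V")),
                             new HashSet<>(Arrays.asList("count:I")));
        Shape v2 = new Shape(new HashSet<>(Arrays.asList("run()V", "stop()V")),
                             new HashSet<>(Arrays.asList("count:I")));
        System.out.println(check(v1, v2, false)); // shape change rejected
        System.out.println(check(v1, v2, true));  // shape change accepted
    }
}
```

In this model a shape-changing redefinition with the option off always ends in an error result, never in undefined behaviour, which is the assumption the crash appears to violate.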
I agree that this should be investigated. I think this option influences jvmtiRedefineClasses() and related functions, right? I'm also not at all familiar with that code, or with power.
@hzongaro @zl-wang Do you know who can investigate this?
@tajila could you briefly describe the difference between previous HCR and extended HCR? in particular, method recompilation and its implications. from the stack back-trace, it crashed trying to replace a trampoline (could be many different reasons). presumably, this is cross-platform, but it happened to show up on ppc64le because of certain recompilations by chance.
@IBMJimmyk i will ask jimmy to take a further look.
The basic difference is that extended HCR allows one to change the class shape (i.e. add/remove fields and methods). This means we don't redefine the J9Class in place, as we do with fast HCR. We create a new J9Class, then we need to do a heap walk to replace all object headers that are affected, update class hierarchies, array component type, etc. This also means that things like the vtable/jitvtable will have a new address, j9methods will also have a new address, etc.
WRT "method recompilation and its implications."
I'll let @gacholio add anything I may have missed.
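The difference described above can be sketched as a small toy model. This is not the OpenJ9 implementation: `HcrModel`, `ClassRecord`, `HeapObject`, `fastRedefine`, and `extendedRedefine` are hypothetical names, a Java object reference stands in for a J9Class address, and a plain list stands in for the heap.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; these types are illustrative stand-ins, not VM structures.
public class HcrModel {
    // Stand-in for a J9Class; object identity plays the role of its address.
    static final class ClassRecord {
        final String name;
        int version;

        ClassRecord(String name, int version) {
            this.name = name;
            this.version = version;
        }
    }

    // Stand-in for a heap object whose header points at its class.
    static final class HeapObject {
        ClassRecord header;

        HeapObject(ClassRecord header) {
            this.header = header;
        }
    }

    // Fast HCR in this model: the existing record is updated in place,
    // so every existing reference to it stays valid.
    static ClassRecord fastRedefine(ClassRecord current) {
        current.version++;
        return current;
    }

    // Extended HCR in this model: a new record is created, then the "heap"
    // is walked so that affected object headers point at the replacement.
    static ClassRecord extendedRedefine(ClassRecord current, List<HeapObject> heap) {
        ClassRecord replacement = new ClassRecord(current.name, current.version + 1);
        for (HeapObject object : heap) {
            if (object.header == current) {
                object.header = replacement;
            }
        }
        return replacement;
    }

    public static void main(String[] args) {
        ClassRecord original = new ClassRecord("Foo", 1);
        List<HeapObject> heap = new ArrayList<>();
        heap.add(new HeapObject(original));

        ClassRecord afterFast = fastRedefine(original);
        System.out.println("fast HCR keeps identity: " + (afterFast == original));

        ClassRecord afterExtended = extendedRedefine(original, heap);
        System.out.println("extended HCR keeps identity: " + (afterExtended == original));
        System.out.println("header updated: " + (heap.get(0).header == afterExtended));
    }
}
```

In the model, anything that cached the old record outside the walked list would still hold the old reference after `extendedRedefine`, which is why every dependent structure has to be updated in the extended case.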
i.e. as part of the HCR hooked actions, every such-typed existing object is re-constructed?
Yes, the J9Class in the header of each affected object instance is updated to the new J9Class in the extended HCR case.
I meant more than that. since fields can be added or removed, every such object needs to be reconstructed literally. size can be different. it sounds like a newObject() being required. without new size, is the old-object allowed to continue running?
Sorry, you can add/remove static fields, but not instance fields. So only the instance header needs to be updated.
Failure link
A sample, as it's been failing very frequently in the JITServer nightly tests:
https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk17_j9_extended.functional_ppc64le_linux_jit_Personal/991/ https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk11_j9_extended.functional_ppc64le_linux_jit_Personal/1078/
The failure in cmdLineTester_jvmtitests_hcr_OSRG_nongold_SE80_0 on ppc64le JDK8 in this test from July 4 also fails in rc018 in the same way.
Optional info
I initially noticed this in the JITServer nightly tests, but a 50x grinder without JITAAS had a 43/50 failure rate, so this is not JITServer-specific. I have only seen this failure on ppc64le so far.
Failure output (captured from console output)