Closed pshipton closed 2 months ago
Creating some builds for triage.
https://openj9-jenkins.osuosl.org/job/Pipeline-Build-Test-Personal/534 https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/3690/ - cent8-aarch64-2 passed TestRefreshGCSpecialClassesCache_NOBCI_JIT_ON_SE80 ~Implies https://github.com/eclipse-openj9/openj9/pull/19699 is the cause @jdmpapin~
https://openj9-jenkins.osuosl.org/job/Pipeline-Build-Test-Personal/535 https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/3691/ - cent8-aarch64-3 passed TestRefreshGCSpecialClassesCache_NOBCI_JIT_ON_SE80
https://openj9-jenkins.osuosl.org/job/Pipeline-Build-Test-Personal/536 https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/3692 - ub22-aarch64-1 failed TestRefreshGCSpecialClassesCache_NOBCI_JIT_ON_SE80
12:19:57 OpenJDK Runtime Environment (build 1.8.0_422-internal-_2024_06_28_14_55-b00)
12:19:57 Eclipse OpenJ9 VM (build triage28c-2a2df9f1117, JRE 1.8.0 Linux aarch64-64-Bit Compressed References 20240628_469 (JIT enabled, AOT enabled)
12:19:57 OpenJ9 - 2a2df9f1117
12:19:57 OMR - 47a9d248db0
12:19:57 JCL - 667e0c43d2c based on jdk8u422-b04)
@jdmpapin Actually the test failed in the grinder on an older build, which seems weird, but rules out your change.
@tobi this also must be related to -XX:+EnableExtendedHCR
changes.
@jdmpapin I'm not sure we should rule out your change. The grinders are all running with the latest test code that removes -XX:+EnableExtendedHCR
from the failing test. That it fails when Tobi's change is backed out makes sense, then it's passing until your change is introduced.
Building the latest 0.46 (which contains Tobi's change), to confirm if there is a problem with 0.46 or not. https://openj9-jenkins.osuosl.org/job/Pipeline-Release-0.46.0-Build8/5/
OK, I'm checking it out
https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/3694 passed TestRefreshGCSpecialClassesCache_NOBCI_JIT_ON_SE80 on 0.46
I'm able to reproduce this on z, ~but it's extremely slippery. It stops happening if I pass a completely irrelevant JIT option, e.g. -Xjit:disableSuffixLogs
~
edit: I was just replacing the count=0
option :facepalm:
OK, so I was able to reproduce the issue reliably on z with compilation limited to the following two methods:
java/lang/ClassLoader.loadClass
java/util/ResourceBundle$RBClassLoader.loadClass
RBClassLoader
has a ClassLoader
field that it delegates to. RBClassLoader.loadClass
in particular delegates to ClassLoader.loadClass
using that field. When compiling that call, the code generator did guarded devirtualization using a single abstract implementer, with RBClassLoader
as the only concrete implementation even though ClassLoader
has other concrete subclasses. It did that because the CH table reported that RBClassLoader
was the only subclass of ClassLoader
. ClassLoader
had been redefined after initializing its other subclasses but before loading RBClassLoader
. Usually on redefinition jitClassesRedefined()
would fix the CH table entries in a way that prevents us from losing track of the subclasses. However, it had the wrong oldClass
because it wasn't expecting classes to be redefined in-place in debug mode, which is possible since #19554, but which couldn't occur in this particular test until #19710 removed -XX:+EnableExtendedHCR
from its command-line options
I have a fix that prevents the bug in my setup. It's just undergoing some more widespread testing
jitClassesRedefined()
would fix the CH table entries in a way that prevents us from losing track of the subclasses
Clarification/correction: It tries to fix the CH table entries so that we don't lose track of subclasses (or any other state) when a class is not redefined in-place. This works by essentially swapping the class info of the original class and the class we'll use going forward. Normally when a class is redefined in-place, the swap does nothing because both are the same class. But because of the unexpected in-place redefinition, we had the original and the newly created class backwards, and so we swapped the info of the class to use going forward (which is the original) with that of the newly created class instead, and that's how the CH table ended up with no subclasses (and later only one subclass) for the original class
There is also a similar failure in TestRefreshGCSpecialClassesCache_BCI_FAST_HCR_JIT_ON_SE80_0 https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk8_j9_extended.functional_aarch64_linux_testList_1/640/
The failure is also occurring in the 0.46 branch on some platforms. @tajila https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_extended.functional_ppc64_aix_Release_testList_0/97 https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_extended.functional_x86-32_windows_Release_testList_0/94 https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_extended.functional_x86-32_windows_Release_testList_1/94
Created https://github.com/eclipse-openj9/openj9/pull/19820 for 0.46 to add -XX:+EnableExtendedHCR
I'm also opening a head stream PR to add -XX:+EnableExtendedHCR
to the tests, to avoid all the failures in the builds. We can revert when the JIT fix is ready.
I've removed the blocker label and moved this to 0.48 but since -XX:-EnableExtendedHCR
is the default in 0.46 and later, users may get a crash rather than the expected message about enabling -XX:+EnableExtendedHCR
.
I think the message mentioning -XX:+EnableExtendedHCR
is only supposed to show when it would allow a redefinition that is currently failing, e.g. because of an added method. But this bug can only occur when the redefinition has succeeded, so the message isn't actually expected in that case (even though the option would be a workaround for the bug)
I suppose it's too late now to get the JIT fix into 0.46?
We've started M2, any further fix needs to be high severity or low risk and we'll have to convince the PM it needs to be included.
Across platforms https://openj9-jenkins.osuosl.org/job/Pipeline-Build-Test-JDK8/862/
https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_extended.functional_aarch64_linux_Nightly_testList_0/705 - ub22-aarch64-1 TestRefreshGCSpecialClassesCache_NOBCI_JIT_ON_SE80_0
-Xgcpolicy:optthruput -Xdebug -Xrunjdwp:transport=dt_socket,address=8888,server=y,onthrow=no.pkg.foo,launch=echo -Xjit:count=0 -Xnocompressedrefs
Changes https://github.com/eclipse-openj9/openj9/compare/2a2df9f1117...22eaa4d435f