eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.27k stars 720 forks source link

TestRefreshGCSpecialClassesCache_NOBCI_JIT_ON StackOverflowError ResourceBundle$RBClassLoader.loadClass(ResourceBundle #19777

Closed pshipton closed 2 months ago

pshipton commented 3 months ago

Across platforms https://openj9-jenkins.osuosl.org/job/Pipeline-Build-Test-JDK8/862/

https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_extended.functional_aarch64_linux_Nightly_testList_0/705 - ub22-aarch64-1 TestRefreshGCSpecialClassesCache_NOBCI_JIT_ON_SE80_0 -Xgcpolicy:optthruput -Xdebug -Xrunjdwp:transport=dt_socket,address=8888,server=y,onthrow=no.pkg.foo,launch=echo -Xjit:count=0 -Xnocompressedrefs

22:09:16  [RefreshGCCache_NoBCI_Test] [INFO] Starting RefreshGCCache_NoBCI_Test...
22:09:17  Exception in thread "main" java.lang.StackOverflowError
22:09:17    at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:512)
22:09:17    at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:512)
22:09:17    at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:512)
22:09:17    at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:512)
22:09:17    at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:512)
22:09:17    at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:512)
...
22:09:19    at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:512)
22:09:19    at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:512)
22:09:19    at java.util.ResourceBundle$RBClassLoader.loadClass(ResourceBundle.java:512)
22:09:19    at java.util.ResourceBundle$Control.newBundle(ResourceBundle.java:2657)
22:09:19    at java.util.ResourceBundle.loadBundle(ResourceBundle.java:1518)
22:09:19    at java.util.ResourceBundle.findBundle(ResourceBundle.java:1482)
22:09:19    at java.util.ResourceBundle.findBundle(ResourceBundle.java:1436)
22:09:19    at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1370)
22:09:19    at java.util.ResourceBundle.getBundle(ResourceBundle.java:899)
22:09:19    at sun.util.resources.LocaleData$1.run(LocaleData.java:167)
22:09:19    at sun.util.resources.LocaleData$1.run(LocaleData.java:163)
22:09:19    at java.security.AccessController.doPrivileged(AccessController.java:690)
22:09:19    at sun.util.resources.LocaleData.getBundle(LocaleData.java:163)
22:09:19    at sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:103)
22:09:19    at sun.util.locale.provider.LocaleResources.getTimeZoneNames(LocaleResources.java:262)
22:09:19    at sun.util.locale.provider.TimeZoneNameProviderImpl.getDisplayNameArray(TimeZoneNameProviderImpl.java:124)
22:09:19    at sun.util.locale.provider.TimeZoneNameProviderImpl.getDisplayName(TimeZoneNameProviderImpl.java:99)
22:09:19    at sun.util.locale.provider.TimeZoneNameUtility$TimeZoneNameGetter.getName(TimeZoneNameUtility.java:240)
22:09:19    at sun.util.locale.provider.TimeZoneNameUtility$TimeZoneNameGetter.getObject(TimeZoneNameUtility.java:198)
22:09:19    at sun.util.locale.provider.TimeZoneNameUtility$TimeZoneNameGetter.getObject(TimeZoneNameUtility.java:184)
22:09:19    at sun.util.locale.provider.LocaleServiceProviderPool.getLocalizedObjectImpl(LocaleServiceProviderPool.java:281)
22:09:19    at sun.util.locale.provider.LocaleServiceProviderPool.getLocalizedObject(LocaleServiceProviderPool.java:265)
22:09:19    at sun.util.locale.provider.TimeZoneNameUtility.retrieveDisplayNamesImpl(TimeZoneNameUtility.java:166)
22:09:19    at sun.util.locale.provider.TimeZoneNameUtility.retrieveDisplayName(TimeZoneNameUtility.java:137)
22:09:19    at java.util.TimeZone.getDisplayName(TimeZone.java:400)
22:09:19    at java.util.Date.toString(Date.java:1045)
22:09:19    at org.testng.reporters.TestHTMLReporter.generateLog(TestHTMLReporter.java:284)
22:09:19    at org.testng.reporters.TestHTMLReporter.onFinish(TestHTMLReporter.java:40)
22:09:19    at org.testng.TestRunner.fireEvent(TestRunner.java:772)
22:09:19    at org.testng.TestRunner.afterRun(TestRunner.java:741)
22:09:19    at org.testng.TestRunner.run(TestRunner.java:509)
22:09:19    at org.testng.SuiteRunner.runTest(SuiteRunner.java:455)
22:09:19    at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:450)
22:09:19    at org.testng.SuiteRunner.privateRun(SuiteRunner.java:415)
22:09:19    at org.testng.SuiteRunner.run(SuiteRunner.java:364)
22:09:19    at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
22:09:19    at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:84)
22:09:19    at org.testng.TestNG.runSuitesSequentially(TestNG.java:1208)
22:09:19    at org.testng.TestNG.runSuitesLocally(TestNG.java:1137)
22:09:19    at org.testng.TestNG.runSuites(TestNG.java:1049)
22:09:19    at org.testng.TestNG.run(TestNG.java:1017)
22:09:19    at org.testng.TestNG.privateMain(TestNG.java:1354)
22:09:19    at org.testng.TestNG.main(TestNG.java:1323)

Changes https://github.com/eclipse-openj9/openj9/compare/2a2df9f1117...22eaa4d435f

pshipton commented 3 months ago

Creating some builds for triage.

https://openj9-jenkins.osuosl.org/job/Pipeline-Build-Test-Personal/534 https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/3690/ - cent8-aarch64-2 passed TestRefreshGCSpecialClassesCache_NOBCI_JIT_ON_SE80 ~Implies https://github.com/eclipse-openj9/openj9/pull/19699 is the cause @jdmpapin~

https://openj9-jenkins.osuosl.org/job/Pipeline-Build-Test-Personal/535 https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/3691/ - cent8-aarch64-3 passed TestRefreshGCSpecialClassesCache_NOBCI_JIT_ON_SE80

https://openj9-jenkins.osuosl.org/job/Pipeline-Build-Test-Personal/536 https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/3692 - ub22-aarch64-1 failed TestRefreshGCSpecialClassesCache_NOBCI_JIT_ON_SE80

12:19:57  OpenJDK Runtime Environment (build 1.8.0_422-internal-_2024_06_28_14_55-b00)
12:19:57  Eclipse OpenJ9 VM (build triage28c-2a2df9f1117, JRE 1.8.0 Linux aarch64-64-Bit Compressed References 20240628_469 (JIT enabled, AOT enabled)
12:19:57  OpenJ9   - 2a2df9f1117
12:19:57  OMR      - 47a9d248db0
12:19:57  JCL      - 667e0c43d2c based on jdk8u422-b04)
pshipton commented 3 months ago

@jdmpapin Actually the test failed in the grinder on an older build, which seems weird, but rules out your change.

pshipton commented 3 months ago

@tobi this also must be related to -XX:+EnableExtendedHCR changes.

pshipton commented 3 months ago

@jdmpapin I'm not sure we should rule out your change. The grinders are all running with the latest test code that removes -XX:+EnableExtendedHCR from the failing test. That it fails when Tobi's change is backed out makes sense, then it's passing until your change is introduced.

pshipton commented 3 months ago

Building the latest 0.46 (which contains Tobi's change), to confirm if there is a problem with 0.46 or not. https://openj9-jenkins.osuosl.org/job/Pipeline-Release-0.46.0-Build8/5/

jdmpapin commented 3 months ago

OK, I'm checking it out

pshipton commented 3 months ago

https://openj9-jenkins.osuosl.org/view/Test/job/Grinder/3694 passed TestRefreshGCSpecialClassesCache_NOBCI_JIT_ON_SE80 on 0.46

jdmpapin commented 3 months ago

I'm able to reproduce this on z, ~but it's extremely slippery. It stops happening if I pass a completely irrelevant JIT option, e.g. -Xjit:disableSuffixLogs~

edit: I was just replacing the count=0 option :facepalm:

jdmpapin commented 3 months ago

OK, so I was able to reproduce the issue reliably on z with compilation limited to the following two methods:

RBClassLoader has a ClassLoader field that it delegates to. RBClassLoader.loadClass in particular delegates to ClassLoader.loadClass using that field. When compiling that call, the code generator did guarded devirtualization using a single abstract implementer, with RBClassLoader as the only concrete implementation even though ClassLoader has other concrete subclasses. It did that because the CH table reported that RBClassLoader was the only subclass of ClassLoader. ClassLoader had been redefined after initializing its other subclasses but before loading RBClassLoader. Usually on redefinition jitClassesRedefined() would fix the CH table entries in a way that prevents us from losing track of the subclasses. However, it had the wrong oldClass because it wasn't expecting classes to be redefined in-place in debug mode, which is possible since #19554, but which couldn't occur in this particular test until #19710 removed -XX:+EnableExtendedHCR from its command-line options

I have a fix that prevents the bug in my setup. It's just undergoing some more widespread testing

jdmpapin commented 3 months ago

jitClassesRedefined() would fix the CH table entries in a way that prevents us from losing track of the subclasses

Clarification/correction: It tries to fix the CH table entries so that we don't lose track of subclasses (or any other state) when a class is not redefined in-place. This works by essentially swapping the class info of the original class and the class we'll use going forward. Normally when a class is redefined in-place, the swap does nothing because both are the same class. But because of the unexpected in-place redefinition, we had the original and the newly created class backwards, and so we swapped the info of the class to use going forward (which is the original) with that of the newly created class instead, and that's how the CH table ended up with no subclasses (and later only one subclass) for the original class

pshipton commented 2 months ago

There is also a similar failure in TestRefreshGCSpecialClassesCache_BCI_FAST_HCR_JIT_ON_SE80_0 https://hyc-runtimes-jenkins.swg-devops.com/job/Test_openjdk8_j9_extended.functional_aarch64_linux_testList_1/640/

pshipton commented 2 months ago

The failure is also occurring in the 0.46 branch on some platforms. @tajila https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_extended.functional_ppc64_aix_Release_testList_0/97 https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_extended.functional_x86-32_windows_Release_testList_0/94 https://openj9-jenkins.osuosl.org/job/Test_openjdk8_j9_extended.functional_x86-32_windows_Release_testList_1/94

pshipton commented 2 months ago

Created https://github.com/eclipse-openj9/openj9/pull/19820 for 0.46 to add -XX:+EnableExtendedHCR

pshipton commented 2 months ago

I'm also opening a head stream PR to add -XX:+EnableExtendedHCR to the tests, to avoid all the failures in the builds. We can revert when the JIT fix is ready.

https://github.com/eclipse-openj9/openj9/pull/19822

pshipton commented 2 months ago

I've removed the blocker label and moved this to 0.48 but since -XX:-EnableExtendedHCR is the default in 0.46 and later, users may get a crash rather than the expected message about enabling -XX:+EnableExtendedHCR.

jdmpapin commented 2 months ago

I think the message mentioning -XX:+EnableExtendedHCR is only supposed to show when it would allow a redefinition that is currently failing, e.g. because of an added method. But this bug can only occur when the redefinition has succeeded, so the message isn't actually expected in that case (even though the option would be a workaround for the bug)

I suppose it's too late now to get the JIT fix into 0.46?

pshipton commented 2 months ago

We've started M2, any further fix needs to be high severity or low risk and we'll have to convince the PM it needs to be included.