Closed pshipton closed 2 years ago
The reason for assertion is: !j9classloader 0x000003FEDC13DE98
has classloader !j9object 0x0000000023A019D8
not been marked:
> !markmap ismarked 0x0000000023A019D8
Object 0x0000000023A019D8 is not marked
however both it's classes have class objects marked:
> !allclasses | grep -i 3FEDC13DE98
0x0000000001D34300 0x0000000000000258 0x000003FEDC13DE98 0x000003FF0C06CED0 0 net/adoptopenjdk/test/classloading/deadlock/package2/CL2_C
0x0000000001D35700 0x0000000000000250 0x000003FEDC13DE98 0x000003FEE40CF710 0 net/adoptopenjdk/test/classloading/deadlock/package2/CL2_B
!j9class 0x0000000001D34300
Class object
> !markmap ismarked 0x0000000023A65B18
Object 0x0000000023A65B18 is marked
!j9class 0x0000000001D35700
class object
> !markmap ismarked 0x0000000023A65818
Object 0x0000000023A65818 is marked
It means !j9classloader 0x000003FEDC13DE98
has not been discovered during live classliaders search in Marking phase
I have a theory, I think the Write Barrier is missed somewhere (most likely in native VM C code) where class object is updated with reference to classloader object:
classLoader
field)@tajila Would you please check place where class object classLoader
slot is updated with classloader object ref? Is correct macro used?
There is an example of class object - there are a few slots updated after this object has been marked and scanned by GC (classLoader
, module
, protectionDomain
, initializationLock
):
> !j9object 0x0000000023A65818
!J9Object 0x0000000023A65818 {
struct J9Class* clazz = !j9class 0xFEE000 // java/lang/Class
Object flags = 0x0000000A;
I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
J vmRef = 0x0000000001D35700 (offset = 4) (java/lang/Class)
Ljava/lang/ClassLoader; classLoader = !fj9object 0x23a019d8 (offset = 12) (java/lang/Class) <----- not marked
Ljava/lang/Module; module = !fj9object 0x23d72108 (offset = 16) (java/lang/Class) <----- not marked
Ljava/security/ProtectionDomain; protectionDomain = !fj9object 0x23d72138 (offset = 20) (java/lang/Class) <----- not marked
Ljava/lang/String; classNameString = !fj9object 0x21969448 (offset = 24) (java/lang/Class)
Ljava/lang/String; fileNameString = !fj9object 0x0 (offset = 28) (java/lang/Class)
Ljava/lang/String; packageNameString = !fj9object 0x21a28650 (offset = 32) (java/lang/Class)
Ljava/lang/Class$AnnotationVars; annotationVars = !fj9object 0x0 (offset = 36) (java/lang/Class)
Ljava/lang/ClassValue$ClassValueMap; classValueMap = !fj9object 0x0 (offset = 40) (java/lang/Class)
Ljava/lang/Class$EnumVars; enumVars = !fj9object 0x0 (offset = 44) (java/lang/Class)
Ljava/lang/J9VMInternals$ClassInitializationLock; initializationLock = !fj9object 0x23d72160 (offset = 48) (java/lang/Class) <----- not marked
Ljava/lang/Object; methodHandleCache = !fj9object 0x0 (offset = 52) (java/lang/Class)
Ljava/lang/Class$ClassRepositoryHolder; classRepoHolder = !fj9object 0x0 (offset = 56) (java/lang/Class)
Ljava/lang/Class$AnnotationCache; annotationCache = !fj9object 0x0 (offset = 60) (java/lang/Class)
Ljava/lang/Class$MetadataCache; metadataCache = !fj9object 0x0 (offset = 64) (java/lang/Class)
[Ljava/lang/Class; cachedInterfaces = !fj9object 0x0 (offset = 68) (java/lang/Class)
Ljava/lang/Class; cachedDeclaringClass = !fj9object 0x0 (offset = 72) (java/lang/Class)
Ljava/lang/Class; cachedEnclosingClass = !fj9object 0x0 (offset = 76) (java/lang/Class)
Z cachedCheckInnerClassAttr = 0x00000000 (offset = 88) (java/lang/Class)
Ljava/lang/Class; nestHost = !fj9object 0x0 (offset = 80) (java/lang/Class)
Ljava/lang/Class$ReflectCache; reflectCache = !fj9object 0x0 (offset = 84) (java/lang/Class)
}
> !markmap ismarked 0x0000000023A65818 <--- class object itself
Object 0x0000000023A65818 is marked
> !markmap ismarked 0x23a019d8 <--- classLoader
Object 0x0000000023A019D8 is not marked
> !markmap ismarked 0x23d72108 <--- module
Object 0x0000000023D72108 is not marked
> !markmap ismarked 0x23d72138 <--- protectionDomain
Object 0x0000000023D72138 is not marked
> !markmap ismarked 0x23d72160 <--- initializationLock
Object 0x0000000023D72160 is not marked
So I see two possibilities here:
J9VMJAVALANGCLASS_SET_CLASSLOADER
should do correct barrier call but may be I missed path somewhere VM does not)J9OBJECT_OBJECT_STORE
macro, barrier calls, dirtying card to cleaning cardclass objects are created with NULL at classloader reference (we know it is possible, GC does NULL check for classLoader field
This only happens if the classloader of the class is the bootstrapclassloader
Ill see if this is reproduceable to verify the comment above.
I am not sure it is reproducible easily, previous failure occur July 27, 2020 (on ARM). I still have core file from the first failure. Please let me know if you need it. The pattern as I imagine it now is class object has been marked concurrently (it might happen very early in Concurrent Mark as fas as thread paying tax starts with scan of it's own stack and class object is pushed there) but after this when it's slots are updated has not been rescanned. It can happen if barrier code has not been called (I don't see how due VM code uses proper macros) or somehow does not do right thing (?).
I discovered strange pattern.
Mark map for heap range 0x0000000023A64800 - 0x0000000023A6A400 size 0x5C00 (23552)
> !j9x 0x3FF81428900,200
0x3FF81428900 : 0000000000000000 0000000000000000 [ ................ ]
0x3FF81428910 : 0000000000000000 0000000000000000 [ ................ ]
0x3FF81428920 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF81428930 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF81428940 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF81428950 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF81428960 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF81428970 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF81428980 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF81428990 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF814289A0 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF814289B0 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF814289C0 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF814289D0 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF814289E0 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF814289F0 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A00 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A10 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A20 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A30 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A40 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A50 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A60 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A70 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A80 : 0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A90 : 0000000000000000 0000000000000000 [ ................ ]
0x3FF81428AA0 : 0000000000000000 0000000000000000 [ ................ ]
0x3FF81428AB0 : 0000000000000000 0000000000000000 [ ................ ]
0x3FF81428AC0 : 0000000000000000 0000000000000000 [ ................ ]
Each bit 1 in this range points to alive object. And each of them (92 total) are Class objects, marked but not scanned
I missed the fact (most likely irrelevant) zLinus failure occurred on machine with Guarded Storage HW support
aarch64_mac
- job/Test_openjdk18_j9_sanity.functional_aarch64_mac_testList_0/19/
(macaarch64rt3
)
openjdk version "18-ea" 2022-03-22
IBM Semeru Runtime Open Edition 18+37 (build 18-ea+37)
Eclipse OpenJ9 VM 18+37 (build master-1621137e3, JRE 18 Mac OS X aarch64-64-Bit 20220319_28 (JIT enabled, AOT enabled)
OpenJ9 - 1621137e3
OMR - 644b9078b
JCL - 155d7b8e1c based on jdk-18+37)
[2022-03-19T17:21:40.940Z] Testing: Unload lots of classes using FVT stress argument to force finalization (with JIT if JIT is Enabled)
[2022-03-19T17:21:40.940Z] Test start time: 2022/03/19 13:21:39 Eastern Standard Time
[2022-03-19T17:21:40.940Z] Running command: "/Users/jenkins/workspace/Test_openjdk18_j9_sanity.functional_aarch64_mac_testList_0/openjdkbinary/j2sdk-image/Contents/Home/bin/..//bin/java" -Xjit -Xgcpolicy:gencon -Xnocompressedrefs -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -Xgc:fvtest=forceFinalizeClassLoaders -cp /Users/jenkins/workspace/Test_openjdk18_j9_sanity.functional_aarch64_mac_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
[2022-03-19T17:21:40.940Z] Time spent starting: 1 milliseconds
[2022-03-19T17:22:19.881Z] Time spent executing: 37761 milliseconds
[2022-03-19T17:22:19.882Z] Test result: FAILED
[2022-03-19T17:22:19.882Z] Output from test:
[2022-03-19T17:22:19.891Z] [OUT] Allocating array (iteration 985)
[2022-03-19T17:22:19.891Z] [OUT] i=0
[2022-03-19T17:22:19.891Z] [ERR] 17:22:08.430 0x11e887900 j9mm.107 * ** ASSERTION FAILED ** at /Users/jenkins/workspace/build-scripts/jobs/jdk18/jdk18-mac-aarch64-openj9/workspace/build/src/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))
[2022-03-19T17:22:19.891Z] [ERR] JVMDUMP039I Processing dump event "traceassert", detail "" at 2022/03/19 13:22:08 - please wait.
[2022-03-19T17:22:19.891Z] >> Success condition was not found: [Output match: Successful test run!]
[2022-03-19T17:22:19.891Z] >> Success condition was not found: [Output match: Cannot load library required by: -Xjit]
[2022-03-19T17:22:19.891Z] cmdLineTester_GCRegressionTests_0_FAILED
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_aarch64_mac_OMR_testList_0/14/ - mac11-aarch64-4 cmdLineTester_GCRegressionTests_0
Testing: Unload lots of classes using normal behaviour (with JIT if JIT is Enabled)
Test start time: 2022/03/24 05:42:01 Australian Eastern Standard Time
Running command: "/Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_OMR_testList_0/openjdkbinary/j2sdk-image/bin/java" -Xjit -Xgcpolicy:gencon -Xnocompressedrefs -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -cp /Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_OMR_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
Time spent starting: 2 milliseconds
Time spent executing: 58996 milliseconds
Test result: FAILED
Output from test:
[OUT] Calibrating live object array size...
[OUT] Set live object array size to 2142
[OUT] Command line to reproduce this run: com.ibm.tests.garbagecollector.TestClassLoaderMain 2142 50 -
[OUT] Allocating array (iteration 1)
[OUT] i=0
[OUT] i=1000
[OUT] i=2000
...
[OUT] Allocating array (iteration 2045)
[OUT] i=0
[ERR] 18:42:51.729 0x13309d500 j9mm.107 * ** ASSERTION FAILED ** at /Users/jenkins/workspace/Build_JDK17_aarch64_mac_OMR/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))
I can not make jdmpview
work on this core, so there is not so much I can do
job/Test_openjdk18_j9_sanity.functional_aarch64_mac_testList_0/21/
(macaarch64rt7
)
[2022-03-25T06:13:38.381Z] variation: Mode110
[2022-03-25T06:13:38.381Z] JVM_OPTIONS: -Xjit -Xgcpolicy:gencon -Xnocompressedrefs
[2022-03-25T06:16:01.200Z] Testing: Unload lots of classes using normal behaviour (with JIT if JIT is Enabled)
[2022-03-25T06:16:01.200Z] Test start time: 2022/03/25 02:15:52 Eastern Standard Time
[2022-03-25T06:16:01.200Z] Running command: "/Users/jenkins/workspace/Test_openjdk18_j9_sanity.functional_aarch64_mac_testList_0/openjdkbinary/j2sdk-image/Contents/Home/bin/..//bin/java" -Xjit -Xgcpolicy:gencon -Xnocompressedrefs -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -cp /Users/jenkins/workspace/Test_openjdk18_j9_sanity.functional_aarch64_mac_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
[2022-03-25T06:16:01.200Z] Time spent starting: 2 milliseconds
[2022-03-25T06:16:08.896Z] Time spent executing: 14956 milliseconds
[2022-03-25T06:16:08.896Z] Test result: FAILED
[2022-03-25T06:16:08.896Z] Output from test:
[2022-03-25T06:16:08.899Z] [OUT] i=0
[2022-03-25T06:16:08.899Z] [ERR] 06:15:58.985 0x122087900 j9mm.107 * ** ASSERTION FAILED ** at /Users/jenkins/workspace/build-scripts/jobs/jdk18/jdk18-mac-aarch64-openj9/workspace/build/src/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))
[2022-03-25T06:16:08.899Z] [ERR] JVMDUMP039I Processing dump event "traceassert", detail "" at 2022/03/25 02:15:58 - please wait.
https://openj9-jenkins.osuosl.org/job/Test_openjdk18_j9_sanity.functional_aarch64_mac_Nightly_testList_0/22 - mac11-aarch64-8
cmdLineTester_GCRegressionTests_0 -Xjit -Xgcpolicy:gencon -Xnocompressedrefs
No DDR support on aarch64 mac. https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk18_j9_sanity.functional_aarch64_mac_Nightly_testList_0/22/functional_test_output.tar.gz
Testing: Unload lots of classes using normal behaviour (with JIT if JIT is Enabled)
Test start time: 2022/03/29 14:48:53 Australian Eastern Standard Time
Running command: "/Users/jenkins/workspace/Test_openjdk18_j9_sanity.functional_aarch64_mac_Nightly_testList_0/openjdkbinary/j2sdk-image/bin/java" -Xjit -Xgcpolicy:gencon -Xnocompressedrefs -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -cp /Users/jenkins/workspace/Test_openjdk18_j9_sanity.functional_aarch64_mac_Nightly_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
Time spent starting: 2 milliseconds
Time spent executing: 24923 milliseconds
Test result: FAILED
Output from test:
[OUT] Calibrating live object array size...
[OUT] Set live object array size to 2000
[OUT] Command line to reproduce this run: com.ibm.tests.garbagecollector.TestClassLoaderMain 2000 50 -
[OUT] Allocating array (iteration 1)
[OUT] i=0
[OUT] i=1000
[OUT] Allocating array (iteration 2)
[OUT] i=0
[OUT] i=1000
...
[OUT] Allocating array (iteration 675)
[OUT] i=0
[ERR] 03:49:10.171 0x13d89d700 j9mm.107 * ** ASSERTION FAILED ** at /Users/jenkins/workspace/Build_JDK18_aarch64_mac_Nightly/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_aarch64_mac_Nightly_testList_0/27 - mac11-aarch64-3 cmdLineTester_GCRegressionTests_0
[OUT] Allocating array (iteration 2330)
[OUT] i=0
[OUT] i=1000
[OUT] i=2000
[OUT] Allocating array (iteration 2331)
[OUT] i=0
[ERR] 02:23:03.417 0x135890b00 j9mm.107 * ** ASSERTION FAILED ** at /Users/jenkins/workspace/Build_JDK17_aarch64_mac_Nightly/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))
@knn-k fyi the more recent failures on amac.
Not reproduced in a 20x Grinder job with cmdLineTester_GCRegressionTests_0 on AArch64 macOS: https://openj9-jenkins.osuosl.org/job/Grinder/767/
DDR on Mac is not available, so continue investigation from https://github.com/eclipse-openj9/openj9/issues/14243#issuecomment-1013429156
I noticed not only these 92 class objects are marked but correspondent Cards are dirty too:
_cardTableStart = !j9x 0x000003FF80FB2000
object offset in the heap in bytes: 0x23A65B18 - 0x21800000 = 0x2265B18
object offset in the heap in cards: 0x2265B18 / 0x200 = 0x1132D
card address: 0x000003FF80FB2000 + 0x1132D = 0x3FF80FC332D
> !j9x 0x3FF80FC3300,100
0x3FF80FC3300 : 0000000000000000 0000000000000000 [ ................ ]
0x3FF80FC3310 : 0000000000000000 0000000000000000 [ ................ ]
0x3FF80FC3320 : 0000000001010101 0101010101010101 [ ................ ] <--- 46 cards * 2 objects_per_card = 92 objects
0x3FF80FC3330 : 0101010101010101 0101010101010101 [ ................ ]
0x3FF80FC3340 : 0101010101010101 0101010101010101 [ ................ ]
0x3FF80FC3350 : 0101000000000000 0000000000000000 [ ................ ]
0x3FF80FC3360 : 0000000000000000 0000000000000000 [ ................ ]
0x3FF80FC3370 : 0000000000000000 0000000000000000 [ ................ ]
Another observations for aMac (no DDR, so snap traces only):
Observed at job/Test_openjdk18_j9_sanity.functional_aarch64_mac_testList_0/36/
(macaarch64rt6
)
[2022-04-29T06:21:03.916Z] variation: Mode110
[2022-04-29T06:21:03.916Z] JVM_OPTIONS: -Xjit -Xgcpolicy:gencon -Xnocompressedrefs
openjdk version "18.0.1-ea" 2022-04-19
IBM Semeru Runtime Open Edition 18.0.1+10-202204290535 (build 18.0.1-ea+10-202204290535)
Eclipse OpenJ9 VM 18.0.1+10-202204290535 (build master-c4f72ceaa, JRE 18 Mac OS X aarch64-64-Bit 20220429_47 (JIT enabled, AOT enabled)
OpenJ9 - c4f72ceaa
OMR - 3dd38046a
JCL - c4f3f6c82c based on jdk-18.0.1+10)
[2022-04-29T06:25:07.971Z] [ERR] 06:24:55.058 0x15588c100 j9mm.107 * ** ASSERTION FAILED ** at /Users/jenkins/workspace/build-scripts/jobs/jdk18/jdk18-mac-aarch64-openj9/workspace/build/src/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))
[2022-04-29T06:25:07.971Z] cmdLineTester_GCRegressionTests_0_FAILED
I thought DDR has been enabled for Mac M1 platform but looks like not for this build. Still can not be investigated
@dmitripivkine Unfortunately, there is a toolchain problem https://github.com/eclipse/omr/issues/6491 in enabling DDR with AArch64 macOS builds.
DDR on AArch64 macOS will be available soon when #14420 is merged. I am waiting for https://github.com/eclipse/omr/pull/5800 to promote.
amac DDR is now enabled in future builds.
Observed at JDK17 aarch64_mac
(ea platform)
job/Test_openjdk17_j9_sanity.functional_aarch64_mac/51/consoleFull
(macaarch64rt1
)
12:25:11 openjdk version "17.0.4-ea" 2022-07-19
12:25:11 IBM Semeru Runtime Open Edition 17.0.4.0-m1 (build 17.0.4-ea+3)
12:25:11 Eclipse OpenJ9 VM 17.0.4.0-m1 (build v0.33.0-release-c9a377390, JRE 17 Mac OS X aarch64-64-Bit 20220601_59 (JIT enabled, AOT enabled)
12:25:11 OpenJ9 - c9a377390
12:25:11 OMR - 942daf672
12:25:11 JCL - 02c2140051b based on jdk-17.0.4+3)
===============================================
Running test cmdLineTester_GCRegressionTests_0 ...
===============================================
cmdLineTester_GCRegressionTests_0 Start Time: Wed Jun 1 14:08:38 2022 Epoch Time (ms): 1654106918737
variation: Mode110
JVM_OPTIONS: -Xjit -Xgcpolicy:gencon -Xnocompressedrefs
Testing: Unload lots of classes using normal behaviour (with JIT if JIT is Enabled)
Test start time: 2022/06/01 14:10:53 Eastern Standard Time
Running command: "/Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_testList_2/openjdkbinary/j2sdk-image/Contents/Home/bin/..//bin/java" -Xjit -Xgcpolicy:gencon -Xnocompressedrefs -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -cp /Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_testList_2/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
Time spent starting: 1 milliseconds
Time spent executing: 29232 milliseconds
Test result: FAILED
Output from test:
[OUT] i=0
[ERR] 18:11:01.746 0x13789d500 j9mm.107 * ** ASSERTION FAILED ** at /Users/jenkins/workspace/build-scripts/jobs/jdk17u/jdk17u-mac-aarch64-openj9/workspace/build/src/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))
[ERR] JVMDUMP039I Processing dump event "traceassert", detail "" at 2022/06/01 14:11:01 - please wait.
[ERR] JVMDUMP032I JVM requested System dump using '/Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_testList_2/aqa-tests/TKG/output_1654106078320/cmdLineTester_GCRegressionTests_0/core.20220601.141101.8737.0001.dmp' in response to an event
[ERR] JVMDUMP010I System dump written to /Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_testList_2/aqa-tests/TKG/output_1654106078320/cmdLineTester_GCRegressionTests_0/core.20220601.141101.8737.0001.dmp
[ERR] JVMDUMP032I JVM requested Java dump using '/Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_testList_2/aqa-tests/TKG/output_1654106078320/cmdLineTester_GCRegressionTests_0/javacore.20220601.141101.8737.0002.txt' in response to an event
[ERR] JVMDUMP010I Java dump written to /Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_testList_2/aqa-tests/TKG/output_1654106078320/cmdLineTester_GCRegressionTests_0/javacore.20220601.141101.8737.0002.txt
[ERR] JVMDUMP032I JVM requested Snap dump using '/Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_testList_2/aqa-tests/TKG/output_1654106078320/cmdLineTester_GCRegressionTests_0/Snap.20220601.141101.8737.0003.trc' in response to an event
[ERR] JVMDUMP010I Snap dump written to /Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_testList_2/aqa-tests/TKG/output_1654106078320/cmdLineTester_GCRegressionTests_0/Snap.20220601.141101.8737.0003.trc
[ERR] JVMDUMP013I Processed dump event "traceassert", detail "".
>> Success condition was not found: [Output match: Successful test run!]
>> Success condition was not found: [Output match: Cannot load library required by: -Xjit]
DDR is enabled in this AArch64 macOS runtime. Artifacts including the core file are available at job/Test_openjdk17_j9_sanity.functional_aarch64_mac_testList_2/48/ for the failure above.
An assertion has been triggered an attempt to unload !j9classloader 0x0000000137A4BA68
(classLoaderObject = !j9object 0x0000000108601CE8
has not been marked). However for !j9class 0x000000012133EE00
it's classObject = !j9object 0x00000001083DC8D8
has been marked. I found this object in completed Work Packet, so it has been scanned once but looks like classLoader
, protectionDomain
and module
fields were not set yet. These fields are set in internalCreateRAMClassDone()
using Macros:
J9VMJAVALANGCLASS_SET_CLASSLOADER()
J9VMJAVALANGCLASS_SET_PROTECTIONDOMAIN()
J9VMJAVALANGCLASS_SET_MODULE()
Looks like Write Barrier has not been called (or does not work somehow). So GC missed mutations in classObject and it has not been rescanned.
These Macros are generated (for Linux x86 definitions are located in [extensionsRepo]/build/linux-x86_64-server-release/vm/runtime/j9vmconstantpool.h
) and should look like:
#define J9VMJAVALANGCLASS_SET_CLASSLOADER_VM(javaVM, object, value) ((void)0, \
J9OBJECT_OBJECT_STORE_VM(javaVM, object, J9VMJAVALANGCLASS_CLASSLOADER_OFFSET_VM(javaVM), (value)))
I don't see a reason why these Macros are generated differently in AArch64 macOS, but @knn-k would you please double check?
@dmitripivkine Thank you for your analysis. I will take a look.
I see no difference in the definition of the macro in build/macosx-aarch64-server-release/vm/runtime/j9vmconstantpool.h.
#define J9VMJAVALANGCLASS_SET_CLASSLOADER_VM(javaVM, object, value) ((void)0, \
J9OBJECT_OBJECT_STORE_VM(javaVM, object, J9VMJAVALANGCLASS_CLASSLOADER_OFFSET_VM(javaVM), (value)))
https://openj9-jenkins.osuosl.org/job/Test_openjdk18_j9_sanity.functional_aarch64_mac_Release_testList_0/11 - mac11-aarch64-4 cmdLineTester_GCRegressionTests_0
Testing: Unload lots of classes using FVT stress argument to force finalization (with JIT if JIT is Enabled)
Test start time: 2022/06/12 19:25:28 Australian Eastern Standard Time
Running command: "/Users/jenkins/workspace/Test_openjdk18_j9_sanity.functional_aarch64_mac_Release_testList_0/openjdkbinary/j2sdk-image/bin/java" -Xjit -Xgcpolicy:gencon -Xnocompressedrefs -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -Xgc:fvtest=forceFinalizeClassLoaders -cp /Users/jenkins/workspace/Test_openjdk18_j9_sanity.functional_aarch64_mac_Release_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
Time spent starting: 2 milliseconds
Time spent executing: 53399 milliseconds
Test result: FAILED
Output from test:
[OUT] Calibrating live object array size...
[OUT] Set live object array size to 2032
[OUT] Command line to reproduce this run: com.ibm.tests.garbagecollector.TestClassLoaderMain 2032 50 -
...
[OUT] Allocating array (iteration 1833)
[OUT] i=0
[ERR] 09:26:12.684 0x12c899100 j9mm.107 * ** ASSERTION FAILED ** at /Users/jenkins/workspace/Build_JDK18_aarch64_mac_Release/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_aarch64_mac_Nightly_testList_0/81 cmdLineTester_GCRegressionTests_0
Testing: Unload lots of classes using normal behaviour (with JIT if JIT is Enabled)
Test start time: 2022/06/18 10:55:24 Australian Eastern Standard Time
Running command: "/Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_Nightly_testList_0/openjdkbinary/j2sdk-image/bin/java" -Xjit -Xgcpolicy:gencon -Xnocompressedrefs -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -cp /Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_Nightly_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
Time spent starting: 1 milliseconds
Time spent executing: 19130 milliseconds
Test result: FAILED
Output from test:
[OUT] Calibrating live object array size...
[OUT] Set live object array size to 2110
[OUT] Command line to reproduce this run: com.ibm.tests.garbagecollector.TestClassLoaderMain 2110 50 -
[OUT] Allocating array (iteration 1)
[OUT] i=0
[OUT] i=1000
[OUT] i=2000
...
[OUT] Allocating array (iteration 317)
[OUT] i=0
[ERR] 00:55:32.825 0x14b07ff00 j9mm.107 * ** ASSERTION FAILED ** at /Users/jenkins/workspace/Build_JDK17_aarch64_mac_Nightly/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))
For the most part, the problem, specifically on Aarch64 OSX, is understood. It seems to be related to weak memory model. A fix is available, proven to help, although still considering various variants of the fix before proceeding with any.
No need to provide references to failures for Aarch64 OSX, but for other platforms, which seems to be a different problem, please do.
Final cleaning consists of 2 type of processing occurring simultaneously (where multiple threads/CPUs may execute each of them, but we consider just 1 for each). T1 (MM_ConcurrentCardTable::finalCleanCards) iterate over card table and for each dirty card (if C ==1) do clear card (C = 0) for each marked object with the card (if M == 1) scan object
T2 (MM_ConcurrentGCIncrementalUpdate::finalCleanCards) in a loop, while scan Q non - empty: (during scanning an object) mark (a child) object (M = 1), push for further scan pop object (typically more work is to be done with other objects between push and pop, but Q is empty) if object is not within a dirty card (if C == 0) scan object
Scan object marks its child objects and pushes them on scan Q for further scanning (generally at a later point, by any thread available)
Problem: every object has to be marked and scanned (ideally exactly once), but we miss to scan an object. Specifically, we miss to scan a class object of a class that have been loaded moments before entering the final STW GC increment. The class has only one object instance that is first time marked during the final card cleaning phase. By missing to scan class object we miss to mark its class loader object and we end up unloading the class, but during unloading we observe the marked class object, what the reported assertion is about.
Object instance gets marked and scanned properly, which means that class object gets marked properly. But class object happened to be within a dirty card (C == 1), and T2 misses to scan it. So it's supposed to be scanned during card cleaning by T1, but it does not happen since during card cleaning T2 does not see the object marked, yet (M is still 0). That normally should not be possible, but running on a weak memory mode, seems the propagation of store operation of marking the class object by T2 to T1 is delayed, or T1 prematurely reads mark bit of the class object, or some other reorder occurs.
A memory barrier prior to reading mark bits by T1 appears to fix it. Either full barrier or even lighter load barrier works.
openjdk version "17.0.4-ea" 2022-07-19
IBM Semeru Runtime Open Edition 17.0.4.0-m2 (build 17.0.4-ea+7)
Eclipse OpenJ9 VM 17.0.4.0-m2 (build openj9-0.33.0-m2, JRE 17 Mac OS X aarch64-64-Bit 20220718_88 (JIT enabled, AOT enabled)
OpenJ9 - e7fc0d2f1
OMR - e25299388
JCL - 05652b1ac23 based on jdk-17.0.4+7)
[2022-07-12T22:21:38.216Z] [ERR] 22:21:25.941 0x13c099100 j9mm.107 * ** ASSERTION FAILED ** at /Users/jenkins/workspace/build-scripts/jobs/jdk17u/jdk17u-mac-aarch64-openj9/workspace/build/src/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))
[2022-07-12T22:21:38.216Z] [ERR] JVMDUMP039I Processing dump event "traceassert", detail "" at 2022/07/12 18:21:25 - please wait.
The build in https://github.com/eclipse-openj9/openj9/issues/14243#issuecomment-1184884329 is from the v0.33.0 branch, and it does not seem to contain the fix https://github.com/eclipse/omr/pull/6605.
Closing as fixed in https://github.com/eclipse/omr/pull/6605
https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.system_s390x_linux_Release_testList_0/17 - ub20-390-1 ClassLoadingTest_CS_5m_0
-Xgcpolicy:gencon -Xgc:concurrentScavenge
https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk11_j9_sanity.system_s390x_linux_Release_testList_0/17/system_test_output.tar.gz
FYI https://github.com/eclipse-openj9/openj9/issues/10031 @dmitripivkine