eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 721 forks source link

ASSERTION FAILED ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject)))) #14243

Closed pshipton closed 2 years ago

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.system_s390x_linux_Release_testList_0/17 - ub20-390-1 ClassLoadingTest_CS_5m_0 -Xgcpolicy:gencon -Xgc:concurrentScavenge

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk11_j9_sanity.system_s390x_linux_Release_testList_0/17/system_test_output.tar.gz

CLT stderr 19:25:12.621 0x14c5c00    j9mm.107    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/Build_JDK11_s390x_linux_Release/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))

FYI https://github.com/eclipse-openj9/openj9/issues/10031 @dmitripivkine

dmitripivkine commented 2 years ago

The reason for assertion is: !j9classloader 0x000003FEDC13DE98 has classloader !j9object 0x0000000023A019D8 not been marked:

> !markmap ismarked 0x0000000023A019D8
Object 0x0000000023A019D8 is not marked

however both it's classes have class objects marked:

> !allclasses | grep -i 3FEDC13DE98
0x0000000001D34300  0x0000000000000258  0x000003FEDC13DE98  0x000003FF0C06CED0  0   net/adoptopenjdk/test/classloading/deadlock/package2/CL2_C
0x0000000001D35700  0x0000000000000250  0x000003FEDC13DE98  0x000003FEE40CF710  0   net/adoptopenjdk/test/classloading/deadlock/package2/CL2_B

!j9class 0x0000000001D34300

Class object
> !markmap ismarked 0x0000000023A65B18
Object 0x0000000023A65B18 is marked

!j9class 0x0000000001D35700

class object 
> !markmap ismarked 0x0000000023A65818
Object 0x0000000023A65818 is marked

It means !j9classloader 0x000003FEDC13DE98 has not been discovered during live classliaders search in Marking phase

dmitripivkine commented 2 years ago

I have a theory, I think the Write Barrier is missed somewhere (most likely in native VM C code) where class object is updated with reference to classloader object:

@tajila Would you please check place where class object classLoader slot is updated with classloader object ref? Is correct macro used?

dmitripivkine commented 2 years ago

There is an example of class object - there are a few slots updated after this object has been marked and scanned by GC (classLoader, module, protectionDomain, initializationLock):

> !j9object 0x0000000023A65818
!J9Object 0x0000000023A65818 {
    struct J9Class* clazz = !j9class 0xFEE000 // java/lang/Class
    Object flags = 0x0000000A;
    I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
    J vmRef = 0x0000000001D35700 (offset = 4) (java/lang/Class)
    Ljava/lang/ClassLoader; classLoader = !fj9object 0x23a019d8 (offset = 12) (java/lang/Class) <----- not marked
    Ljava/lang/Module; module = !fj9object 0x23d72108 (offset = 16) (java/lang/Class) <----- not marked
    Ljava/security/ProtectionDomain; protectionDomain = !fj9object 0x23d72138 (offset = 20) (java/lang/Class) <----- not marked
    Ljava/lang/String; classNameString = !fj9object 0x21969448 (offset = 24) (java/lang/Class)
    Ljava/lang/String; fileNameString = !fj9object 0x0 (offset = 28) (java/lang/Class)
    Ljava/lang/String; packageNameString = !fj9object 0x21a28650 (offset = 32) (java/lang/Class)
    Ljava/lang/Class$AnnotationVars; annotationVars = !fj9object 0x0 (offset = 36) (java/lang/Class)
    Ljava/lang/ClassValue$ClassValueMap; classValueMap = !fj9object 0x0 (offset = 40) (java/lang/Class)
    Ljava/lang/Class$EnumVars; enumVars = !fj9object 0x0 (offset = 44) (java/lang/Class)
    Ljava/lang/J9VMInternals$ClassInitializationLock; initializationLock = !fj9object 0x23d72160 (offset = 48) (java/lang/Class) <----- not marked
    Ljava/lang/Object; methodHandleCache = !fj9object 0x0 (offset = 52) (java/lang/Class)
    Ljava/lang/Class$ClassRepositoryHolder; classRepoHolder = !fj9object 0x0 (offset = 56) (java/lang/Class)
    Ljava/lang/Class$AnnotationCache; annotationCache = !fj9object 0x0 (offset = 60) (java/lang/Class)
    Ljava/lang/Class$MetadataCache; metadataCache = !fj9object 0x0 (offset = 64) (java/lang/Class)
    [Ljava/lang/Class; cachedInterfaces = !fj9object 0x0 (offset = 68) (java/lang/Class)
    Ljava/lang/Class; cachedDeclaringClass = !fj9object 0x0 (offset = 72) (java/lang/Class)
    Ljava/lang/Class; cachedEnclosingClass = !fj9object 0x0 (offset = 76) (java/lang/Class)
    Z cachedCheckInnerClassAttr = 0x00000000 (offset = 88) (java/lang/Class)
    Ljava/lang/Class; nestHost = !fj9object 0x0 (offset = 80) (java/lang/Class)
    Ljava/lang/Class$ReflectCache; reflectCache = !fj9object 0x0 (offset = 84) (java/lang/Class)
}
> !markmap ismarked 0x0000000023A65818 <--- class object itself
Object 0x0000000023A65818 is marked

> !markmap ismarked 0x23a019d8 <--- classLoader
Object 0x0000000023A019D8 is not marked
> !markmap ismarked 0x23d72108 <--- module
Object 0x0000000023D72108 is not marked
> !markmap ismarked 0x23d72138 <--- protectionDomain
Object 0x0000000023D72138 is not marked
> !markmap ismarked 0x23d72160 <--- initializationLock
Object 0x0000000023D72160 is not marked

So I see two possibilities here:

tajila commented 2 years ago

class objects are created with NULL at classloader reference (we know it is possible, GC does NULL check for classLoader field

This only happens if the classloader of the class is the bootstrapclassloader

tajila commented 2 years ago

Ill see if this is reproduceable to verify the comment above.

https://openj9-jenkins.osuosl.org/job/Grinder/508/

dmitripivkine commented 2 years ago

I am not sure it is reproducible easily, previous failure occur July 27, 2020 (on ARM). I still have core file from the first failure. Please let me know if you need it. The pattern as I imagine it now is class object has been marked concurrently (it might happen very early in Concurrent Mark as fas as thread paying tax starts with scan of it's own stack and class object is pushed there) but after this when it's slots are updated has not been rescanned. It can happen if barrier code has not been called (I don't see how due VM code uses proper macros) or somehow does not do right thing (?).

dmitripivkine commented 2 years ago

I discovered strange pattern.

Mark map for heap range 0x0000000023A64800 - 0x0000000023A6A400 size 0x5C00 (23552)

> !j9x 0x3FF81428900,200
0x3FF81428900 :  0000000000000000 0000000000000000 [ ................ ]
0x3FF81428910 :  0000000000000000 0000000000000000 [ ................ ]
0x3FF81428920 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF81428930 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF81428940 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF81428950 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF81428960 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF81428970 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF81428980 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF81428990 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF814289A0 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF814289B0 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF814289C0 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF814289D0 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF814289E0 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF814289F0 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A00 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A10 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A20 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A30 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A40 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A50 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A60 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A70 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A80 :  0000000800000008 0000000800000008 [ ................ ]
0x3FF81428A90 :  0000000000000000 0000000000000000 [ ................ ]
0x3FF81428AA0 :  0000000000000000 0000000000000000 [ ................ ]
0x3FF81428AB0 :  0000000000000000 0000000000000000 [ ................ ]
0x3FF81428AC0 :  0000000000000000 0000000000000000 [ ................ ]

Each bit 1 in this range points to alive object. And each of them (92 total) are Class objects, marked but not scanned

dmitripivkine commented 2 years ago

Grinders: https://openj9-jenkins.osuosl.org/job/Grinder/652/ https://openj9-jenkins.osuosl.org/job/Grinder/653/

dmitripivkine commented 2 years ago

I missed the fact (most likely irrelevant) zLinus failure occurred on machine with Guarded Storage HW support

JasonFengJ9 commented 2 years ago

aarch64_mac - job/Test_openjdk18_j9_sanity.functional_aarch64_mac_testList_0/19/(macaarch64rt3)

openjdk version "18-ea" 2022-03-22
IBM Semeru Runtime Open Edition 18+37 (build 18-ea+37)
Eclipse OpenJ9 VM 18+37 (build master-1621137e3, JRE 18 Mac OS X aarch64-64-Bit 20220319_28 (JIT enabled, AOT enabled)
OpenJ9   - 1621137e3
OMR      - 644b9078b
JCL      - 155d7b8e1c based on jdk-18+37)
[2022-03-19T17:21:40.940Z] Testing: Unload lots of classes using FVT stress argument to force finalization (with JIT if JIT is Enabled)
[2022-03-19T17:21:40.940Z] Test start time: 2022/03/19 13:21:39 Eastern Standard Time
[2022-03-19T17:21:40.940Z] Running command: "/Users/jenkins/workspace/Test_openjdk18_j9_sanity.functional_aarch64_mac_testList_0/openjdkbinary/j2sdk-image/Contents/Home/bin/..//bin/java"  -Xjit -Xgcpolicy:gencon -Xnocompressedrefs   -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -Xgc:fvtest=forceFinalizeClassLoaders   -cp /Users/jenkins/workspace/Test_openjdk18_j9_sanity.functional_aarch64_mac_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
[2022-03-19T17:21:40.940Z] Time spent starting: 1 milliseconds
[2022-03-19T17:22:19.881Z] Time spent executing: 37761 milliseconds
[2022-03-19T17:22:19.882Z] Test result: FAILED
[2022-03-19T17:22:19.882Z] Output from test:

[2022-03-19T17:22:19.891Z]  [OUT] Allocating array (iteration 985)
[2022-03-19T17:22:19.891Z]  [OUT] i=0
[2022-03-19T17:22:19.891Z]  [ERR] 17:22:08.430 0x11e887900    j9mm.107    *   ** ASSERTION FAILED ** at /Users/jenkins/workspace/build-scripts/jobs/jdk18/jdk18-mac-aarch64-openj9/workspace/build/src/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))
[2022-03-19T17:22:19.891Z]  [ERR] JVMDUMP039I Processing dump event "traceassert", detail "" at 2022/03/19 13:22:08 - please wait.

[2022-03-19T17:22:19.891Z] >> Success condition was not found: [Output match: Successful test run!]
[2022-03-19T17:22:19.891Z] >> Success condition was not found: [Output match: Cannot load library required by: -Xjit]

[2022-03-19T17:22:19.891Z] cmdLineTester_GCRegressionTests_0_FAILED
pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_aarch64_mac_OMR_testList_0/14/ - mac11-aarch64-4 cmdLineTester_GCRegressionTests_0

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_sanity.functional_aarch64_mac_OMR_testList_0/14/functional_test_output.tar.gz

Testing: Unload lots of classes using normal behaviour (with JIT if JIT is Enabled)
Test start time: 2022/03/24 05:42:01 Australian Eastern Standard Time
Running command: "/Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_OMR_testList_0/openjdkbinary/j2sdk-image/bin/java"  -Xjit -Xgcpolicy:gencon -Xnocompressedrefs   -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc   -cp /Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_OMR_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
Time spent starting: 2 milliseconds
Time spent executing: 58996 milliseconds
Test result: FAILED
Output from test:
 [OUT] Calibrating live object array size...
 [OUT] Set live object array size to 2142
 [OUT] Command line to reproduce this run: com.ibm.tests.garbagecollector.TestClassLoaderMain 2142 50 -
 [OUT] Allocating array (iteration 1)
 [OUT] i=0
 [OUT] i=1000
 [OUT] i=2000
...
 [OUT] Allocating array (iteration 2045)
 [OUT] i=0
 [ERR] 18:42:51.729 0x13309d500    j9mm.107    *   ** ASSERTION FAILED ** at /Users/jenkins/workspace/Build_JDK17_aarch64_mac_OMR/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))
dmitripivkine commented 2 years ago

I can not make jdmpview work on this core, so there is not so much I can do

JasonFengJ9 commented 2 years ago

job/Test_openjdk18_j9_sanity.functional_aarch64_mac_testList_0/21/(macaarch64rt7)

[2022-03-25T06:13:38.381Z] variation: Mode110
[2022-03-25T06:13:38.381Z] JVM_OPTIONS:  -Xjit -Xgcpolicy:gencon -Xnocompressedrefs 

[2022-03-25T06:16:01.200Z] Testing: Unload lots of classes using normal behaviour (with JIT if JIT is Enabled)
[2022-03-25T06:16:01.200Z] Test start time: 2022/03/25 02:15:52 Eastern Standard Time
[2022-03-25T06:16:01.200Z] Running command: "/Users/jenkins/workspace/Test_openjdk18_j9_sanity.functional_aarch64_mac_testList_0/openjdkbinary/j2sdk-image/Contents/Home/bin/..//bin/java"  -Xjit -Xgcpolicy:gencon -Xnocompressedrefs   -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc   -cp /Users/jenkins/workspace/Test_openjdk18_j9_sanity.functional_aarch64_mac_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
[2022-03-25T06:16:01.200Z] Time spent starting: 2 milliseconds
[2022-03-25T06:16:08.896Z] Time spent executing: 14956 milliseconds
[2022-03-25T06:16:08.896Z] Test result: FAILED
[2022-03-25T06:16:08.896Z] Output from test:

[2022-03-25T06:16:08.899Z]  [OUT] i=0
[2022-03-25T06:16:08.899Z]  [ERR] 06:15:58.985 0x122087900    j9mm.107    *   ** ASSERTION FAILED ** at /Users/jenkins/workspace/build-scripts/jobs/jdk18/jdk18-mac-aarch64-openj9/workspace/build/src/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))
[2022-03-25T06:16:08.899Z]  [ERR] JVMDUMP039I Processing dump event "traceassert", detail "" at 2022/03/25 02:15:58 - please wait.
pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk18_j9_sanity.functional_aarch64_mac_Nightly_testList_0/22 - mac11-aarch64-8 cmdLineTester_GCRegressionTests_0 -Xjit -Xgcpolicy:gencon -Xnocompressedrefs

No DDR support on aarch64 mac. https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk18_j9_sanity.functional_aarch64_mac_Nightly_testList_0/22/functional_test_output.tar.gz

Testing: Unload lots of classes using normal behaviour (with JIT if JIT is Enabled)
Test start time: 2022/03/29 14:48:53 Australian Eastern Standard Time
Running command: "/Users/jenkins/workspace/Test_openjdk18_j9_sanity.functional_aarch64_mac_Nightly_testList_0/openjdkbinary/j2sdk-image/bin/java"  -Xjit -Xgcpolicy:gencon -Xnocompressedrefs   -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc   -cp /Users/jenkins/workspace/Test_openjdk18_j9_sanity.functional_aarch64_mac_Nightly_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
Time spent starting: 2 milliseconds
Time spent executing: 24923 milliseconds
Test result: FAILED
Output from test:
 [OUT] Calibrating live object array size...
 [OUT] Set live object array size to 2000
 [OUT] Command line to reproduce this run: com.ibm.tests.garbagecollector.TestClassLoaderMain 2000 50 -
 [OUT] Allocating array (iteration 1)
 [OUT] i=0
 [OUT] i=1000
 [OUT] Allocating array (iteration 2)
 [OUT] i=0
 [OUT] i=1000
...
 [OUT] Allocating array (iteration 675)
 [OUT] i=0
 [ERR] 03:49:10.171 0x13d89d700    j9mm.107    *   ** ASSERTION FAILED ** at /Users/jenkins/workspace/Build_JDK18_aarch64_mac_Nightly/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))
pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_aarch64_mac_Nightly_testList_0/27 - mac11-aarch64-3 cmdLineTester_GCRegressionTests_0

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_sanity.functional_aarch64_mac_Nightly_testList_0/27/functional_test_output.tar.gz

 [OUT] Allocating array (iteration 2330)
 [OUT] i=0
 [OUT] i=1000
 [OUT] i=2000
 [OUT] Allocating array (iteration 2331)
 [OUT] i=0
 [ERR] 02:23:03.417 0x135890b00    j9mm.107    *   ** ASSERTION FAILED ** at /Users/jenkins/workspace/Build_JDK17_aarch64_mac_Nightly/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))
pshipton commented 2 years ago

@knn-k fyi the more recent failures on amac.

knn-k commented 2 years ago

Not reproduced in a 20x Grinder job with cmdLineTester_GCRegressionTests_0 on AArch64 macOS: https://openj9-jenkins.osuosl.org/job/Grinder/767/

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_aarch64_mac_aot_Personal_testList_0/2

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.functional_aarch64_mac_Nightly_testList_0/30

dmitripivkine commented 2 years ago

DDR on Mac is not available, so continue investigation from https://github.com/eclipse-openj9/openj9/issues/14243#issuecomment-1013429156

I noticed not only these 92 class objects are marked but correspondent Cards are dirty too:

_cardTableStart = !j9x 0x000003FF80FB2000

object offset in the heap in bytes: 0x23A65B18 - 0x21800000 = 0x2265B18
object offset in the heap in cards: 0x2265B18 / 0x200 = 0x1132D
card address: 0x000003FF80FB2000 + 0x1132D = 0x3FF80FC332D

> !j9x 0x3FF80FC3300,100
0x3FF80FC3300 :  0000000000000000 0000000000000000 [ ................ ]
0x3FF80FC3310 :  0000000000000000 0000000000000000 [ ................ ]
0x3FF80FC3320 :  0000000001010101 0101010101010101 [ ................ ] <--- 46 cards * 2 objects_per_card = 92 objects
0x3FF80FC3330 :  0101010101010101 0101010101010101 [ ................ ]
0x3FF80FC3340 :  0101010101010101 0101010101010101 [ ................ ]
0x3FF80FC3350 :  0101000000000000 0000000000000000 [ ................ ]
0x3FF80FC3360 :  0000000000000000 0000000000000000 [ ................ ]
0x3FF80FC3370 :  0000000000000000 0000000000000000 [ ................ ]
dmitripivkine commented 2 years ago

Another observations for aMac (no DDR, so snap traces only):

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk18_j9_sanity.functional_aarch64_mac_Nightly_testList_0/36

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_aarch64_mac_aot_Personal_testList_0/7/

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk11_j9_sanity.functional_aarch64_mac_OMR_testList_0/42

JasonFengJ9 commented 2 years ago

Observed at job/Test_openjdk18_j9_sanity.functional_aarch64_mac_testList_0/36/(macaarch64rt6)

[2022-04-29T06:21:03.916Z] variation: Mode110
[2022-04-29T06:21:03.916Z] JVM_OPTIONS:  -Xjit -Xgcpolicy:gencon -Xnocompressedrefs 

openjdk version "18.0.1-ea" 2022-04-19
IBM Semeru Runtime Open Edition 18.0.1+10-202204290535 (build 18.0.1-ea+10-202204290535)
Eclipse OpenJ9 VM 18.0.1+10-202204290535 (build master-c4f72ceaa, JRE 18 Mac OS X aarch64-64-Bit 20220429_47 (JIT enabled, AOT enabled)
OpenJ9   - c4f72ceaa
OMR      - 3dd38046a
JCL      - c4f3f6c82c based on jdk-18.0.1+10)

[2022-04-29T06:25:07.971Z]  [ERR] 06:24:55.058 0x15588c100    j9mm.107    *   ** ASSERTION FAILED ** at /Users/jenkins/workspace/build-scripts/jobs/jdk18/jdk18-mac-aarch64-openj9/workspace/build/src/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))

[2022-04-29T06:25:07.971Z] cmdLineTester_GCRegressionTests_0_FAILED
dmitripivkine commented 2 years ago

I thought DDR has been enabled for Mac M1 platform but looks like not for this build. Still can not be investigated

knn-k commented 2 years ago

@dmitripivkine Unfortunately, there is a toolchain problem https://github.com/eclipse/omr/issues/6491 in enabling DDR with AArch64 macOS builds.

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_aarch64_mac_Nightly_testList_0/50

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk18_j9_sanity.functional_aarch64_mac_Nightly_testList_0/51

knn-k commented 2 years ago

DDR on AArch64 macOS will be available soon when #14420 is merged. I am waiting for https://github.com/eclipse/omr/pull/5800 to promote.

pshipton commented 2 years ago

https://github.com/eclipse/omr/pull/5800 is being reverted, see https://github.com/eclipse-openj9/openj9/issues/15104#issuecomment-1136427732

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdknext_j9_sanity.functional_aarch64_mac_Nightly_testList_1/58

pshipton commented 2 years ago

amac DDR is now enabled in future builds.

JasonFengJ9 commented 2 years ago

Observed at JDK17 aarch64_mac (ea platform) job/Test_openjdk17_j9_sanity.functional_aarch64_mac/51/consoleFull(macaarch64rt1)

12:25:11  openjdk version "17.0.4-ea" 2022-07-19
12:25:11  IBM Semeru Runtime Open Edition 17.0.4.0-m1 (build 17.0.4-ea+3)
12:25:11  Eclipse OpenJ9 VM 17.0.4.0-m1 (build v0.33.0-release-c9a377390, JRE 17 Mac OS X aarch64-64-Bit 20220601_59 (JIT enabled, AOT enabled)
12:25:11  OpenJ9   - c9a377390
12:25:11  OMR      - 942daf672
12:25:11  JCL      - 02c2140051b based on jdk-17.0.4+3)
===============================================
Running test cmdLineTester_GCRegressionTests_0 ...
===============================================
cmdLineTester_GCRegressionTests_0 Start Time: Wed Jun  1 14:08:38 2022 Epoch Time (ms): 1654106918737
variation: Mode110
JVM_OPTIONS:  -Xjit -Xgcpolicy:gencon -Xnocompressedrefs 

Testing: Unload lots of classes using normal behaviour (with JIT if JIT is Enabled)
Test start time: 2022/06/01 14:10:53 Eastern Standard Time
Running command: "/Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_testList_2/openjdkbinary/j2sdk-image/Contents/Home/bin/..//bin/java"  -Xjit -Xgcpolicy:gencon -Xnocompressedrefs   -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc   -cp /Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_testList_2/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
Time spent starting: 1 milliseconds
Time spent executing: 29232 milliseconds
Test result: FAILED
Output from test:

[OUT] i=0
 [ERR] 18:11:01.746 0x13789d500    j9mm.107    *   ** ASSERTION FAILED ** at /Users/jenkins/workspace/build-scripts/jobs/jdk17u/jdk17u-mac-aarch64-openj9/workspace/build/src/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))
 [ERR] JVMDUMP039I Processing dump event "traceassert", detail "" at 2022/06/01 14:11:01 - please wait.
 [ERR] JVMDUMP032I JVM requested System dump using '/Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_testList_2/aqa-tests/TKG/output_1654106078320/cmdLineTester_GCRegressionTests_0/core.20220601.141101.8737.0001.dmp' in response to an event
 [ERR] JVMDUMP010I System dump written to /Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_testList_2/aqa-tests/TKG/output_1654106078320/cmdLineTester_GCRegressionTests_0/core.20220601.141101.8737.0001.dmp
 [ERR] JVMDUMP032I JVM requested Java dump using '/Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_testList_2/aqa-tests/TKG/output_1654106078320/cmdLineTester_GCRegressionTests_0/javacore.20220601.141101.8737.0002.txt' in response to an event
 [ERR] JVMDUMP010I Java dump written to /Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_testList_2/aqa-tests/TKG/output_1654106078320/cmdLineTester_GCRegressionTests_0/javacore.20220601.141101.8737.0002.txt
 [ERR] JVMDUMP032I JVM requested Snap dump using '/Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_testList_2/aqa-tests/TKG/output_1654106078320/cmdLineTester_GCRegressionTests_0/Snap.20220601.141101.8737.0003.trc' in response to an event
 [ERR] JVMDUMP010I Snap dump written to /Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_testList_2/aqa-tests/TKG/output_1654106078320/cmdLineTester_GCRegressionTests_0/Snap.20220601.141101.8737.0003.trc
 [ERR] JVMDUMP013I Processed dump event "traceassert", detail "".
>> Success condition was not found: [Output match: Successful test run!]
>> Success condition was not found: [Output match: Cannot load library required by: -Xjit]
knn-k commented 2 years ago

DDR is enabled in this AArch64 macOS runtime. Artifacts including the core file are available at job/Test_openjdk17_j9_sanity.functional_aarch64_mac_testList_2/48/ for the failure above.

dmitripivkine commented 2 years ago

An assertion has been triggered an attempt to unload !j9classloader 0x0000000137A4BA68 (classLoaderObject = !j9object 0x0000000108601CE8 has not been marked). However for !j9class 0x000000012133EE00 it's classObject = !j9object 0x00000001083DC8D8 has been marked. I found this object in completed Work Packet, so it has been scanned once but looks like classLoader, protectionDomain and module fields were not set yet. These fields are set in internalCreateRAMClassDone() using Macros:

J9VMJAVALANGCLASS_SET_CLASSLOADER()
J9VMJAVALANGCLASS_SET_PROTECTIONDOMAIN()
J9VMJAVALANGCLASS_SET_MODULE()

Looks like Write Barrier has not been called (or does not work somehow). So GC missed mutations in classObject and it has not been rescanned. These Macros are generated (for Linux x86 definitions are located in [extensionsRepo]/build/linux-x86_64-server-release/vm/runtime/j9vmconstantpool.h) and should look like:

#define J9VMJAVALANGCLASS_SET_CLASSLOADER_VM(javaVM, object, value) ((void)0, \
 J9OBJECT_OBJECT_STORE_VM(javaVM, object, J9VMJAVALANGCLASS_CLASSLOADER_OFFSET_VM(javaVM), (value)))

I don't see a reason why these Macros are generated differently in AArch64 macOS, but @knn-k would you please double check?

knn-k commented 2 years ago

@dmitripivkine Thank you for your analysis. I will take a look.

knn-k commented 2 years ago

I see no difference in the definition of the macro in build/macosx-aarch64-server-release/vm/runtime/j9vmconstantpool.h.

#define J9VMJAVALANGCLASS_SET_CLASSLOADER_VM(javaVM, object, value) ((void)0, \
        J9OBJECT_OBJECT_STORE_VM(javaVM, object, J9VMJAVALANGCLASS_CLASSLOADER_OFFSET_VM(javaVM), (value)))
pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk18_j9_sanity.functional_aarch64_mac_Release_testList_0/11 - mac11-aarch64-4 cmdLineTester_GCRegressionTests_0

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk18_j9_sanity.functional_aarch64_mac_Release_testList_0/11/functional_test_output.tar.gz

Testing: Unload lots of classes using FVT stress argument to force finalization (with JIT if JIT is Enabled)
Test start time: 2022/06/12 19:25:28 Australian Eastern Standard Time
Running command: "/Users/jenkins/workspace/Test_openjdk18_j9_sanity.functional_aarch64_mac_Release_testList_0/openjdkbinary/j2sdk-image/bin/java"  -Xjit -Xgcpolicy:gencon -Xnocompressedrefs   -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc -Xgc:fvtest=forceFinalizeClassLoaders   -cp /Users/jenkins/workspace/Test_openjdk18_j9_sanity.functional_aarch64_mac_Release_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
Time spent starting: 2 milliseconds
Time spent executing: 53399 milliseconds
Test result: FAILED
Output from test:
 [OUT] Calibrating live object array size...
 [OUT] Set live object array size to 2032
 [OUT] Command line to reproduce this run: com.ibm.tests.garbagecollector.TestClassLoaderMain 2032 50 -
...
 [OUT] Allocating array (iteration 1833)
 [OUT] i=0
 [ERR] 09:26:12.684 0x12c899100    j9mm.107    *   ** ASSERTION FAILED ** at /Users/jenkins/workspace/Build_JDK18_aarch64_mac_Release/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))
pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_aarch64_mac_Nightly_testList_0/81 cmdLineTester_GCRegressionTests_0

https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Test_openjdk17_j9_sanity.functional_aarch64_mac_Nightly_testList_0/81/functional_test_output.tar.gz

Testing: Unload lots of classes using normal behaviour (with JIT if JIT is Enabled)
Test start time: 2022/06/18 10:55:24 Australian Eastern Standard Time
Running command: "/Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_Nightly_testList_0/openjdkbinary/j2sdk-image/bin/java"  -Xjit -Xgcpolicy:gencon -Xnocompressedrefs   -Dcom.ibm.tools.attach.enable=no -Xdump:system:events=systhrow,filter=java/lang/OutOfMemoryError -Xmx8m -Xms8m -Xalwaysclassgc -Xdisableexcessivegc   -cp /Users/jenkins/workspace/Test_openjdk17_j9_sanity.functional_aarch64_mac_Nightly_testList_0/aqa-tests/TKG/../../jvmtest/functional/cmdLineTests/gcRegressionTests/gcRegressionTests.jar com.ibm.tests.garbagecollector.TestClassLoaderMain - - -
Time spent starting: 1 milliseconds
Time spent executing: 19130 milliseconds
Test result: FAILED
Output from test:
 [OUT] Calibrating live object array size...
 [OUT] Set live object array size to 2110
 [OUT] Command line to reproduce this run: com.ibm.tests.garbagecollector.TestClassLoaderMain 2110 50 -
 [OUT] Allocating array (iteration 1)
 [OUT] i=0
 [OUT] i=1000
 [OUT] i=2000
...
 [OUT] Allocating array (iteration 317)
 [OUT] i=0
 [ERR] 00:55:32.825 0x14b07ff00    j9mm.107    *   ** ASSERTION FAILED ** at /Users/jenkins/workspace/Build_JDK17_aarch64_mac_Nightly/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))

and also https://openj9-jenkins.osuosl.org/job/Test_openjdk18_j9_sanity.functional_aarch64_mac_Nightly_testList_1/79

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.functional_aarch64_mac_aot_Personal_testList_0/22

amicic commented 2 years ago

For the most part, the problem, specifically on Aarch64 OSX, is understood. It seems to be related to weak memory model. A fix is available, proven to help, although still considering various variants of the fix before proceeding with any.

No need to provide references to failures for Aarch64 OSX, but for other platforms, which seems to be a different problem, please do.

amicic commented 2 years ago

Final cleaning consists of 2 type of processing occurring simultaneously (where multiple threads/CPUs may execute each of them, but we consider just 1 for each). T1 (MM_ConcurrentCardTable::finalCleanCards) iterate over card table and for each dirty card (if C ==1) do clear card (C = 0) for each marked object with the card (if M == 1) scan object

T2 (MM_ConcurrentGCIncrementalUpdate::finalCleanCards) in a loop, while scan Q non - empty: (during scanning an object) mark (a child) object (M = 1), push for further scan pop object (typically more work is to be done with other objects between push and pop, but Q is empty) if object is not within a dirty card (if C == 0) scan object

Scan object marks its child objects and pushes them on scan Q for further scanning (generally at a later point, by any thread available)

Problem: every object has to be marked and scanned (ideally exactly once), but we miss to scan an object. Specifically, we miss to scan a class object of a class that have been loaded moments before entering the final STW GC increment. The class has only one object instance that is first time marked during the final card cleaning phase. By missing to scan class object we miss to mark its class loader object and we end up unloading the class, but during unloading we observe the marked class object, what the reported assertion is about.

Object instance gets marked and scanned properly, which means that class object gets marked properly. But class object happened to be within a dirty card (C == 1), and T2 misses to scan it. So it's supposed to be scanned during card cleaning by T1, but it does not happen since during card cleaning T2 does not see the object marked, yet (M is still 0). That normally should not be possible, but running on a weak memory mode, seems the propagation of store operation of marking the class object by T2 to T1 is delayed, or T1 prematurely reads mark bit of the class object, or some other reorder occurs.

A memory barrier prior to reading mark bits by T1 appears to fix it. Either full barrier or even lighter load barrier works.

JasonFengJ9 commented 2 years ago
openjdk version "17.0.4-ea" 2022-07-19
IBM Semeru Runtime Open Edition 17.0.4.0-m2 (build 17.0.4-ea+7)
Eclipse OpenJ9 VM 17.0.4.0-m2 (build openj9-0.33.0-m2, JRE 17 Mac OS X aarch64-64-Bit 20220718_88 (JIT enabled, AOT enabled)
OpenJ9   - e7fc0d2f1
OMR      - e25299388
JCL      - 05652b1ac23 based on jdk-17.0.4+7)

An internal build

[2022-07-12T22:21:38.216Z]  [ERR] 22:21:25.941 0x13c099100    j9mm.107    *   ** ASSERTION FAILED ** at /Users/jenkins/workspace/build-scripts/jobs/jdk17u/jdk17u-mac-aarch64-openj9/workspace/build/src/openj9/runtime/gc_base/ClassLoaderManager.cpp:396: ((false && (!markMap->isBitSet(classObject))))
[2022-07-12T22:21:38.216Z]  [ERR] JVMDUMP039I Processing dump event "traceassert", detail "" at 2022/07/12 18:21:25 - please wait.
knn-k commented 2 years ago

The build in https://github.com/eclipse-openj9/openj9/issues/14243#issuecomment-1184884329 is from the v0.33.0 branch, and it does not seem to contain the fix https://github.com/eclipse/omr/pull/6605.

dmitripivkine commented 2 years ago

Closing as fixed in https://github.com/eclipse/omr/pull/6605