eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 721 forks source link

OpenJDK java/lang/runtime/ReferencedKeyTest java.lang.RuntimeException: missing key #17910

Closed pshipton closed 1 year ago

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_s390x_linux_Nightly_testList_0/24/ jdk_lang_0, jdk_lang_j9_0 java/lang/runtime/ReferencedKeyTest.java

20:31:38  java.lang.RuntimeException: missing key
20:31:38    at java.base/java.lang.runtime.ReferencedKeyTest.assertTrue(ReferencedKeyTest.java:52)
20:31:38    at java.base/java.lang.runtime.ReferencedKeyTest.methods(ReferencedKeyTest.java:66)
20:31:38    at java.base/java.lang.runtime.ReferencedKeyTest.mapTest(ReferencedKeyTest.java:62)
20:31:38    at java.base/java.lang.runtime.ReferencedKeyTest.main(ReferencedKeyTest.java:46)
pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_s390x_linux_Nightly_testList_1/26/ jdk_lang_1 java/lang/runtime/ReferencedKeyTest.java

23:58:10  java.lang.RuntimeException: missing key
23:58:10    at java.base/java.lang.runtime.ReferencedKeyTest.assertTrue(ReferencedKeyTest.java:52)
23:58:10    at java.base/java.lang.runtime.ReferencedKeyTest.methods(ReferencedKeyTest.java:66)
23:58:10    at java.base/java.lang.runtime.ReferencedKeyTest.mapTest(ReferencedKeyTest.java:62)
23:58:10    at java.base/java.lang.runtime.ReferencedKeyTest.main(ReferencedKeyTest.java:44)
pshipton commented 1 year ago

5 x 100x grinder (stops on failure) https://openj9-jenkins.osuosl.org/job/Grinder/2646/ Failed in every child. # iterations to failure: 21, 39, 83, 10, 20

again with -Xint https://openj9-jenkins.osuosl.org/job/Grinder/2650/ Failed in every child. #iterations to failure: 9, 21, 31, 21, 13

again on xmac with -Xint https://openj9-jenkins.osuosl.org/job/Grinder/2652/ - passed other linux platforms with -Xint https://openj9-jenkins.osuosl.org/job/Grinder/2654/ - passed alinux, plinux, xlinux (ignoring infra failure)

pshipton commented 1 year ago

@TobiAjila fyi

pshipton commented 1 year ago

Try to get javacore, system core on failure. -Xdump:java+system:events=throw,filter=java/lang/RuntimeException,request=exclusive+prepwalk+preempt 1 machine https://openj9-jenkins.osuosl.org/job/Grinder/2659/ - passed 3 machines https://openj9-jenkins.osuosl.org/job/Grinder/2660/ - reproduced the failure in https://openj9-jenkins.osuosl.org/job/Grinder_iteration_2/298/ iteration 89

Artifacts with javacore, core https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Grinder_iteration_2/298/openjdk_test_output.tar.gz

tajila commented 1 year ago

The RefenceKeyMap has twice as many entries as it should have.

> !j9object 0x000000002C6513C0
!J9Object 0x000000002C6513C0 {
    struct J9Class* clazz = !j9class 0x24E800 // java/lang/runtime/ReferencedKeyMap
    Object flags = 0x00000000;
    I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
    Z isSoft = 0x00000000 (offset = 12) (java/lang/runtime/ReferencedKeyMap)
    Ljava/util/Map; map = !fj9object 0x2c6513d8 (offset = 4) (java/lang/runtime/ReferencedKeyMap)
    Ljava/lang/ref/ReferenceQueue; stale = !fj9object 0x2c651400 (offset = 8) (java/lang/runtime/ReferencedKeyMap)
}
> !fj9object 0x2c6513d8
!J9Object 0x000000002C6513D8 {
    struct J9Class* clazz = !j9class 0xCF000 // java/util/HashMap
    Object flags = 0x00000000;
    I lockword = 0x00000000 (offset = 0) (java/lang/Object) <hidden>
    Ljava/util/Set; keySet = !fj9object 0x0 (offset = 4) (java/util/AbstractMap)
    Ljava/util/Collection; values = !fj9object 0x0 (offset = 8) (java/util/AbstractMap)
    [Ljava/util/HashMap$Node; table = !fj9object 0x2c650ac0 (offset = 12) (java/util/HashMap)
    Ljava/util/Set; entrySet = !fj9object 0x0 (offset = 16) (java/util/HashMap)
    I size = 0x00000034 (offset = 20) (java/util/HashMap)
    I modCount = 0x00000034 (offset = 24) (java/util/HashMap) //52 but should be 26
    I threshold = 0x00000060 (offset = 28) (java/util/HashMap)
    F loadFactor = 0x3F400000 (offset = 32) (java/util/HashMap)
}

The API relies on the ReferenceQueue; stale ref queue to clean up stale entries. The queue is empty but somehow there are still stale entries.

    public void removeStaleReferences() {
        while (true) {
            Object key = stale.poll();
            if (key == null) {
                break;
            }
            map.remove(key);
        }
    }
pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_s390x_linux_Nightly_testList_0/32/

tajila commented 1 year ago

In a successful run 52 items must be in the refQueue (stale)

tajila commented 1 year ago

The test relies on

System.gc();
        sleep();

To force a GC and add objects to refqueues, which is not guaranteed on J9.

tajila commented 1 year ago

@dmitripivkine are objects added to associated reference queues on every GC?

dmitripivkine commented 1 year ago

yes, but Local GC might address references from Nursery only.

tajila commented 1 year ago

Okay so we would need

System.gc();
System.gc();

to trigger a global GC?

dmitripivkine commented 1 year ago

yes, I think so.

pshipton commented 1 year ago

As this is apparently a test problem (limitation of OpenJ9?) rather than a functional problem, removing the blocker label.

pshipton commented 1 year ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk21_j9_sanity.openjdk_s390x_linux_Nightly_testList_1/38/

tajila commented 1 year ago

This seems to be a similar problem to https://github.com/eclipse-openj9/openj9/issues/17723

I noticed

> !j9javavm 0x000003FF9801EB20 | grep processReference
    0x16e8: struct J9ThreadMonitor* processReferenceMonitor = !j9threadmonitor 0x000003FF98009CE8
    0x16f0: UDATA processReferenceActive = 0x0000000000000000 (0)
tajila commented 1 year ago

Probably explains why adding more GCs doesnt solve it

tajila commented 1 year ago

@dmitripivkine Actually the vm->processReferenceActive represents the number of references active being processed. It doesnt represent the references pending processing.

I think we need finalizeListManager->getReferenceCount() to understand this situation better

tajila commented 1 year ago
> !gc_finalizelistmanager 0x000003FF9807E410
GC_FinalizeListManager at 0x3ff9807e410 {
  Fields for MM_Base:
  Fields for MM_BaseVirtual:
    0x8: const U8* _typeId = !j9x 0x000003FF9DA422CC // "GC_FinalizeListManager"
  Fields for GC_FinalizeListManager:
    0x10: class MM_GCExtensions* _extensions = !mm_gcextensions 0x000003FF98036B80
    0x18: struct J9ThreadMonitor* _mutex = !j9threadmonitor 0x000003FF98055AC8
    0x20: struct J9Object* _systemFinalizableObjects = !j9object 0x0000000000000000
    0x28: UDATA _systemFinalizableObjectCount = 0x0000000000000000 (0)
    0x30: struct J9Object* _defaultFinalizableObjects = !j9object 0x0000000000000000
    0x38: UDATA _defaultFinalizableObjectCount = 0x0000000000000000 (0)
    0x40: struct J9Object* _referenceObjects = !j9object 0x000000002C651340 // jdk/internal/ref/CleanerImpl$PhantomCleanableRef
    0x48: UDATA _referenceObjectCount = 0x00000000000000A6 (166) //<--this is the real value
    0x50: struct J9ClassLoader* _classLoaders = !j9classloader 0x0000000000000000
    0x58: UDATA _classLoaderCount = 0x0000000000000000 (0)
}
tajila commented 1 year ago

After taking a brief look at static int J9THREAD_PROC FinalizeMainThread(void *javaVM) it seems like we only consume one job at a time. I think we need a way to force the finalizer thread to empty the queue (or a certain amount).

dmitripivkine commented 1 year ago

After taking a brief look at static int J9THREAD_PROC FinalizeMainThread(void *javaVM) it seems like we only consume one job at a time. I think we need a way to force the finalizer thread to empty the queue (or a certain amount).

There is runFinalization() implementation expected to run Finalization non-stop until queue is empty: https://github.com/eclipse-openj9/openj9/blob/6256ba4ad8e9ea5087fe1c18093b6f0fa3a16ad6/runtime/gc_base/FinalizerSupport.cpp#L899 Is it kind of service you are looking?

tajila commented 1 year ago

Is it kind of service you are looking?

Yes, it fixed the issue I described above. Ill see if it can be used here, https://github.com/eclipse-openj9/openj9/issues/17723.

tajila commented 1 year ago

If we dont see this issue again in the next couple of days we can close it now that the System.gc() changes are in.

pshipton commented 1 year ago

The System.gc() changes are https://github.com/eclipse-openj9/openj9/pull/18044

pshipton commented 1 year ago

Not seen lately, closing for now.