eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

JTReg Test Failure - serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorVMEventsTest.java #6024

Open JasonFengJ9 opened 5 years ago

JasonFengJ9 commented 5 years ago

OpenJ9 does not generate sufficient JVMTI sampling object allocation events when the sampling interval is set to zero

This is a JTReg test failure against the JEP 331 implementation.

Failure link

N/A - local runs

Optional info

Failure output

Exception in thread "main" java.lang.RuntimeException: Could not set the sampler
    at MyPackage.HeapMonitor.sampleEverything(HeapMonitor.java:187)
    at MyPackage.HeapMonitorVMEventsTest.main(HeapMonitorVMEventsTest.java:100)

Notes:

The failed test code snippet is in [1]. The test sets the sampling interval to 0, then allocates 400 integer arrays int[1000] and expects 400 sampling events, which is considered a PASS; this is repeated up to 10 times until a pass, otherwise the test fails. OpenJ9 generates only 12 - 38 sampling events for the 400 array allocations, far fewer than the number required.
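For illustration, a minimal sketch of the pass/fail logic described above (not the actual JTReg source; the helper methods below are stand-ins for the test's native JVMTI bindings):

public class VMEventsSketch {
    static final int ARRAY_COUNT = 400;
    static final int ELEMENTS = 1000;

    public static void main(String[] args) {
        setSamplingIntervalToZero(); // stand-in for the JVMTI "sample everything" setup
        for (int attempt = 0; attempt < 10; attempt++) {
            long before = sampledEventCount();
            int[][] arrays = new int[ARRAY_COUNT][];
            for (int i = 0; i < ARRAY_COUNT; i++) {
                arrays[i] = new int[ELEMENTS]; // ~4 KB per array
            }
            // With a zero interval the test expects roughly one sampling event
            // per allocation, i.e. ~400 events for the loop above.
            if (sampledEventCount() - before >= ARRAY_COUNT) {
                return; // PASS
            }
        }
        throw new RuntimeException("Could not set the sampler"); // FAIL, as observed on OpenJ9
    }

    // Illustrative stubs; the real test counts SampledObjectAlloc events in a native agent.
    private static void setSamplingIntervalToZero() { }
    private static long sampledEventCount() { return 0; }
}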

I built a standalone testcase, TestSampleEverything.zip, without a JVMTI agent; the issue can be demonstrated as follows:

java -Xgc:allocationSamplingGranularity=1 -Xtrace:print={j9mm.394-395} -jar TestSampleEverything.zip 2>&1 | grep elements=1000 | wc -l

The expected number of trace lines for object size=4008; number of elements=1000 is close to 10 * 400, i.e., 4000, but the actual count is usually below 400.
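Presumably (the contents of TestSampleEverything.zip are not shown here), the standalone testcase boils down to repeating the same allocation pattern ten times without any agent attached, along these lines:

public class TestSampleEverythingSketch {
    public static void main(String[] args) {
        for (int iteration = 0; iteration < 10; iteration++) {
            int[][] arrays = new int[400][];
            for (int i = 0; i < arrays.length; i++) {
                arrays[i] = new int[1000]; // traced as "object size=4008; number of elements=1000"
            }
        }
        // With -Xgc:allocationSamplingGranularity=1 every one of the ~4000 allocations
        // should hit the j9mm.394-395 tracepoints, yet fewer than 400 trace lines appear.
    }
}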

[1] https://github.com/ibmruntimes/openj9-openjdk-jdk11/blob/1c55bba834d441d680298c240ac241ec9aa0b668/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitor.java#L164-L188

fyi: @charliegracie @DanHeidinga

JasonFengJ9 commented 5 years ago

Another JTReg Test Failure - serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatIntervalTest.java:

Exception in thread "main" java.lang.RuntimeException: Interval average over 10% for interval 1024 -> 281.0, 102400.0
    at MyPackage.HeapMonitorStatIntervalTest.testIntervalOnce(HeapMonitorStatIntervalTest.java:62)
    at MyPackage.HeapMonitorStatIntervalTest.testInterval(HeapMonitorStatIntervalTest.java:77)
    at MyPackage.HeapMonitorStatIntervalTest.main(HeapMonitorStatIntervalTest.java:87)

This test expects about 102400 sampling events when the sampling interval is set to 1024, but only 281 were observed. This is probably due to the same cause as this issue.
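For scale, a back-of-the-envelope check, assuming the two figures in the exception message are the observed and expected sample counts:

public class IntervalMath {
    public static void main(String[] args) {
        long intervalBytes = 1024;
        long expectedSamples = 102_400;
        long observedSamples = 281;
        // 102400 samples at a 1024-byte interval implies ~100 MiB of sampled allocation.
        long impliedAllocation = expectedSamples * intervalBytes; // 104857600 bytes
        double observedRatio = 100.0 * observedSamples / expectedSamples; // ~0.27%
        System.out.printf("implied allocation: %d MiB, observed/expected: %.2f%%%n",
                impliedAllocation >> 20, observedRatio);
    }
}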

JasonFengJ9 commented 5 years ago

After a bit of digging, it appears that MM_GCExtensions::oolObjectSamplingBytesGranularity-based sampling only applies to out-of-line allocations; successful inline allocations via ObjectAllocationAPI.hpp:inlineAllocateObject() or inlineAllocateIndexableObject() are not traced by -Xgc:allocationSamplingGranularity, no matter how small the sampling granularity is set.

The JEP 331 implementation shares the same sampling mechanism, hence it doesn't sample those inline allocations, as this JTReg test discovered.
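From the Java side, the difference can be illustrated roughly as follows (a sketch based on the analysis above; the actual TLH size depends on configuration, so the size chosen to force the out-of-line path is only an assumption):

public class InlineVsOutOfLineSketch {
    public static void main(String[] args) {
        // Small array: normally satisfied inline from the thread-local heap (TLH),
        // so per the analysis above it is never seen by the
        // -Xgc:allocationSamplingGranularity-based tracing.
        int[] small = new int[1000]; // ~4 KB

        // Much larger array: assumed to exceed any TLH and therefore to take the
        // out-of-line allocation path, which is where the granularity counter is updated.
        int[] large = new int[16 * 1024 * 1024]; // ~64 MB

        System.out.println(small.length + " + " + large.length);
    }
}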

@charliegracie could you shed some light on this?

charliegracie commented 5 years ago

The test is bogus in my opinion. This is a "sample" and as such is not guaranteed to fire an exact number of samples. I will look into what we could do, but this should not work on HotSpot either. The JEP talks about a random sampling basis, such that a common application pattern would not always give the same samples. If that were true, then this test could never work! We should be pushing back on the test implementation. Sadly, as this is a JEP, I doubt we will be able to provide any input.

charliegracie commented 5 years ago

I will look into what we can do to "improve" the situation but I doubt we will ever have a case where we sample all 400 allocations in the above scenario.

charliegracie commented 5 years ago

Here is a quick hack that may do what you want. FYI I have not compiled or tested this code.

diff --git a/runtime/gc_base/modronapi.cpp b/runtime/gc_base/modronapi.cpp
index 07220c9a3..11a22772d 100644
--- a/runtime/gc_base/modronapi.cpp
+++ b/runtime/gc_base/modronapi.cpp
@@ -810,6 +810,15 @@ j9gc_set_allocation_sampling_interval(J9VMThread *vmThread, UDATA samplingInterv
        samplingInterval = 1;
    }
    extensions->oolObjectSamplingBytesGranularity = samplingInterval;
+#if defined(J9VM_GC_THREAD_LOCAL_HEAP) || defined(J9VM_GC_SEGREGATED_HEAP)
+   MM_GCExtensions *ext = MM_GCExtensions::getExtensions(vmThread);
+   J9JavaVM *vm = vmThread->javaVM;
+   J9InternalVMFunctions const * const vmFuncs = vm->internalVMFunctions;
+   IDATA const key = ext->_TLHAsyncCallbackKey;
+   vmFuncs->J9SignalAsyncEvent(vm, NULL, key);
+   vmFuncs->J9CancelAsyncEvent(vm, vmThread, key);
+   memoryManagerTLHAsyncCallbackHandler(vmThread, key, (void*)vm);
+#endif /* defined(J9VM_GC_THREAD_LOCAL_HEAP) || defined(J9VM_GC_SEGREGATED_HEAP) */
 }

 /**
diff --git a/runtime/gc_modron_startup/mgcalloc.cpp b/runtime/gc_modron_startup/mgcalloc.cpp
index 93d0092f8..471a36e9e 100644
--- a/runtime/gc_modron_startup/mgcalloc.cpp
+++ b/runtime/gc_modron_startup/mgcalloc.cpp
@@ -626,6 +626,24 @@ memoryManagerTLHAsyncCallbackHandler(J9VMThread *vmThread, IDATA handlerKey, voi
        Trc_MM_memoryManagerTLHAsyncCallbackHandler_eventNotHooked(vmThread);
        extensions->disableInlineCacheForAllocationThreshold = false;
    }
+
+   if (extensions->doOutOfLineAllocationTrace) {
+       //TODO ADD a new tracepoint
+       //Trc_MM_memoryManagerTLHAsyncCallbackHandler_eventIsHooked(vmThread);
+       if (extensions->isStandardGC() || extensions->isVLHGC()) {
+#if defined(J9VM_GC_THREAD_LOCAL_HEAP)
+           extensions->disableInlineCacheForAllocationThreshold = (extensions->oolObjectSamplingBytesGranularity < extensions->tlhMaximumSize);
+#endif /* defined(J9VM_GC_THREAD_LOCAL_HEAP) */
+       } else if (extensions->isSegregatedHeap()) {
+#if defined(J9VM_GC_SEGREGATED_HEAP)
+           extensions->disableInlineCacheForAllocationThreshold = (extensions->oolObjectSamplingBytesGranularity <= J9VMGC_SIZECLASSES_MAX_SMALL_SIZE_BYTES);
+#endif /* defined(J9VM_GC_SEGREGATED_HEAP) */
+       }
+   } else {
+       //TODO ADD a new tracepoint
+       //Trc_MM_memoryManagerTLHAsyncCallbackHandler_eventNotHooked(vmThread);
+       extensions->disableInlineCacheForAllocationThreshold = false;
+   }

    if (extensions->isStandardGC() || extensions->isVLHGC()) {
 #if defined(J9VM_GC_THREAD_LOCAL_HEAP)

charliegracie commented 5 years ago

Also, the sampling still only happens on out-of-line allocations. The above hack forces out-of-line allocation for all allocations when the sampling threshold is less than the maximum TLH size.

JasonFengJ9 commented 5 years ago

Thanks @charliegracie for your quick response, will give it a try.

JasonFengJ9 commented 5 years ago

The patch doesn't work as expected because _oolTraceAllocationBytes is only updated for out-of-line allocations [1].

It appears _oolTraceAllocationBytes isn't updated within https://github.com/eclipse/openj9/blob/c1b8968fc67ddea5ab2e6deea84fd7f36570f0d1/runtime/gc_include/ObjectAllocationAPI.hpp#L74 or https://github.com/eclipse/openj9/blob/c1b8968fc67ddea5ab2e6deea84fd7f36570f0d1/runtime/gc_include/ObjectAllocationAPI.hpp#L163

[1] https://github.com/eclipse/omr/blob/e21bb944bc31b3b3101fa4eb68eb6bc428c913c1/gc/base/TLHAllocationInterface.cpp#L212

P.S. The RI (reference implementation) does pass MyPackage/HeapMonitorVMEventsTest.java, i.e., it can sample every allocation in this testcase.

charliegracie commented 5 years ago

Just because the RI finds all allocations does not mean it is correct with respect to the wording of the specification. Threads do not atomically pick up the new values, as that would be a performance nightmare. The proposed changes above rely on async checks to cause the threads to recognize the new values. No JVM, I repeat no JVM, could be expected to have a lightweight allocation monitoring system force threads through exclusive VM access to update this value. Did the proposed change improve things at all?

The inline call you posted in the previous comment may not have anything to do with this, since the JIT can inline allocate as well. The only acceptable solution is to wait for the next TLH refresh for the thread to obey the new threshold. If that does not match how the JEP is worded, then it is flawed and we should be pushing back on the JEP or its tests.

JasonFengJ9 commented 5 years ago

Did the proposed change improve things at all?

There is no observable difference.

Allocation and _oolTraceAllocationBytes

A printf() within ObjectAllocationAPI.hpp:inlineAllocateIndexableObject() shows that most allocations (size = 1000, the integer array size) occur here. MM_EnvironmentBase::_oolTraceAllocationBytes is not updated within this inline allocation method; it is only incremented within mgcalloc.cpp:MM_TLHAllocationInterface::allocateObject().

-Xjit:count=0 appears to take away those inlineAllocateIndexableObject() calls.

So inlineAllocateIndexableObject() (and JIT allocations) don't increment the field _oolTraceAllocationBytes; is this expected or a known limitation?

The cause of the insufficient sampling events

I think the cause of the insufficient sampling events is not that the new sampling interval value isn't recognized right away; it is that those inline allocations are not recorded in the first place.

JEP and RI behaviours

This JEP is optional, hence not covered by any compatibility tests; I just use this JTReg test to match the RI behaviour as much as possible. For this particular testcase, the RI passes consistently, i.e., it can sample each allocation, while OpenJ9 samples only around 1/10 of them.

charliegracie commented 5 years ago

Hmm, looking at your test case, you are not changing the sampling interval via the GC API at runtime; you are specifying it via the command line (-Xgc:allocationSamplingGranularity). I thought the JVMTI code allowed this value to be changed dynamically at runtime. Does it? My changes above will only have an impact if you call j9gc_set_allocation_sampling_interval after your application threads are up and running. It is not the final correct solution, but I was suggesting it as a test. Let me see if I can modify it to handle your current use case as well.

charliegracie commented 5 years ago

Also, you are correct: the code currently only accounts for objects allocated out-of-line. It should be considering TLH refreshes as well. Looking into this now.

JasonFengJ9 commented 5 years ago

Hmm looking at your test case you are not changing the value of _oolTraceAllocationBytes via the GC API during runtime you are specifying it via the command line.

The JTReg testcase does call j9gc_set_allocation_sampling_interval [1]. Since this JEP implementation shares the same event-trigger mechanism as GC allocation sampling via -Xgc:allocationSamplingGranularity, the attached testcase demonstrates the sampling pattern without JVMTI natives.

I did test the patch with the original JTReg JVMTI testcase and got similar results as before.

[1] https://github.com/ibmruntimes/openj9-openjdk-jdk11/blob/ad988f03e08bf150c97380e32741409e4e597432/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitor.java#L166

charliegracie commented 5 years ago

@JasonFengJ9 is there a JVMTI API for modifying the sampling threshold?

JasonFengJ9 commented 5 years ago

@charliegracie yes, the JVMTI API mentioned is https://github.com/eclipse/openj9/blob/ac4b60c5af219c51fd246377e65f390c626d9a7a/runtime/include/jvmti.h.m4#L1172. It is implemented at https://github.com/eclipse/openj9/blob/ac4b60c5af219c51fd246377e65f390c626d9a7a/runtime/jvmti/jvmtiMemory.c#L85-L86, which eventually invokes https://github.com/eclipse/openj9/blob/ac4b60c5af219c51fd246377e65f390c626d9a7a/runtime/gc_base/modronapi.cpp#L805-L813.

DanHeidinga commented 4 years ago

@charliegracie Is this still something you plan to work on?