JasonFengJ9 opened 5 years ago
Another JTReg Test Failure - serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatIntervalTest.java
:
Exception in thread "main" java.lang.RuntimeException: Interval average over 10% for interval 1024 -> 281.0, 102400.0
at MyPackage.HeapMonitorStatIntervalTest.testIntervalOnce(HeapMonitorStatIntervalTest.java:62)
at MyPackage.HeapMonitorStatIntervalTest.testInterval(HeapMonitorStatIntervalTest.java:77)
at MyPackage.HeapMonitorStatIntervalTest.main(HeapMonitorStatIntervalTest.java:87)
This test expects 102400 sampling events but just got 281 when the sampling interval is set to 1024.
This is probably due to the same cause as this issue.
After a bit of digging, it appears that MM_GCExtensions::oolObjectSamplingBytesGranularity-based sampling only applies to out-of-line allocations; successful inline allocations via ObjectAllocationAPI.hpp:inlineAllocateObject() or inlineAllocateIndexableObject() are not traced via -Xgc:allocationSamplingGranularity, regardless of how small the sampling granularity is set.
JEP 331 shares the same sampling mechanism and hence does not sample those inline allocations, as this JTReg test discovered.
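The inline/out-of-line split can be illustrated with a toy model (a sketch, not OpenJ9 code; the TLH size, allocation size, and granularity are illustrative values loosely based on this testcase). The byte counter only advances on the out-of-line path, so when most allocations are satisfied inline from the TLH, the sample count falls far short of totalBytes / granularity:

```cpp
#include <cstddef>

// Toy model (not OpenJ9 code): a thread allocates from a thread-local heap
// (TLH). Allocations that fit in the TLH take the inline path and never touch
// the sampling counter; only allocations that miss the TLH go out of line.
struct ToyAllocator {
    std::size_t tlhRemaining = 0;  // bytes left in the current TLH
    std::size_t tlhSize;           // TLH refresh size
    std::size_t granularity;       // sampling granularity (bytes)
    std::size_t oolBytes = 0;      // bytes seen by the out-of-line counter
    std::size_t samples = 0;       // sampling events fired

    ToyAllocator(std::size_t tlh, std::size_t gran)
        : tlhSize(tlh), granularity(gran) {}

    void allocate(std::size_t bytes) {
        if (bytes <= tlhRemaining) {
            tlhRemaining -= bytes;  // inline path: invisible to sampling
            return;
        }
        // out-of-line path: refresh the TLH; only the triggering object's
        // bytes advance the sampling counter
        tlhRemaining = tlhSize - bytes;
        oolBytes += bytes;
        while (oolBytes >= granularity) {
            oolBytes -= granularity;
            ++samples;
        }
    }
};
```

In this model, with a 128 KB TLH, 400 allocations of 4016 bytes, and granularity 1024, only 13 of the 400 allocations take the out-of-line path, so the counter sees roughly 52 KB of the roughly 1.6 MB actually allocated.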
@charliegracie could you shed some light on this?
The test is bogus in my opinion. This is a "sample" and as such is not guaranteed to fire an exact number of samples. I will look into what we could do, but this should not work on HotSpot either. The JEP talks about a random sample base such that a common application pattern would not always give the same samples. If that were true then this test could never work! We should be pushing back on the test implementation. Sadly, as this is a JEP, I doubt we will be able to provide any input.
I will look into what we can do to "improve" the situation but I doubt we will ever have a case where we sample all 400 allocations in the above scenario.
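The point about a randomized sample base can be illustrated with a small simulation (a sketch, not HotSpot or OpenJ9 code; sizes are illustrative values based on this testcase). Drawing the distance to the next sample from an exponential distribution with the requested mean makes the sample count for a fixed allocation pattern vary from run to run, so a test asserting an exact count is inherently fragile:

```cpp
#include <cstddef>
#include <random>

// Toy model (not HotSpot/OpenJ9 code): randomized byte-based sampling.
// The gap to the next sample is drawn from an exponential distribution
// whose mean is the requested sampling interval.
std::size_t countRandomizedSamples(std::size_t allocations, std::size_t bytesEach,
                                   double meanInterval, unsigned seed) {
    std::mt19937 rng(seed);
    std::exponential_distribution<double> nextGap(1.0 / meanInterval);
    double untilSample = nextGap(rng);  // bytes until the next sample fires
    std::size_t samples = 0;
    for (std::size_t i = 0; i < allocations; ++i) {
        double remaining = static_cast<double>(bytesEach);
        while (remaining >= untilSample) {
            remaining -= untilSample;   // sample fires mid-allocation
            untilSample = nextGap(rng); // draw the next random gap
            ++samples;
        }
        untilSample -= remaining;
    }
    return samples;
}
```

For 400 allocations of 4016 bytes with a mean interval of 1024, the count lands near totalBytes / interval (about 1569) but fluctuates with the seed, which is why an exact-count expectation cannot hold under this scheme.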
Here is a quick hack that may do what you want. FYI I have not compiled or tested this code.
diff --git a/runtime/gc_base/modronapi.cpp b/runtime/gc_base/modronapi.cpp
index 07220c9a3..11a22772d 100644
--- a/runtime/gc_base/modronapi.cpp
+++ b/runtime/gc_base/modronapi.cpp
@@ -810,6 +810,15 @@ j9gc_set_allocation_sampling_interval(J9VMThread *vmThread, UDATA samplingInterv
samplingInterval = 1;
}
extensions->oolObjectSamplingBytesGranularity = samplingInterval;
+#if defined(J9VM_GC_THREAD_LOCAL_HEAP) || defined(J9VM_GC_SEGREGATED_HEAP)
+/* Kick the threads through an async event so they pick up the new granularity. */
+J9JavaVM *vm = vmThread->javaVM;
+J9InternalVMFunctions const * const vmFuncs = vm->internalVMFunctions;
+IDATA const key = extensions->_TLHAsyncCallbackKey;
+vmFuncs->J9SignalAsyncEvent(vm, NULL, key);
+vmFuncs->J9CancelAsyncEvent(vm, vmThread, key);
+memoryManagerTLHAsyncCallbackHandler(vmThread, key, (void*)vm);
+#endif /* defined(J9VM_GC_THREAD_LOCAL_HEAP) || defined(J9VM_GC_SEGREGATED_HEAP) */
}
/**
diff --git a/runtime/gc_modron_startup/mgcalloc.cpp b/runtime/gc_modron_startup/mgcalloc.cpp
index 93d0092f8..471a36e9e 100644
--- a/runtime/gc_modron_startup/mgcalloc.cpp
+++ b/runtime/gc_modron_startup/mgcalloc.cpp
@@ -626,6 +626,24 @@ memoryManagerTLHAsyncCallbackHandler(J9VMThread *vmThread, IDATA handlerKey, voi
Trc_MM_memoryManagerTLHAsyncCallbackHandler_eventNotHooked(vmThread);
extensions->disableInlineCacheForAllocationThreshold = false;
}
+
+	if (extensions->doOutOfLineAllocationTrace) {
+		//TODO ADD a new tracepoint
+		//Trc_MM_memoryManagerTLHAsyncCallbackHandler_eventIsHooked(vmThread);
+		if (extensions->isStandardGC() || extensions->isVLHGC()) {
+#if defined(J9VM_GC_THREAD_LOCAL_HEAP)
+			extensions->disableInlineCacheForAllocationThreshold = (extensions->oolObjectSamplingBytesGranularity < extensions->tlhMaximumSize);
+#endif /* defined(J9VM_GC_THREAD_LOCAL_HEAP) */
+		} else if (extensions->isSegregatedHeap()) {
+#if defined(J9VM_GC_SEGREGATED_HEAP)
+			extensions->disableInlineCacheForAllocationThreshold = (extensions->oolObjectSamplingBytesGranularity <= J9VMGC_SIZECLASSES_MAX_SMALL_SIZE_BYTES);
+#endif /* defined(J9VM_GC_SEGREGATED_HEAP) */
+		}
+	} else {
+		//TODO ADD a new tracepoint
+		//Trc_MM_memoryManagerTLHAsyncCallbackHandler_eventNotHooked(vmThread);
+		extensions->disableInlineCacheForAllocationThreshold = false;
+	}
if (extensions->isStandardGC() || extensions->isVLHGC()) {
#if defined(J9VM_GC_THREAD_LOCAL_HEAP)
Also, the sampling still only happens on out-of-line allocations. The above hack forces out-of-line allocation for all allocations when the sampling threshold is less than the maximum TLH size.
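The effect of forcing everything out of line can be sketched the same way (a toy model, not OpenJ9 code; sizes are illustrative values based on this testcase). Once every allocation takes the out-of-line path, the byte-granularity counter sees all allocated bytes:

```cpp
#include <cstddef>

// Toy model (not OpenJ9 code): when the inline allocation cache is disabled,
// every allocation goes out of line and is visible to the sampling counter.
std::size_t countSamplesAllOol(std::size_t allocations, std::size_t bytesEach,
                               std::size_t granularity) {
    std::size_t pending = 0;  // bytes accumulated toward the next sample
    std::size_t samples = 0;
    for (std::size_t i = 0; i < allocations; ++i) {
        pending += bytesEach;  // every allocation is counted now
        while (pending >= granularity) {
            pending -= granularity;
            ++samples;
        }
    }
    return samples;
}
```

For 400 allocations of 4016 bytes with granularity 1024, this yields 1568 samples (the floor of totalBytes / granularity), versus a few dozen when inline allocations bypass the counter. The trade-off, as noted above, is losing the fast inline path for every allocation.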
Thanks @charliegracie for your quick response, will give it a try.
The patch doesn't work as expected because _oolTraceAllocationBytes is only updated for out-of-line allocation [1].
It appears _oolTraceAllocationBytes isn't updated within
https://github.com/eclipse/openj9/blob/c1b8968fc67ddea5ab2e6deea84fd7f36570f0d1/runtime/gc_include/ObjectAllocationAPI.hpp#L74
and
https://github.com/eclipse/openj9/blob/c1b8968fc67ddea5ab2e6deea84fd7f36570f0d1/runtime/gc_include/ObjectAllocationAPI.hpp#L163
P.S. RI does pass MyPackage/HeapMonitorVMEventsTest.java, i.e., it can sample every allocation in this testcase.
Just because the RI finds all allocations does not mean it is correct re the wording of the specification. Threads are not atomically updated to see the new values, as this would be a performance nightmare. The proposed changes above rely on async checks to cause the threads to recognize the new values. No JVM, I repeat no JVM, could be expected to have a lightweight allocation monitoring system force threads through an exclusive VM access to update this value. Did the proposed change improve things at all?
The inline call you posted in the previous comment may not have anything to do with this, since the JIT can inline allocate as well. The only acceptable solution is to wait for the next TLH refresh of the thread to obey the new threshold. If that does not match how the JEP is worded, then it is flawed, and we should be pushing back on the JEP or its tests.
Did the proposed change improve things at all?
There is no observable difference.
Allocation and _oolTraceAllocationBytes
A printf() within ObjectAllocationAPI.hpp:inlineAllocateIndexableObject() shows that most of the allocations (size = 1000 for the integer array) occurred here. MM_EnvironmentBase::_oolTraceAllocationBytes is not updated within this inline allocation method; it is only incremented within mgcalloc.cpp:MM_TLHAllocationInterface::allocateObject().
-Xjit:count=0 appears to take away those inlineAllocateIndexableObject() calls.
So inlineAllocateIndexableObject() (and JIT allocations) don't increase the field _oolTraceAllocationBytes. Is this expected, or a known limitation?
The cause of insufficient sampling events
I think the cause of the insufficient sampling events is not that the new sampling interval value isn't recognized right away; it is that those inline allocations are not recorded in the first place.
JEP and RI behaviours
This JEP is optional, hence not covered by any compatibility tests; I just use this JTReg test to match RI behaviour as much as possible. For this particular testcase, RI passes consistently, i.e., it can sample each allocation, while OpenJ9 hits only around 1/10.
Hmm, looking at your test case, you are not changing the value of _oolTraceAllocationBytes via the GC API during runtime; you are specifying it via the command line. I thought the JVMTI code allowed this value to be changed dynamically at runtime. Does it? My changes above will only have an impact if you call j9gc_set_allocation_sampling_interval after your application threads are up and running. It is not the final correct solution, but I was suggesting it as a test. Let me see if I can modify it to handle your current use case as well.
Also, you are correct that the code currently only accounts for objects allocated out of line. It should be considering TLH refreshes as well. Looking into this now.
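One way to account for TLH refreshes can be sketched as a toy model (an illustration under assumed sizes, not a proposed OpenJ9 patch): charge the bytes consumed from the old TLH to the trace counter at refresh time, so inline allocations keep the fast path but their bytes are still counted in aggregate, one TLH late:

```cpp
#include <cstddef>

// Toy model (not OpenJ9 code): inline allocations stay untraced individually,
// but each TLH refresh credits the bytes consumed from the old TLH to the
// sampling counter, so inline allocations are accounted for in aggregate.
struct RefreshAccountingAllocator {
    std::size_t tlhSize;
    std::size_t granularity;
    std::size_t tlhRemaining = 0;  // bytes left in the current TLH
    std::size_t tlhUsed = 0;       // bytes consumed from the current TLH
    std::size_t tracedBytes = 0;   // bytes accumulated toward the next sample
    std::size_t samples = 0;       // sampling events fired

    RefreshAccountingAllocator(std::size_t tlh, std::size_t gran)
        : tlhSize(tlh), granularity(gran) {}

    void credit(std::size_t bytes) {
        tracedBytes += bytes;
        while (tracedBytes >= granularity) {
            tracedBytes -= granularity;
            ++samples;
        }
    }

    void allocate(std::size_t bytes) {
        if (bytes > tlhRemaining) {
            credit(tlhUsed);       // charge everything consumed from the old TLH
            tlhRemaining = tlhSize;
            tlhUsed = 0;
        }
        tlhRemaining -= bytes;     // inline path remains untouched
        tlhUsed += bytes;
    }
};
```

In this model, samples are generated roughly in proportion to total allocated bytes, lagging by at most one TLH of unaccounted allocation, which is consistent with waiting for the next TLH refresh for a thread to obey the threshold.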
Hmm looking at your test case you are not changing the value of _oolTraceAllocationBytes via the GC API during runtime you are specifying it via the command line.
The JTReg testcase does call j9gc_set_allocation_sampling_interval [1]. Since this JEP implementation shares the same event trigger mechanism with GC allocation sampling via -Xgc:allocationSamplingGranularity, the attached testcase demonstrates the sampling pattern without JVMTI natives.
I did test the patch with the original JTReg JVMTI testcase and got similar results as before.
@JasonFengJ9 is there a JVMTI API for modifying the sampling threshold?
@charliegracie yes, the JVMTI API
mentioned is https://github.com/eclipse/openj9/blob/ac4b60c5af219c51fd246377e65f390c626d9a7a/runtime/include/jvmti.h.m4#L1172
It is implemented at https://github.com/eclipse/openj9/blob/ac4b60c5af219c51fd246377e65f390c626d9a7a/runtime/jvmti/jvmtiMemory.c#L85-L86
which eventually invokes https://github.com/eclipse/openj9/blob/ac4b60c5af219c51fd246377e65f390c626d9a7a/runtime/gc_base/modronapi.cpp#L805-L813
@charliegracie Is this still something you plan to work on?
OpenJ9 does not generate sufficient JVMTI sampling object allocation events when the sampling interval is set to zero
This is a JTreg test failure against JEP 331 implementation.
Failure link
N/A - local runs
Optional info
Failure output
Notes:
The failed test code snippet is in [1]. This test sets the sampling interval to 0, then attempts to allocate 400 integer arrays int[1000], and expects 400 sampling events, which is considered a PASS. This is repeated up to 10 times until a pass; otherwise it is a failure. OpenJ9 seems to generate only 12 - 38 sampling events for such 400 array allocations, which is quite far from the number of events required.
I built a standalone testcase TestSampleEverything.zip without a JVMTI agent; this issue can be demonstrated via the following: the expected number of trace outputs for object size=4008; number of elements=1000 is close to 10 * 400, i.e., 4000, but usually just below 400 are observed.
[1] https://github.com/ibmruntimes/openj9-openjdk-jdk11/blob/1c55bba834d441d680298c240ac241ec9aa0b668/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitor.java#L164-L188
fyi: @charliegracie @DanHeidinga