Closed connglli closed 9 months ago
For the rest 11, see issue17052rest11.tar.gz
For convenience, I summarize them in the following table. I hope this can help you, simplifying your work (I know it's somewhat a headache, so I'm sorry). I put each test per directory named X.openj9
, each following the same directory structure as the main issue mentioned above. In the table, the key perhaps_unique
is the unique symptom (if it is) that I wanted to display to you. But as you may noticed, for some of them, I'm not able to reproduce for a second time. The key log.XXXXX
is a summary of the failure I have encountered when trying to reproduce the corresponding test.
In addition, besides the (perhaps) unique symptom, there're chances that different tests can lead to the same crash symptom because of their non-determinism (and some tests can even trigger crashes that I've reported before like #15476). This indicate they are maybe, not unique?
3.openj9
- perhaps_unique: assertion failed ScavengerDelegate.cpp:469: ((false))
- log.1B22R: assertion failed ScavengerDelegate.cpp:469: ((false))
- log.DZo00: assertion failed Scavenger.cpp:1669: ((false))
- log.xhKX6: segmentation error ZN12MM_Scavenger14copyForVariantILb0EEEP8J9ObjectP22MM_EnvironmentStandardP18MM_ForwardedHeader+0x77e
5.openj9
- perhaps_unique: assertion failed ScavengerRootScanner.hpp:105: ((MM_StackSlotValidator(MM_StackSlotValidator::COULD_BE_FORWARDED, *slotPtr, stackLocation, walkState).validate(_env)))
- log.vfCGQ: assertion failed ScavengerRootScanner.hpp:105: ((MM_StackSlotValidator(MM_StackSlotValidator::COULD_BE_FORWARDED, *slotPtr, stackLocation, walkState).validate(_env)))
- log.3u8ai: assertion failed ScavengerRootScanner.hpp:109: ((MM_StackSlotValidator(MM_StackSlotValidator::NOT_ON_HEAP, *slotPtr, stackLocation, walkState).validate(_env)))
- log.QDWaj: assertion failed ScavengerRootScanner.hpp:109: ((MM_StackSlotValidator(MM_StackSlotValidator::NOT_ON_HEAP, *slotPtr, stackLocation, walkState).validate(_env)))
6.openj9
- perhaps_unique: assertion failed Scavenger.cpp:1634: ((false))
- log.ES0IO: assertion failed Scavenger.cpp:1669: ((false))
- log.DHfO3: segmentation error _ZN18MM_ForwardedHeader17copyOrWaitOutlineEP8J9Object+0x30
- log.eJxJG: assertion failed ScavengerDelegate.cpp:469: ((false))
7.openj9
- perhaps_unique: segmentation error old_fast_jitLookupInterfaceMethod+0x4d
- log.wER9m: segmentation error old_fast_jitLookupInterfaceMethod+0x4d
- log.g4c0e: assertion failed ScavengerDelegate.cpp:469: ((false))
- log.AYVgj: assertion failed Scavenger.cpp:2731: ((false))
8.openj9
- perhaps_unique: assertion failed SweepPoolManagerAddressOrderedListBase.cpp:481: ((false && (objectSizeDelta <= heapFreeByteCount)))
- log.jYkx3: assertion failed SweepPoolManagerAddressOrderedListBase.cpp:481: ((false && (objectSizeDelta <= heapFreeByteCount)))
- log.26AEM: assertion failed SweepPoolManagerAddressOrderedListBase.cpp:481: ((false && (objectSizeDelta <= heapFreeByteCount)))
- log.PFDIo: segme error _ZN22GC_ObjectModelDelegate29calculateObjectDetailsForCopyEP18MM_EnvironmentBaseP18MM_ForwardedHeaderPmS4_S4_+0x34
9.openj9
- perhaps_unique: segmentation error _ZN23MM_TLHAllocationSupport7refreshEP18MM_EnvironmentBaseP22MM_AllocateDescriptionb+0x338
- log.0gxUk: segmentation error _ZN23MM_TLHAllocationSupport7refreshEP18MM_EnvironmentBaseP22MM_AllocateDescriptionb+0x338
- log.4f6XI: segmentation error _ZN22GC_ObjectModelDelegate29calculateObjectDetailsForCopyEP18MM_EnvironmentBaseP18MM_ForwardedHeaderPmS4_S4_+0x34
- log.nBVOa: segmentation error _ZN22GC_ObjectModelDelegate29calculateObjectDetailsForCopyEP18MM_EnvironmentBaseP18MM_ForwardedHeaderPmS4_S4_+0x34
11.openj9
- perhaps_unique: segmentation error findFieldInClass+0x99
- log.R1thu: assertion failed ScavengerRootScanner.hpp:109: ((MM_StackSlotValidator(MM_StackSlotValidator::NOT_ON_HEAP, *slotPtr, stackLocation, walkState).validate(_env)))
- log.COHB9: segmentation error _ZN23MM_ScavengerRootClearer26scavengeUnfinalizedObjectsEP22MM_EnvironmentStandard+0x300
- log.E7Gfs: assertion failed ScavengerRootClearer.cpp:198: ((false && (__null != forwardedPtr)))
13.openj9
- perhaps_unique: segmentation error haltThreadForInspection+0xa3
- log.nUWNA: segmentation error _ZN31MM_MemoryPoolAddressOrderedList11allocateTLHEP18MM_EnvironmentBaseP22MM_AllocateDescriptionmRPvS5_+0x7f
- log.cfpHi segmentation error protectedBacktrace+0x12
- log.QUpX3 segmentation error protectedBacktrace+0x12
14.openj9
- perhaps_unique: assertion failed MemorySubSpace.cpp:185: ((false && (reason <= LOA_EXPAND_LAST_RESIZE_REASON)))
- log.9CGLb: assertion failed MemorySubSpace.cpp:185: ((false && (reason <= LOA_EXPAND_LAST_RESIZE_REASON)))
15.openj9
- perhaps_unique: assertion failed Scavenger.cpp:2731: ((false))
- log.cYvMU: assertion failed Scavenger.cpp:2731: ((false))
- log.TJ8np: assertion failed Scavenger.cpp:2731: ((false))
- log.3FZkI: assertion failed ConcurrentMarkingDelegate.cpp:58: ((MM_StackSlotValidator(0, object, stackLocation, walkState).validate(data->env)))
- log.khhfj: segmentation error romFieldSize+0xc3
- log.tD88r: assertion failed ScavengerDelegate.cpp:469: ((false))
16.openj9
- perhaps_unique: assertion failed MarkingSchemeRootMarker.cpp:48: ((MM_StackSlotValidator(0, object, stackLocation, walkState).validate(_env)))
- log.0xAEW: assertion failed Scavenger.cpp:1669: ((false))
- log.N97lH: segmentation error _ZN22GC_ObjectModelDelegate29calculateObjectDetailsForCopyEP18MM_EnvironmentBaseP18MM_ForwardedHeaderPmS4_S4_+0x34
- log.vqj3w: segmentation error _ZN12MM_Scavenger16rescanThreadSlotEP22MM_EnvironmentStandardPP8J9Object+0xea
- log.yTdHX: segmentation error _ZN18MM_ForwardedHeader17copyOrWaitOutlineEP8J9Object+0x30
@0xdaryl @hzongaro
Most of these assertions in GC might be triggered by heap corruption. There are a very few of them I need to double check, the connection with possible heap corruption is not obvious.
I walked this list of assertions/crashes in GC. There is only one of them can not be explained by heap corruption directly:
MemorySubSpace.cpp:185: ((false && (reason <= LOA_EXPAND_LAST_RESIZE_REASON)))
this assertion means heap expansion is requested but heap contraction has been performed previous GC. It might be side effect of bad allocation request (to large object?) potentially. We observed this error in the past but can not reproduce. The heuristics responsible for expand/contract logic have been reviewed and adjusted but we did not have a chance to verify correctness.If somebody needs triaging of failures in GC listed above please let me know, I will help with this. Of course, system core is necessary.
Oops. That happens to be the test I was unable to reproduce (and my procedure deleted the log periodically)..... Still hope that test could be helpful, to some extend. I'll try again to see if I can reproduce it in more runs.
I have reproduced the failure mentioned by @dmitripivkine. See logs: issue17052rest11.14.tar.gz.
I edited and also updated the issue17052rest11.tar.gz
in my previous comment.
The problem starts at attempt to allocate object with 1836453136 bytes size. With -Xmx1G
specified it is not possible obviously:
14:52:37.237385000 *0x0 j9mm.133 Event Allocation failure start: newspace=8080/2097152 oldspace=5649544/6291456 loa=0/0 requestedbytes=1836453136
@connglli Do you know is allocation of this array of such size is intentional for this test load?
@connglli The rational for question above:
@dmitripivkine The code is generated by our fuzzers. In our fuzzer implementation, we have restricted the size of any array allocation. I have checked the code again, and didn't notice any explicit array allocation of which the size is larger than 524,288(=256x256x8) bytes. But we did generate some library method calls like those of Double, Float, BigDecimal, BigInteger, etc. I suppose they don't have some implicit large array allocations, either (even though I'm not quite sure).
Could you please check the stack trace such that we can get some glue on which array allocation caused this assertion failure? By the way, I'm sorry I'm on my vacation so I might not be that on call.
Could you please check the stack trace such that we can get some glue on which array allocation caused this assertion failure? By the way, I'm sorry I'm on my vacation so I might not be that on call.
3XMTHREADINFO3 Java callstack:
4XESTACKTRACE at java/nio/HeapByteBuffer.<init>(HeapByteBuffer.java:61)
4XESTACKTRACE at java/nio/ByteBuffer.allocate(ByteBuffer.java:348)
4XESTACKTRACE at Test.mainTest(Test.java:600(Compiled Code))
4XESTACKTRACE at Test.main(Test.java:726)
Oh sure yes, it is a library call ax$45.write(ByteBuffer.allocate(i22))
. So for this test an OOM should be thrown instead of an assertion failure. I forgot to check the size of the argument i22
in my fuzzer implementation. But seems that it is a good design to abandon all size checks so that we can find some corner-cases like this :-).
How about the others, I didn't notice ByteBuffer.allocate()
in the test like the main comment.
An assertion in MemorySubSpace.cpp:185
is an obvious problem in GC code (and as a GC Team member I have a priority). The rest of listed assertions are triggered by external to GC reasons (bad O-slot in java stack or heap corruption). I can triage such assertion/crash problems and assign to right component (JIT most likely) but I need system dump to look at. And I don't see any at the moment. Do you have any system dumps left around?
I was able to reproduce this failure with this option:
-Xjit:limit={Test.fMeth*},optLevel=hot,lastOptIndex=84
but not with lastOptIndex=83
. In this case, opt index 84 was Loop Versioner. The transformations listed for that method in Loop Versioner were
[ 3719] 84.1 O^O LOOP VERSIONER: Versioning natural loop 326
[ 3720] 84.2 O^O LOOP VERSIONER: Creating test outside loop for deciding if async check is required
[ 3721] 84.3 O^O LOOP VERSIONER: Creating test outside loop for checking if n4045n [00007F7EA875E010] exceeds bounds
This bears some resemblance to the situation described in issue #17249, so it could be related. I'll assign it to myself for now, in case this is a duplicate.
I haven't been able to make progress on this one for 0.41. I will move it to the 0.43 release.
I think this is likely a duplicate of #17249, so I'll assign it to @a7ehuo.
I can still reproduce the GC assert with the fix for #17249. This issue is also reproducible with disableAsyncCheckVersioning
.
-Xjit:limit={Test.fMeth*},{*}(disableAsyncCheckVersioning,optLevel=hot,lastOptIndex=84,optDetails,traceFull,traceILGen,traceCG,traceLoopVersioner,log=logs/jitlog)' Test
Bad scan type for object pointer 00000000A3617EA8
14:37:34.934 0x1aa00 j9mm.141 * ** ASSERTION FAILED ** at /root/home/ahuo/src/openj9-openjdk-jdk17/openj9/runtime/gc_glue_java/ScavengerDelegate.cpp:480: ((false))
The GC assert happens with lastOptSubIndex=2
but not lastOptSubIndex=1
. This issue seems related to versioning bound check. Meanwhile neither _conditionalTree
nor _duplicatedContionalTree
is set, which is related to the root cause of #17249.
[ 1974] 84.1 O^O LOOP VERSIONER: Versioning natural loop 14
versionNaturalLoop: DEBUG calling buildBoundCheckComparisonsTree. _conditionalTree 0000000000000000 n-1n _duplicatedContionalTree 0000000000000000 n-1n
[ 1975] 84.2 O^O LOOP VERSIONER: Creating test outside loop for checking if n824n [00007F723047F140] exceeds bounds
Thanks for checking into that, Annabelle @a7ehuo. Brad @BradleyWood, following that investigation I'm now wondering whether this might be a duplicate of issue #17045, which you've been investigating.
Yes, I think its the same problem.
I see the same kind of snippet.
try {
for (int ax$16 = 0; ax$16 <= 3; ax$16++) {
for (int ax$17 = 0; ax$17 <= 3; ax$17++) {
ax$15[ax$16 * ax$17] += 1;
}
}
} catch (Throwable ax$18) {
} finally {
}
I'm guessing the loop limit transformation is off by one. Does changing ax$17 <= 3
to ax$17 < 4
fix it?
Does changing ax$17 <= 3 to ax$17 < 4 fix it?
Yes. The test passes with the change
for (int ax$16 = 0; ax$16 <= 3; ax$16++) {
for (int ax$17 = 0; ax$17 < 4; ax$17++) { // Pass
ax$15[ax$16 * ax$17] += 1;
}
I'll reassign this to Brad @BradleyWood, as it looks like this is likely a duplicate of #17045
Just post what I have observed on the failure from this test. Hope it helps. Brad will continue the investigation.
I'm guessing the loop limit transformation is off by one
I noticed the difference between ax$17 <= 3
and ax$17 < 4
is adjustMaxValue, which is calculated from incrementJ
based on isIndexChildMultiplied
.
ax$17 <= 3
case:
incrementJ
is 1
; isIndexChildMultiplied
is 1
maxValue = maxValue - adjustMaxValue = maxValue - (incrementJ+1) = 4 - (1+1) = 2
ax$17 < 4
case:
incrementJ
is 1
; isIndexChildMultiplied
is 0
maxValue = maxValue - adjustMaxValue = maxValue - incrementJ = 4 - 1 = 3
isIndexChildMultiplied
is different for these two cases from what I observed. It results in the final maxValue
for ax$17
as 2
when ax$17 <= 3
. I haven't got to trace why isIndexChildMultiplied
is different.
isIndexChildMultiplied is different for these two cases from what I observed. It results in the final maxValue for ax$17 as 2 when ax$17 <= 3.
Sorry I realized my previous logging is incorrect. It is the incrementJ
that's different for these two cases.
ax$17 <= 3
case:
incrementJ
is 1
; isIndexChildMultiplied
is 1
maxValue = maxValue - adjustMaxValue = maxValue - (incrementJ+1) = 4 - (1+1) = 2
ax$17 < 4
case:
incrementJ
is 0
; isIndexChildMultiplied
is 1
maxValue = maxValue - adjustMaxValue = maxValue - (incrementJ+1) = 4 - (0+1) = 3
Those values are determined by the type of compare operator. I guess the logic doesn't hold when the indices are product of two induction variables?
@BradleyWood, may I ask you to check whether this problem was fixed by either or both the OMR pull requests eclipse/omr#7237 and eclipse/omr#7238?
@hzongaro Its fixed
Java version
Javac version
Code and summary of the problem
Code and logs: see issue17052.tar.gz. The code
Test.java
is a bit long because I ain't able to reduce it.This seems to be the most reproducible problem out of the 12 as mentioned in our previous chat. I can reproduce exactly the same assertion failure (
Scavenger.cpp:1669
) 3 times out of 4 runs. I put logs of all these 4 runs in the link above, each in a directory namedlog.XXXXX
. Hope the same frequency applies to you too. It makes OpenJ9 to crash in GC and should be a JIT bug since there aren't any crash if-Xint
is enabled.The test exited with code 255 and the assertion log (
log.bOTRV
,log.D1Uma
,log.RtuV0
) looks likeThe last run (
log.d3ZAp
) crashed with a different assertion failure: