eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 721 forks source link

Unexpected java.lang.ArrayIndexOutOfBoundsException when using balanced gc #11886

Open 1a2s3d4f1 opened 3 years ago

1a2s3d4f1 commented 3 years ago

Java -version output

openjdk version "1.8.0_282"
OpenJDK Runtime Environment (build 1.8.0_282-b08)
Eclipse OpenJ9 VM (build openj9-0.24.0, JRE 1.8.0 Windows 7 amd64-64-Bit Compres
sed References 20210120_972 (JIT enabled, AOT enabled)
OpenJ9   - 345e1b09e
OMR      - 741e94ea8
JCL      - ab07c6a8fd based on jdk8u282-b08)

Summary of problem

Canvas is a minecraft render mod. The chunk render was broke and crashed when using balanced gc(-Xgcpolicy:balanced)

The chunk render is normal when I using gencon gc. If set -Xmx2048M or set -Xgc:regionSize=1M ,The chunk can render. Maybe it is arraylets problem.

Diagnostic files

crash report

Area.java#L132 AreaFinder.java#L187

gc log(default region size): verbosegc.20210205.211006.2804.zip gc log(1MB region size): verbosegc.20210205.211552.3260.zip

[14:15:01] [Canvas Render Thread - 0/ERROR]: Unhandled error during rendering. Impact unknown.
java.lang.ArrayIndexOutOfBoundsException: null
    at Not Enough Crashes deobfuscated stack trace.(1.16.5+build.3) ~[?:?]
    at grondag.canvas.terrain.occlusion.geometry.Area.keyToIndex(Area.java:132) ~[canvas-mc116-1.0.1267.jar:?]
    at grondag.canvas.terrain.occlusion.geometry.AreaFinder.find(AreaFinder.java:187) ~[canvas-mc116-1.0.1267.jar:?]
    at grondag.canvas.terrain.occlusion.geometry.BoxFinder.markBoxSlices(BoxFinder.java:217) ~[canvas-mc116-1.0.1267.jar:?]
    at grondag.canvas.terrain.occlusion.geometry.BoxFinder.findBoxes(BoxFinder.java:57) ~[canvas-mc116-1.0.1267.jar:?]
    at grondag.canvas.terrain.occlusion.geometry.OcclusionRegion.computeOcclusion(OcclusionRegion.java:463) ~[canvas-mc116-1.0.1267.jar:?]
    at grondag.canvas.terrain.occlusion.geometry.OcclusionRegion.build(OcclusionRegion.java:507) ~[canvas-mc116-1.0.1267.jar:?]
    at grondag.canvas.terrain.region.BuiltRenderRegion.buildRegionData(BuiltRenderRegion.java:513) ~[canvas-mc116-1.0.1267.jar:?]
    at grondag.canvas.terrain.region.BuiltRenderRegion.rebuildOnWorkerThread(BuiltRenderRegion.java:473) ~[canvas-mc116-1.0.1267.jar:?]
    at grondag.canvas.terrain.region.BuiltRenderRegion$$Lambda$6296/0x0000000000000000.accept(Unknown Source) ~[?:?]
    at grondag.canvas.terrain.util.TerrainExecutor$Worker.run(TerrainExecutor.java:116) [canvas-mc116-1.0.1267.jar:?]
    at java.lang.Thread.run(Thread.java:823) [?:1.8.0_282]

In game screenshot: 2021-11-23_23 03 52

pshipton commented 3 years ago

@dmitripivkine @0xdaryl fyi.

1a2s3d4f1 commented 3 years ago

Another balanced gc problem: Java program:HMCL Set -Xmx1024M -Xgcpolicy:balanced 1.jpg gc log:verbosegc.20210206.185105.2000.zip

Set -Xmx1024M -Xgcpolicy:balanced -Xgc:regionSize=16M 2.jpg gc log:verbosegc.20210206.185035.4012.zip

0xdaryl commented 3 years ago

@BeverlyXu : could you work with @1a2s3d4f1 and @dmitripivkine to provide JIT expertise to triage this? The JIT piggybacks on the AIOB mechanism when dealing with discontiguous arrays so there may be a JIT issue here.

BeverlyXu commented 3 years ago

@1a2s3d4f1 Thank you for reporting this issue.

We need to figure out which method/object is having an array out of bound issue. Could you help take a core dump using something like this on the top method from the backtrace: -Xdump:system+java:events=throw,filter=java/lang/ArrayIndexOutOfBoundsException#grondag/canvas/terrain/occlusion/geometry/Area.keyToIndex

If you have more than one exception, you could add this to only generate one core ,range=1..1.

More on the format to generate dump on exceptions, please see here: https://www.eclipse.org/openj9/tools/xdump_option_builder.html

1a2s3d4f1 commented 3 years ago

@BeverlyXu Core dump file: here

dmitripivkine commented 3 years ago

I am not sure how GC might be involved here. Looks like there are two threads with similar stacks and exceptions. From Snap traces:

03:42:40.148188314 *0x40A3E00 j9vm.294             Entry      >setCurrentException index=22 constructorIndex=0 detailMessage=0x0
03:42:40.148196604  0x40A3E00 j9vm.10              Entry      >internalSendExceptionConstructor
03:42:40.148216344  0x40A3E00 j9vm.11              Exit       <internalSendExceptionConstructor
03:42:40.148230161  0x40A3E00 j9vm.301             Exit       <setCurrentException

03:42:40.153794709 *0x36AEB00 j9vm.294             Entry      >setCurrentException index=22 constructorIndex=0 detailMessage=0x0
03:42:40.153802605  0x36AEB00 j9vm.10              Entry      >internalSendExceptionConstructor
03:42:40.153819976  0x36AEB00 j9vm.11              Exit       <internalSendExceptionConstructor
03:42:40.153831819  0x36AEB00 j9vm.301             Exit       <setCurrentException

Both threads have similar java stacks, so there is top of one of them:

> !stackslots 0x040a3e00
<40a3e00> *** BEGIN STACK WALK, flags = 00400001 walkThread = 0x00000000040A3E00 ***
<40a3e00>   ITERATE_O_SLOTS
<40a3e00>   RECORD_BYTECODE_PC_OFFSET
<40a3e00> Initial values: walkSP = 0x00000000047C77F8, PC = 0x0000000000000005, literals = 0x0000000000000000, A0 = 0x00000000047C7818, j2iFrame = 0x00000000047C7958, ELS = 0x000000002FA4F760, decomp = 0x0000000000000000
<40a3e00> JIT resolve frame: bp = 0x00000000047C7818, sp = 0x00000000047C77F8, pc = 0x0000000000000005, cp = 0x0000000000000000, arg0EA = 0x00000000047C7818, flags = 0x0000000000080000
<40a3e00>   Object push (savedJITException)
<40a3e00>       O-Slot[0x00000000047C77F8] = 0x0000000000000000
<40a3e00>   Generic resolve
<40a3e00>   unwindSP initialized to 0x00000000047C7820
<40a3e00>   JIT-Resolve-RegisterMap[0x000000002FA4F530] = U64(0x00000000FFFFFFF0) (jit_rax)
<40a3e00>   JIT-Resolve-RegisterMap[0x000000002FA4F538] = U64(0x0000000000000002) (jit_rbx)
<40a3e00>   JIT-Resolve-RegisterMap[0x000000002FA4F540] = U64(0x00000000043E0E90) (jit_rcx)
<40a3e00>   JIT-Resolve-RegisterMap[0x000000002FA4F548] = U64(0x000000002FA4F470) (jit_rdx)
<40a3e00>   JIT-Resolve-RegisterMap[0x000000002FA4F550] = U64(0x0000000000000000) (jit_rdi)
<40a3e00>   JIT-Resolve-RegisterMap[0x000000002FA4F558] = U64(0x00000000FFFFFFF0) (jit_rsi)
<40a3e00>   JIT-Resolve-RegisterMap[0x000000002FA4F560] = U64(0x00000000029EF600) (jit_rbp)
<40a3e00>   JIT-Resolve-RegisterMap[0x000000002FA4F568] = U64(0x0000000000004010) (jit_rsp)
<40a3e00>   JIT-Resolve-RegisterMap[0x000000002FA4F570] = U64(0x0000000000000001) (jit_r8)
<40a3e00>   JIT-Resolve-RegisterMap[0x000000002FA4F578] = U64(0x0000000000000005) (jit_r9)
<40a3e00>   JIT-Resolve-RegisterMap[0x000000002FA4F580] = U64(0x0000000000000000) (jit_r10)
<40a3e00>   JIT-Resolve-RegisterMap[0x000000002FA4F588] = U64(0x0000000000000001) (jit_r11)
<40a3e00>   JIT-Resolve-RegisterMap[0x000000002FA4F590] = U64(0x0000000000000004) (jit_r12)
<40a3e00>   JIT-Resolve-RegisterMap[0x000000002FA4F598] = U64(0x0000000000000008) (jit_r13)
<40a3e00>   JIT-Resolve-RegisterMap[0x000000002FA4F5A0] = U64(0x0000000000000006) (jit_r14)
<40a3e00>   JIT-Resolve-RegisterMap[0x000000002FA4F5A8] = U64(0x0000000000000000) (jit_r15)
<40a3e00> JIT frame: bp = 0x00000000047C7828, pc = 0x000007FECD549031, unwindSP = 0x00000000047C7820, cp = 0x00000000043E0ED0, arg0EA = 0x00000000047C7830, jitInfo = 0x000000004AFF53E8
<40a3e00>   Method: grondag/canvas/terrain/occlusion/geometry/Area.keyToIndex(I)I !j9method 0x00000000043E12B8
<40a3e00>   Bytecode index = 4, inlineDepth = 0, PC offset = 0x0000000000000019
<40a3e00>   stackMap=0x000000004AFF54E1, slots=I16(0x0001) parmBaseOffset=I16(0x0010), parmSlots=U16(0x0000), localBaseOffset=I16(0x0000)
<40a3e00>   Described JIT temps starting at 0x00000000047C7828 for IDATA(0x0000000000000000) slots
<40a3e00>   JIT-RegisterMap = UDATA(0x0000000000000000)
<40a3e00>   JIT-Frame-RegisterMap[0x000000002FA4F538] = U64(0x0000000000000002) (jit_rbx)
<40a3e00>   JIT-Frame-RegisterMap[0x000000002FA4F578] = U64(0x0000000000000005) (jit_r9)
<40a3e00>   JIT-Frame-RegisterMap[0x000000002FA4F580] = U64(0x0000000000000000) (jit_r10)
<40a3e00>   JIT-Frame-RegisterMap[0x000000002FA4F588] = U64(0x0000000000000001) (jit_r11)
<40a3e00>   JIT-Frame-RegisterMap[0x000000002FA4F590] = U64(0x0000000000000004) (jit_r12)
<40a3e00>   JIT-Frame-RegisterMap[0x000000002FA4F598] = U64(0x0000000000000008) (jit_r13)
<40a3e00>   JIT-Frame-RegisterMap[0x000000002FA4F5A0] = U64(0x0000000000000006) (jit_r14)
<40a3e00>   JIT-Frame-RegisterMap[0x000000002FA4F5A8] = U64(0x0000000000000000) (jit_r15)
<40a3e00> I2J values: PC = 0x0000000036CDB462, A0 = 0x00000000047C7888, walkSP = 0x00000000047C7838, literals = 0x00000000043E0E90, JIT PC = 0x000007FEE0A93C60, pcAddress = 0x000000002FA4F788, decomp = 0x0000000000000000
<40a3e00> Bytecode frame: bp = 0x00000000047C7848, sp = 0x00000000047C7838, pc = 0x0000000036CDB462, cp = 0x00000000043E0BD0, arg0EA = 0x00000000047C7888, flags = 0x0000000000000000
<40a3e00>   Method: grondag/canvas/terrain/occlusion/geometry/AreaFinder.find([JILit/unimi/dsi/fastutil/ints/IntConsumer;)V !j9method 0x00000000043E0E90
<40a3e00>   Bytecode index = 62
<40a3e00>   Using local mapper
<40a3e00>   Locals starting at 0x00000000047C7888 for 0x0000000000000008 slots
<40a3e00>       I-Slot: a0[0x00000000047C7888] = 0x00000000D9CF6858
<40a3e00>       I-Slot: a1[0x00000000047C7880] = 0x00000000D9CF6868
<40a3e00>       I-Slot: a2[0x00000000047C7878] = 0x0000000000000024
<40a3e00>       O-Slot: a3[0x00000000047C7870] = 0x00000000D8AA4220
<40a3e00>       O-Slot: t4[0x00000000047C7868] = 0x00000000D8AA4E20
<40a3e00>       I-Slot: t5[0x00000000047C7860] = 0x000007FE0000001A
<40a3e00>       I-Slot: t6[0x00000000047C7858] = 0x000007FEFFFFFFF0
<40a3e00>       I-Slot: t7[0x00000000047C7850] = 0x000007FE0000479D
<40a3e00> Bytecode frame: bp = 0x00000000047C78A0, sp = 0x00000000047C7890, pc = 0x0000000036CDA332, cp = 0x00000000043DFEB0, arg0EA = 0x00000000047C78D0, flags = 0x0000000000000000
<40a3e00>   Method: grondag/canvas/terrain/occlusion/geometry/BoxFinder.markBoxSlices()V !j9method 0x00000000043E04B8
<40a3e00>   Bytecode index = 54
<40a3e00>   Using local mapper
<40a3e00>   Locals starting at 0x00000000047C78D0 for 0x0000000000000006 slots
<40a3e00>       O-Slot: a0[0x00000000047C78D0] = 0x00000000D9CF3750
<40a3e00>       O-Slot: t1[0x00000000047C78C8] = 0x00000000D9CF6868
<40a3e00>       O-Slot: t2[0x00000000047C78C0] = 0x00000000D9CF6858
<40a3e00>       I-Slot: t3[0x00000000047C78B8] = 0x00000000D8A80000
<40a3e00>       I-Slot: t4[0x00000000047C78B0] = 0x0000000000000024
<40a3e00>       I-Slot: t5[0x00000000047C78A8] = 0x0000000000000009
......

@BeverlyXu @tajila I am not sure what I can check here from GC point of view - everything looks clean. Can somebody investigate what is bad index is, what is array this index for etc. ? There are a few discontiguous arrays in the system but I can not see connection from this stack. I have files downloaded, ask me where. Please let me know if you need help from GC side.

0xdaryl commented 3 years ago

@BeverlyXu : are there any insights you can share from a compiled code perspective on this problem? Perhaps start by finding the object that triggered the AIOB and the site where the AIOB was thrown so we can determine if this is a problem with the spine check at that point.

BeverlyXu commented 3 years ago

I'm working on this issue right now. Will post any findings soon.

0xdaryl commented 3 years ago

Moving out to 0.27.

0xdaryl commented 3 years ago

@jdmpapin : could you pick up the investigation on this for the 0.28 release please?

1a2s3d4f1 commented 2 years ago

https://mega.nz/folder/jAsDBK4Y#V2Oq4-BGF2PEyjrinElSpQ I Uploaded new dump files(jit dump and core dump).

Java version:

openjdk version "11.0.13" 2021-10-19
IBM Semeru Runtime Open Edition 11.0.13.0 (build 11.0.13+8)
Eclipse OpenJ9 VM 11.0.13.0 (build openj9-0.29.0, JRE 11 Windows 7 amd64-64-Bit Compressed References 20211022_218 (JIT enabled, AOT enabled)
OpenJ9   - e1e72c497
OMR      - 299b6a2d2
JCL      - 2d83aa3b76 based on jdk-11.0.13+8)

I prepare a Linux virtual machine. Enable Double Mapping can not fix this AIBO problem(Double Mapping can fix HMCL rendering problem). This problem can happend with metronome gc.