eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

Segmentation Error With jit optlevel hot And Higher #15764

Closed lochnagarr closed 2 years ago

lochnagarr commented 2 years ago

Java -version output

openjdk version "1.8.0_345" IBM Semeru Runtime Open Edition (build 1.8.0_345-b01) Eclipse OpenJ9 VM (build openj9-0.33.0, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20220805_444 (JIT enabled, AOT enabled) OpenJ9 - 04a55b45b OMR - b58aa2708 JCL - f2d89babf5 based on jdk8u345-b01)

openjdk version "11.0.16" 2022-07-19 IBM Semeru Runtime Open Edition 11.0.16.0 (build 11.0.16+8) Eclipse OpenJ9 VM 11.0.16.0 (build openj9-0.33.0, JRE 11 Linux amd64-64-Bit Compressed References 20220804_491 (JIT enabled, AOT enabled) OpenJ9 - 04a55b45b OMR - b58aa2708 JCL - ab74d97849 based on jdk-11.0.16+8)

openjdk version "17.0.4" 2022-07-19 IBM Semeru Runtime Open Edition 17.0.4.0 (build 17.0.4+8) Eclipse OpenJ9 VM 17.0.4.0 (build openj9-0.33.0, JRE 17 Linux amd64-64-Bit Compressed References 20220719_256 (JIT enabled, AOT enabled) OpenJ9 - 04a55b45b OMR - b58aa2708 JCL - d680e266ef4 based on jdk-17.0.4+8)

OS version

Distributor ID: Ubuntu Description: Ubuntu 22.04 LTS Release: 22.04 Codename: jammy

Summary of problem

When running a buggy classfile generated by a fuzzer (with jit enabled), we get the following error message:

#0: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x8c0a95) [0x7f3f71ec0a95]
#1: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x8cbd50) [0x7f3f71ecbd50]
#2: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x13c569) [0x7f3f7173c569]
#3: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9prt29.so(+0x2a8ba) [0x7f3f78c2a8ba]
#4: /lib/x86_64-linux-gnu/libc.so.6(+0x42520) [0x7f3f7a41a520]
#5: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x759656) [0x7f3f71d59656]
#6: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x75dd11) [0x7f3f71d5dd11]
#7: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x75df38) [0x7f3f71d5df38]
#8: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x75e0cd) [0x7f3f71d5e0cd]
#9: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x75e388) [0x7f3f71d5e388]
#10: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x7403aa) [0x7f3f71d403aa]
#11: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x74f7c7) [0x7f3f71d4f7c7]
#12: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x74fd59) [0x7f3f71d4fd59]
#13: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x75110b) [0x7f3f71d5110b]
#14: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x543ca5) [0x7f3f71b43ca5]
#15: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x1502bf) [0x7f3f717502bf]
#16: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x151324) [0x7f3f71751324]
#17: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9prt29.so(+0x2b3f3) [0x7f3f78c2b3f3]
#18: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x14ea49) [0x7f3f7174ea49]
#19: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x14f090) [0x7f3f7174f090]
#20: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x14dbc3) [0x7f3f7174dbc3]
#21: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x14e0a2) [0x7f3f7174e0a2]
#22: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x14e152) [0x7f3f7174e152]
#23: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9prt29.so(+0x2b3f3) [0x7f3f78c2b3f3]
#24: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so(+0x14e582) [0x7f3f7174e582]
#25: /data/jit/jdk/jdk-11.0.16+8/lib/default/libj9thr29.so(+0xe4f6) [0x7f3f7880e4f6]
#26: /lib/x86_64-linux-gnu/libc.so.6(+0x94b43) [0x7f3f7a46cb43]
#27: /lib/x86_64-linux-gnu/libc.so.6(+0x126a00) [0x7f3f7a4fea00]
Unhandled exception
Type=Segmentation error vmState=0x000506ff
J9Generic_Signal_Number=00000018 Signal_Number=0000000b Error_Value=00000000 Signal_Code=00000001
Handler1=00007F3F7903FB90 Handler2=00007F3F78C2A690 InaccessibleAddress=0000000000000010
RDI=00007F3F4CC2BC80 RSI=0000000000000000 RAX=0000000000000000 RBX=00007F3F4CD44740
RCX=00007F3F4D1EEB40 RDX=00007F3F4D1EEA50 R8=00007F3F7208CE20 R9=00007F3F7208CE20
R10=00007F3F4D248800 R11=0000000000000020 R12=00007F3F4D1EEA50 R13=00007F3F795F51B0
R14=00007F3F7208CE20 R15=00007F3F4D2497A0
RIP=00007F3F71D59656 GS=0000 FS=0000 RSP=00007F3F795F3FF0
EFlags=0000000000010246 CS=0033 RBP=00007F3F4D1EEA50 ERR=0000000000000004
TRAPNO=000000000000000E OLDMASK=0000000000000000 CR2=0000000000000010
xmm0 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm1 0f0ea3a27c000000 (f: 2080374784.000000, d: 3.764184e-236)
xmm2 0000000000000800 (f: 2048.000000, d: 1.011846e-320)
xmm3 0000600000000000 (f: 0.000000, d: 5.215017e-310)
xmm4 101f1f0000000000 (f: 0.000000, d: 5.011390e-231)
xmm5 0000000000100000 (f: 1048576.000000, d: 5.180654e-318)
xmm6 0000000000020000 (f: 131072.000000, d: 6.475817e-319)
xmm7 0004000000000000 (f: 0.000000, d: 5.562685e-309)
xmm8 00007f3f4d1242a0 (f: 1293042304.000000, d: 6.912465e-310)
xmm9 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm10 0e23430b020d0043 (f: 34406468.000000, d: 1.444350e-240)
xmm11 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm12 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm14 0000000000000000 (f: 0.000000, d: 0.000000e+00)
xmm15 0000000000000000 (f: 0.000000, d: 0.000000e+00)
Module=/data/jit/jdk/jdk-11.0.16+8/lib/default/libj9jit29.so
Module_base_address=00007F3F71600000

Method_being_compiled=org/apache/commons/text/numbers/ParsedDecimal.prepareOutput(I)V
Target=2_90_20220804_491 (Linux 5.15.0-41-generic)
CPU=amd64 (12 logical CPUs) (0x7c9cac000 RAM)
----------- Stack Backtrace -----------
_ZN14TR_OrderBlocks19peepHoleBranchBlockEPN2TR3CFGEPNS0_5BlockEPc+0x46 (0x00007F3F71D59656 [libj9jit29.so+0x759656])
_ZN14TR_OrderBlocks26doPeepHoleBlockCorrectionsEPN2TR5BlockEPc+0x231 (0x00007F3F71D5DD11 [libj9jit29.so+0x75dd11])
_ZN14TR_OrderBlocks28lookForPeepHoleOpportunitiesEPc+0xe8 (0x00007F3F71D5DF38 [libj9jit29.so+0x75df38])
_ZN14TR_OrderBlocks12doReorderingEv+0x11d (0x00007F3F71D5E0CD [libj9jit29.so+0x75e0cd])
_ZN14TR_OrderBlocks7performEv+0x288 (0x00007F3F71D5E388 [libj9jit29.so+0x75e388])
_ZN20TR_ExtendBasicBlocks7performEv+0x10a (0x00007F3F71D403AA [libj9jit29.so+0x7403aa])
_ZN3OMR9Optimizer19performOptimizationEPK20OptimizationStrategyiii+0x767 (0x00007F3F71D4F7C7 [libj9jit29.so+0x74f7c7])
_ZN3OMR9Optimizer19performOptimizationEPK20OptimizationStrategyiii+0xcf9 (0x00007F3F71D4FD59 [libj9jit29.so+0x74fd59])
_ZN3OMR9Optimizer8optimizeEv+0x1db (0x00007F3F71D5110B [libj9jit29.so+0x75110b])
_ZN3OMR11Compilation7compileEv+0xaf5 (0x00007F3F71B43CA5 [libj9jit29.so+0x543ca5])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadPNS_11CompilationEP17TR_ResolvedMethodR11TR_J9VMBaseP19TR_OptimizationPlanRKNS_16SegmentAllocatorE+0x4bf (0x00007F3F717502BF [libj9jit29.so+0x1502bf])
_ZN2TR28CompilationInfoPerThreadBase14wrappedCompileEP13J9PortLibraryPv+0x314 (0x00007F3F71751324 [libj9jit29.so+0x151324])
omrsig_protect+0x1e3 (0x00007F3F78C2B3F3 [libj9prt29.so+0x2b3f3])
_ZN2TR28CompilationInfoPerThreadBase7compileEP10J9VMThreadP21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x309 (0x00007F3F7174EA49 [libj9jit29.so+0x14ea49])
_ZN2TR24CompilationInfoPerThread12processEntryER21TR_MethodToBeCompiledRN2J917J9SegmentProviderE+0x1c0 (0x00007F3F7174F090 [libj9jit29.so+0x14f090])
_ZN2TR24CompilationInfoPerThread14processEntriesEv+0x3b3 (0x00007F3F7174DBC3 [libj9jit29.so+0x14dbc3])
_ZN2TR24CompilationInfoPerThread3runEv+0x42 (0x00007F3F7174E0A2 [libj9jit29.so+0x14e0a2])
_Z30protectedCompilationThreadProcP13J9PortLibraryPN2TR24CompilationInfoPerThreadE+0x82 (0x00007F3F7174E152 [libj9jit29.so+0x14e152])
omrsig_protect+0x1e3 (0x00007F3F78C2B3F3 [libj9prt29.so+0x2b3f3])
_Z21compilationThreadProcPv+0x1d2 (0x00007F3F7174E582 [libj9jit29.so+0x14e582])
thread_wrapper+0x186 (0x00007F3F7880E4F6 [libj9thr29.so+0xe4f6])
 (0x00007F3F7A46CB43 [libc.so.6+0x94b43])
 (0x00007F3F7A4FEA00 [libc.so.6+0x126a00])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2022/08/22 14:31:24 - please wait.
JVMDUMP032I JVM requested System dump using '/data/jit/bug_jit/core.20220822.143124.1583075.0001.dmp' in response to an event
JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport %p %s %c %d %P %E" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.1583104.

JVMDUMP012E Error in System dump: The core file created by child process with pid = 1583104 was not found. Expected to find core file with name "/data/jit/bug_jit/core.1583104"
JVMDUMP032I JVM requested Java dump using '/data/jit/bug_jit/javacore.20220822.143124.1583075.0002.txt' in response to an event
JVMDUMP010I Java dump written to /data/jit/bug_jit/javacore.20220822.143124.1583075.0002.txt
JVMDUMP032I JVM requested Snap dump using '/data/jit/bug_jit/Snap.20220822.143124.1583075.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /data/jit/bug_jit/Snap.20220822.143124.1583075.0003.trc
JVMDUMP032I JVM requested JIT dump using '/data/jit/bug_jit/jitdump.20220822.143124.1583075.0004.dmp' in response to an event
JVMDUMP051I JIT dump occurred in 'JIT Compilation Thread-000' thread 0x0000000000022100
JVMDUMP049I JIT dump notified all waiting threads of the current method to be compiled
JVMDUMP054I JIT dump is tracing the IL of the method on the crashed compilation thread
JVMDUMP048I JIT dump method being compiled is an ordinary method
JVMDUMP053I JIT dump is recompiling org/apache/commons/text/numbers/ParsedDecimal.prepareOutput(I)V

Note that OpenJ9 runs with the JIT enabled. In particular, setting optlevel to hot or higher (with the JVM option -Xjit:optlevel=hot) triggers this issue, while warm or lower does not.

Diagnostic files

javacore.20220822.144314.1583749.0002.txt bug_jit.zip

How to Reproduce

  1. Unzip bug_jit.zip and enter bug_jit
  2. Run the following command:
    java "-Xjit:count=0,optlevel=hot,limit={org/apache/commons/text/numbers/ParsedDecimal.*}" -jar ./junit-platform-console-standalone-1.8.2.jar -cp ./bug_files/:./classes:./test-classes:commons-rng-client-api-1.4.jar:./commons-rng-core-1.4.jar:./commons-rng-simple-1.4.jar:./junit-platform-console-standalone-1.8.2.jar:./util -m org.apache.commons.text.numbers.ParsedDecimalTest#testMaxPrecision_random

    Note that there is no problem when optlevel is noOpt, cold, or warm. When optlevel is hot, veryHot, or scorching, the error message above is produced.

DanHeidinga commented 2 years ago

fyi @0xdaryl

hzongaro commented 2 years ago

Annabelle @a7ehuo, may I ask you to take a look at this crash?

a7ehuo commented 2 years ago

The crash happened in TR_OrderBlocks::peepHoleBranchBlock because the next treetop of the BBEnd of block_924 is NULL. block_924 is created in generalLoopUnroller. Setting disableGLU also makes the crash go away. Interestingly, block_918 is also cloned in generalLoopUnroller, but the next treetop for its BBEnd looks good.

The crash seems to happen consistently when compiling org/apache/commons/text/numbers/ParsedDecimal.prepareOutput(I)V. However, if I add tracing on this method, the crash goes away. I'll instrument the code and see if I can catch the issue at an earlier stage of the optimization.

#12 <signal handler called>
#13 0x00007fb2acd0e3f6 in TR_OrderBlocks::peepHoleBranchBlock (this=this@entry=0x7fb28c35b630, 
    cfg=cfg@entry=0x7fb287020000, block=block@entry=0x7fb287712df0, title=title@entry=0x7fb2ad03bb00 "O^O ORDER BLOCKS: ")
    at /root/home/ahuo/src/openj9-openjdk-jdk11/omr/compiler/optimizer/OrderBlocks.cpp:1339
#14 0x00007fb2acd12c41 in TR_OrderBlocks::doPeepHoleBlockCorrections (this=this@entry=0x7fb28c35b630, 
    block=block@entry=0x7fb287712df0, title=title@entry=0x7fb2ad03bb00 "O^O ORDER BLOCKS: ")
    at /root/home/ahuo/src/openj9-openjdk-jdk11/omr/compiler/optimizer/OrderBlocks.cpp:1550
#15 0x00007fb2acd12e68 in TR_OrderBlocks::lookForPeepHoleOpportunities (this=0x7fb28c35b630, 
    title=0x7fb2ad03bb00 "O^O ORDER BLOCKS: ")
    at /root/home/ahuo/src/openj9-openjdk-jdk11/omr/compiler/optimizer/OrderBlocks.cpp:1589
#16 0x00007fb2acd12ffd in TR_OrderBlocks::doReordering (this=this@entry=0x7fb28c35b630)
    at /root/home/ahuo/src/openj9-openjdk-jdk11/omr/compiler/optimizer/OrderBlocks.cpp:2078
#17 0x00007fb2acd132b8 in TR_OrderBlocks::perform (this=this@entry=0x7fb28c35b630)
    at /root/home/ahuo/src/openj9-openjdk-jdk11/omr/compiler/optimizer/OrderBlocks.cpp:2125
#18 0x00007fb2accf552a in TR_ExtendBasicBlocks::perform (this=<optimized out>)
    at /root/home/ahuo/src/openj9-openjdk-jdk11/omr/compiler/optimizer/LocalOpts.cpp:176
#19 0x00007fb2acd048c7 in OMR::Optimizer::performOptimization (this=this@entry=0x7fb2870f38a0, 
    optimization=optimization@entry=0x7fb2ad03b3d8 <blockManipulationOpts+56>, firstOptIndex=firstOptIndex@entry=0, 
    lastOptIndex=lastOptIndex@entry=2147483647, doTiming=doTiming@entry=0)
    at /root/home/ahuo/src/openj9-openjdk-jdk11/omr/compiler/optimizer/OMROptimizer.cpp:2053
...

(gdb) fr 13
#13 0x00007fb2acd0e3f6 in TR_OrderBlocks::peepHoleBranchBlock (this=this@entry=0x7fb28c35b630, 
    cfg=cfg@entry=0x7fb287020000, block=block@entry=0x7fb287712df0, title=title@entry=0x7fb2ad03bb00 "O^O ORDER BLOCKS: ")
    at /root/home/ahuo/src/openj9-openjdk-jdk11/omr/compiler/optimizer/OrderBlocks.cpp:1339
1339       TR::Block *fallThroughBlock = fallThroughEntry->getNode()->getBlock();
(gdb) print *block
$1 = {<OMR::Block> = {<TR::CFGNode> = {<TR_Link1<TR::CFGNode>> = {_next = 0x7fb287712b30, _valid = true}, 
      _vptr.CFGNode = 0x7fb2ad3bdc98 <vtable for TR::Block+16>, _region = @0x7fb28c361950, _successors = {_head = {
          _next = 0x7fb2877149e0}, _allocator = {<TR::typed_allocator<void, TR::Region&>> = {
            _backingAllocator = @0x7fb28c361950}, <No data fields>}}, _predecessors = {_head = {_next = 0x7fb287713800}, 
        _allocator = {<TR::typed_allocator<void, TR::Region&>> = {_backingAllocator = @0x7fb28c361950}, <No data fields>}}, 
      _exceptionSuccessors = {_head = {_next = 0x0}, _allocator = {<TR::typed_allocator<void, TR::Region&>> = {
            _backingAllocator = @0x7fb28c361950}, <No data fields>}}, _exceptionPredecessors = {_head = {_next = 0x0}, 
        _allocator = {<TR::typed_allocator<void, TR::Region&>> = {_backingAllocator = @0x7fb28c361950}, <No data fields>}}, 
      _nodeNumber = 924, _visitCount = 16732, _frequency = 0, _forwardTraversalIndex = 565, _backwardTraversalIndex = 9}, 
    static _standardExceptions = {{length = 5, name = 0x7fb2acfa214d "Error", exceptions = 4288}, {length = 9, 
        name = 0x7fb2acf8df15 "Exception", exceptions = 445}, {length = 9, name = 0x7fb2acfa7167 "Throwable", 
        exceptions = 8191}, {length = 12, name = 0x7fb2ad0191df "UnknownError", exceptions = 4288}, {length = 13, 
        name = 0x7fb2ad0191ec "InternalError", exceptions = 4288}, {length = 16, name = 0x7fb2ad0191fa "OutOfMemoryError", 
        exceptions = 4288}, {length = 16, name = 0x7fb2ad01920b "RuntimeException", exceptions = 445}, {length = 18, 
        name = 0x7fb2ad055e42 "ClassCastException", exceptions = 32}, {length = 18, 
        name = 0x7fb2ad01921c "IllegalAccessError", exceptions = 4224}, {length = 18, 
        name = 0x7fb2ad01922f "InstantiationError", exceptions = 4160}, {length = 18, 
        name = 0x7fb2ad019242 "StackOverflowError", exceptions = 4288}, {length = 19, 
        name = 0x7fb2ad055c2c "ArithmeticException", exceptions = 4}, {length = 19, 
        name = 0x7fb2ad055c48 "ArrayStoreException", exceptions = 16}, {length = 19, 
        name = 0x7fb2ad019255 "VirtualMachineError", exceptions = 4288}, {length = 20, 
        name = 0x7fb2ad055bf1 "NullPointerException", exceptions = 1}, {length = 25, 
        name = 0x7fb2ad019269 "IndexOutOfBoundsException", exceptions = 8}, {length = 26, 
        name = 0x7fb2ad019283 "NegativeArraySizeException", exceptions = 128}, {length = 28, 
        name = 0x7fb2ad01929e "IllegalMonitorStateException", exceptions = 256}, {length = 28, 
        name = 0x7fb2ad0192bb "IncompatibleClassChangeError", exceptions = 4288}, {length = 30, 
        name = 0x7fb2ad0196e8 "ArrayIndexOutOfBoundsException", exceptions = 8}, {length = 99, name = 0x7fb2acf94b94 "", 
        exceptions = 0}}, _pEntry = 0x7fb287712db0, _pExit = 0x7fb287712dd0, _liveLocals = 0x0, _pStructureOf = 
    0x7fb2877fdba0, _globalRegisters = 0x0, 
    _instructionBoundaries = {<TR_Link<OMR::Block::InstructionBoundaries>> = {<TR_Link0<OMR::Block::InstructionBoundaries>> = {_next = 0x0}, <No data fields>}, _startPC = 4294967295, _endPC = 4294967295}, 
    _snippetBoundaries = {<TR_LinkHead0<OMR::Block::InstructionBoundaries>> = {_head = 0x0}, <No data fields>}, 
    _firstInstruction = 0x0, _lastInstruction = 0x0, _catchBlockExtension = 0x0, _unrollFactor = 0, _blockSize = -1, 
    _blockBCIndex = 0, _j9EstimateSizeMethod = 0x0, _debugCounters = 0x0, _flags = {_flags = 0}, _moreflags = {
      _flags = 0}}, <No data fields>}

(gdb) p block->_pExit
$2 = (TR::TreeTop *) 0x7fb287712dd0

// BBEnd of block_924
(gdb) p block->_pExit->_pNode->_globalIndex
$6 = 7297
(gdb) p block->_pExit->_pNext
$4 = (TR::TreeTop *) 0x0

// BBEnd of block_918
(gdb) p block->_pEntry->_pPrev->_pNode->_globalIndex
$1 = 7247
(gdb) p block->_pEntry->_pPrev->_pNext->_pNode->_globalIndex
$2 = 7293
[  2117] O^O GENERAL LOOP UNROLLER: Unrolling non-counted loop 422 [unrollfactor:4, peelcount:0]
              (Invalidating structure)
         BLOCK CLONER: Newly created block_918 is a clone of original block_418
         ...
         BLOCK CLONER: Newly created block_924 is a clone of original block_418
         ...
         BLOCK CLONER: Newly created block_930 is a clone of original block_418
n7243n    BBStart <block_918> (freq 0) (in loop 422)                                          [0x7fb23b1fc7b0] bci=[-1,268,508] rc=0 vc=16944 vn=- li=- udi=- nc=0
n7244n    ificmpeq --> block_206 BBStart at n1679n ()                                         [0x7fb23b1fc800] bci=[-1,213,508] rc=0 vc=16944 vn=- li=- udi=- nc=2 flg=0x20
n7245n      iload  <auto slot 12>[#401  Auto] [flags 0x3 0x0 ]                                [0x7fb23b1fc850] bci=[-1,210,508] rc=1 vc=16944 vn=- li=- udi=366 nc=0
n7246n      iconst 1                                                                          [0x7fb23b1fc8a0] bci=[-1,212,508] rc=1 vc=16944 vn=- li=- udi=- nc=0
n7247n    BBEnd </block_918> =====                                                            [0x7fb23b1fc8f0] bci=[-1,268,508] rc=0 vc=16944 vn=- li=- udi=- nc=0

n7293n    BBStart <block_924> (freq 0) (in loop 422)                                          [0x7fb23b1fd750] bci=[-1,268,508] rc=0 vc=16944 vn=- li=- udi=- nc=0
n7294n    ificmpeq --> block_206 BBStart at n1679n ()                                         [0x7fb23b1fd7a0] bci=[-1,213,508] rc=0 vc=16944 vn=- li=- udi=- nc=2 flg=0x20
n7295n      iload  <auto slot 12>[#401  Auto] [flags 0x3 0x0 ]                                [0x7fb23b1fd7f0] bci=[-1,210,508] rc=1 vc=16944 vn=- li=- udi=366 nc=0
n7296n      iconst 1                                                                          [0x7fb23b1fd840] bci=[-1,212,508] rc=1 vc=16944 vn=- li=- udi=- nc=0
n7297n    BBEnd </block_924> =====                                                            [0x7fb23b1fd890] bci=[-1,268,508] rc=0 vc=16944 vn=- li=- udi=- nc=0

n7343n    BBStart <block_930> (freq 0) (in loop 422)                                          [0x7fb23b1fe6f0] bci=[-1,268,508] rc=0 vc=16944 vn=- li=- udi=- nc=0
n7344n    ificmpeq --> block_206 BBStart at n1679n ()                                         [0x7fb23b1fe740] bci=[-1,213,508] rc=0 vc=16944 vn=- li=- udi=- nc=2 flg=0x20
n7345n      iload  <auto slot 12>[#401  Auto] [flags 0x3 0x0 ]                                [0x7fb23b1fe790] bci=[-1,210,508] rc=1 vc=16944 vn=- li=- udi=366 nc=0
n7346n      iconst 1                                                                          [0x7fb23b1fe7e0] bci=[-1,212,508] rc=1 vc=16944 vn=- li=- udi=- nc=0
n7347n    BBEnd </block_930> =====                                                            [0x7fb23b1fe830] bci=[-1,268,508] rc=0 vc=16944 vn=- li=- udi=- nc=0
a7ehuo commented 2 years ago

Narrowing down the issue more: the next treetop of the BBEnd n7297n for block_924 is removed in TR_BlockOrderingOptimization::connectTreesAccordingToOrder. It looks like block_924 is moved to the end of the treetop list by connectTreesAccordingToOrder. I suspect that either TR_OrderBlocks::peepHoleBranchBlock should check whether fallThroughEntry is NULL, because there is always a chance that the block is the last block, or TR_OrderBlocks::peepHoleBranchBlock should not be called on a block whose exit node doesn't have a successor.
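
The first option (checking fallThroughEntry before dereferencing it) can be illustrated with a minimal sketch. This is not OMR code: the Block and TreeTop structs and the helper names fallThroughBlock and placeAfter are hypothetical, heavily simplified stand-ins for TR::Block and TR::TreeTop, assumed here only to show the shape of the guard.

```cpp
#include <cassert>
#include <cstddef>

struct Block;

// Hypothetical stand-in for TR::TreeTop: only the fields the sketch needs.
struct TreeTop {
    TreeTop *next = nullptr;   // following treetop in the method's tree list
    Block   *block = nullptr;  // block that owns this BBStart/BBEnd
};

// Hypothetical stand-in for TR::Block.
struct Block {
    int     number;
    TreeTop entry;  // plays the role of BBStart
    TreeTop exit;   // plays the role of BBEnd

    explicit Block(int n) : number(n) {
        entry.block = this;
        exit.block = this;
    }
    Block(const Block &) = delete;             // treetops point back at their block
    Block &operator=(const Block &) = delete;
};

// Lay out `second` immediately after `first`.
void placeAfter(Block &first, Block &second) {
    first.exit.next = &second.entry;
}

// Returns the block that physically follows `b`, or nullptr when `b` is the
// last block in the layout. The unguarded form dereferences exit.next
// directly, which faults when nothing follows the block.
Block *fallThroughBlock(const Block &b) {
    const TreeTop *fallThroughEntry = b.exit.next;
    if (fallThroughEntry == nullptr) {
        return nullptr;  // last block: there is no fall-through
    }
    return fallThroughEntry->block;
}
```

With two blocks where the second is placed after the first, fallThroughBlock returns the second block for the first one and nullptr for the second, instead of faulting on the last block.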

DEBUG START after generateNewOrder: for org/apache/commons/text/numbers/ParsedDecimal.prepareOutput(I)V
...
n7297n    BBEnd </block_924> (DEBUG fallThroughEntry 0x7ff1a3380728 n7343n) =====             [0x7ff1a331d890] bci=[-1,268,508] rc=0 vc=16944 vn=- li=- udi=- nc=0
...
DEBUG START after connectTreesAccordingToOrder: for org/apache/commons/text/numbers/ParsedDecimal.prepareOutput(I)V
...
n7297n    BBEnd </block_924> (DEBUG fallThroughEntry (nil) n-1n)                              [0x7ff1a331d890] bci=[-1,268,508] rc=0 vc=16944 vn=- li=- udi=- nc=0
...
a7ehuo commented 2 years ago

The root cause of the crash is that the edges of the cloned blocks in the general loop unroller (GLU) are not set up correctly in addEdgeAndFixEverything. Running verifyCFG confirms the issue. The crash from this test happens only when the EdgeContext is BackEdgeToEntry.

Take block_918 as an example: it is a clone of block_418. After GLU, it should have edges to block_206 and block_422. However, block_422 is not the fall-through block of block_918, so a goto block is required after block_918 to jump back to block_422.
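
The kind of consistency rule that verifyCFG reports on can be sketched as follows. This is an illustration only, not the CFGChecker implementation: BlockInfo and unreachableSuccessors are hypothetical names, and the rule assumed here is that a block ending in a conditional branch can only reach its branch destination or the block laid out right after it.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of one block that ends in a conditional branch.
struct BlockInfo {
    int number;                   // block number
    int branchTarget;             // destination of the trailing conditional branch
    int layoutNext;               // block physically after this one, -1 if none
    std::vector<int> successors;  // out-edges recorded in the CFG
};

// Returns the CFG successors that the block cannot actually reach, i.e. those
// that are neither the branch destination nor the fall-through block.
std::vector<int> unreachableSuccessors(const BlockInfo &b) {
    std::vector<int> bad;
    for (int s : b.successors) {
        if (s != b.branchTarget && s != b.layoutNext) {
            bad.push_back(s);
        }
    }
    return bad;
}
```

Using the values from the verifyCFG output in this comment (block_918 branches to block_206 and is followed by block_418, while its CFG successors are 206 and 422), the function flags 422 for block_918. For block_418, which is followed by block_422, nothing is flagged.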

jitdump.20220831.094334.4138.0004.oneIteration.CFGVerify.dmp.zip

CFG Before GLU 

422: in=[836 421 418] out=[421 420] // 418 -> 422
421: in=[422]         out=[417 422]
417: in=[421]         out=[420]
420: in=[422 417]     out=[418]
418: in=[420]         out=[206 422] //  418 -> 422

CFG After GLU

422: in=[918 921 836 418] out=[421 420] // 418 -> 422, 918 -> 422
...
418: in=[420]             out=[206 422] //  418 -> 422
...
918: in=[919]             out=[206 422] //  918 -> 422
Tree after GLU 

n3367n    BBStart <block_413> (freq 6) (in loop 10)                                           [0x7fec6fc10c30] bci=[-1,256,508] rc=0 vc=16213 vn=- li=- udi=- nc=0
n3369n    goto --> block_250 BBStart at n2184n                                                [0x7fec6fc10cd0] bci=[-1,256,508] rc=0 vc=16213 vn=- li=- udi=- nc=0
n3368n    BBEnd </block_413> (fallThroughEntry 0x7fec6e815e38 n7243n) =====                   [0x7fec6fc10c80] bci=[-1,256,508] rc=0 vc=16213 vn=- li=- udi=- nc=0

n7243n    BBStart <block_918> (freq 0) (in loop 422)                                          [0x7fec6e2bc7b0] bci=[-1,268,508] rc=0 vc=16213 vn=- li=- udi=- nc=0
n7244n    ificmpeq --> block_206 BBStart at n1679n ()                                         [0x7fec6e2bc800] bci=[-1,213,508] rc=0 vc=16254 vn=- li=- udi=- nc=2 flg=0x20
n7245n      iload  <auto slot 12>[#401  Auto] [flags 0x3 0x0 ]                                [0x7fec6e2bc850] bci=[-1,210,508] rc=1 vc=16254 vn=- li=- udi=366 nc=0
n7246n      iconst 1                                                                          [0x7fec6e2bc8a0] bci=[-1,212,508] rc=1 vc=16254 vn=- li=- udi=- nc=0
n7247n    BBEnd </block_918> (fallThroughEntry 0x7fec6fbddc48 n3397n) =====                   [0x7fec6e2bc8f0] bci=[-1,268,508] rc=0 vc=16213 vn=- li=- udi=- nc=0

n3397n    BBStart <block_418> (freq 0) (in loop 422)                                          [0x7fec6fc11590] bci=[-1,268,508] rc=0 vc=16213 vn=- li=- udi=- nc=0
n3399n    ificmpeq --> block_206 BBStart at n1679n ()                                         [0x7fec6fc11630] bci=[-1,213,508] rc=0 vc=16254 vn=- li=- udi=- nc=2 flg=0x20
n3400n      iload  <auto slot 12>[#401  Auto] [flags 0x3 0x0 ]                                [0x7fec6fc11680] bci=[-1,210,508] rc=1 vc=16254 vn=- li=- udi=366 nc=0
n3401n      iconst 1                                                                          [0x7fec6fc116d0] bci=[-1,212,508] rc=1 vc=16254 vn=- li=- udi=- nc=0
n3398n    BBEnd </block_418> (fallThroughEntry 0x7fec6fbde1c8 n3425n) =====                   [0x7fec6fc115e0] bci=[-1,268,508] rc=0 vc=16213 vn=- li=- udi=- nc=0

n3425n    BBStart <block_422> (freq 7099) (in loop 422) 
...
...
n7248n    BBStart <block_919> (freq 477) (in loop 422)                                        [0x7fec6e2bc940] bci=[-1,268,508] rc=0 vc=16213 vn=- li=- udi=- nc=0
n7249n    asynccheck  jitCheckAsyncMessages[#23  helper Method] [flags 0x400 0x0 ]            [0x7fec6e2bc990] bci=[-1,268,508] rc=0 vc=16254 vn=- li=- udi=- nc=0
n7250n    ificmple --> block_918 BBStart at n7243n (maxLoopIternGuard ) 
...
         BLOCK CLONER: Newly created block_918 is a clone of original block_418
...
unrollLoopOnce: Type C finalUnroll 1 _iteration 1 _unrollKind 5 CompleteUnroll 1 edge: block_418 -> block_422
>>>--------------------------------------
addEdgeAndFixEverything: fromNode (block_418 -> block_422) newFromNode (block_918 -> block_422) redirectOriginal 0 removeOriginalEdges 0 edgeToEntry 1 context 2
addEdgeAndFixEverything: fromNode: (block_418 -> block_422) newFromNode: (block_918 -> block_422) from (block_418 -> block_422) newFrom (block_918 -> block_422). lastNode n3399n isBranch 1 getBranchDestination n1679n to->getEntry() n3425n
addEdgeAndFixEverything: FALLS INTO: newFromNextBlock block_922 != newTo block_422 from (block_418 -> block_422) newFrom (block_918 -> block_422) swingBlocks
addEdgeAndFixEverything: FALLS INTO: createEdge newFromNode (block_918 -> block_422)
addEdgeAndFixEverything: FALLS INTO: addEdge newFrom (block_918 -> block_422)
addEdgeAndFixEverything: fromNode (block_418 -> block_422) newFromNode (block_918 -> block_422) redirectOriginal 0 removeOriginalEdges 0 edgeToEntry 1 context 2
<<<--------------------------------------
...
...
++++++ verifyCFG +++++
Successor block [422] of block [918] containing a branch does not match the destination(s) specified in the IL branch instruction
block_918 (edge block_918 -> block_422) next block_422 fallThroughBlock block_418 branchBlock block_206
Check for correctness of successors is NOT successful
The CFG is NOT correct
Printing out the CFG from CFGChecker
a7ehuo commented 2 years ago

When running a buggy classfile generated by a fuzzer

@lochnagarr Could you elaborate on what kind of buggy classfile is being referred to here? What is the issue with the classfile? I'm trying to understand more of what the test does in order to assess the fix, since the code involved hasn't changed for a long time.

Since the crash happened with org/apache/commons/text/numbers/ParsedDecimal.prepareOutput(I)V, I'm especially interested in whether this method might have been modified by the fuzzer.

lochnagarr commented 2 years ago

Hi @a7ehuo. Yes, the method org/apache/commons/text/numbers/ParsedDecimal.prepareOutput(I)V is modified by the fuzzer. My apologies for not being able to provide more information; this file is generated randomly with complicated transformations.

a7ehuo commented 2 years ago

I found out a bit more about why this issue has so far happened only with the fuzzer-modified classfile and not with our existing test suites. The existing implementation of TR_LoopUnroller::addEdgeAndFixEverything relies on swingBlocks to swing the block into the right position in the fall-through case. In normal circumstances (our existing test suites), swingBlocks works correctly. It doesn't work with the modified classfile because more than one block needs to be swung to sit before block_422 [1]. After processSwingBlocks, only block_418 falls through to block_422; block_918, block_924, and block_930 each stop falling through to block_422 once the next block is swung. Our existing test suites do not hit this issue because there is never more than one block that needs to fall through to the same block.

With this finding, I need to rethink the fix. Adding goto blocks fixes the issue, but I wonder if it's overdoing it for the normal cases. I also need to understand whether having multiple blocks fall through to the same block is the intended design or whether some other part of the code is not functioning correctly.
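
Why swinging cannot satisfy several blocks at once can be seen in a small model. This is a sketch under simplifying assumptions, not the swingBlocks code: the layout is modelled as a plain vector of block numbers, and swingBefore and predecessorInLayout are hypothetical helpers. Only one block can sit immediately before a given block, so each swing displaces the previous one.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Move `from` so that it sits immediately before `to` in the layout.
// Assumes both block numbers are present and distinct.
void swingBefore(std::vector<int> &layout, int from, int to) {
    layout.erase(std::remove(layout.begin(), layout.end(), from), layout.end());
    auto pos = std::find(layout.begin(), layout.end(), to);
    layout.insert(pos, from);
}

// Block that physically precedes `to`, or -1 if `to` is first or absent.
int predecessorInLayout(const std::vector<int> &layout, int to) {
    auto pos = std::find(layout.begin(), layout.end(), to);
    if (pos == layout.begin() || pos == layout.end()) {
        return -1;
    }
    return *(pos - 1);
}
```

Starting from an illustrative layout such as 915, 918, 922, 923, 924, 928, 929, 930, 934, 413, 418, 422, 421 and swinging 918, 924, 930, and 418 before 422 in that order, the block before 422 is successively 918, 924, 930, and finally 418. At the end only block 418 falls through to 422, and 918 is followed by 924 rather than 422.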

[1]

processSwingBlocks: (from->to)(block_918 -> block_422): pF block_915 pT block_418 nF block_922 nT block_421
processSwingBlocks: (from->to)(block_924 -> block_422): pF block_923 pT block_918 nF block_928 nT block_421
processSwingBlocks: (from->to)(block_930 -> block_422): pF block_929 pT block_924 nF block_934 nT block_421
processSwingBlocks: (from->to)(block_418 -> block_422): pF block_413 pT block_930 nF block_918 nT block_421
a7ehuo commented 2 years ago

@pshipton @0xdaryl Given the above finding, our existing test suites do not hit this issue, and the issue is not new. More time is needed to assess the fix. I wonder if we should move this issue out of the 0.35 release.

a7ehuo commented 2 years ago

After some offline discussion with @jdmpapin, I'd like to clarify this issue a bit more. block_918, block_924, and block_930 are clones of block_418. The loop entry is block_422, which is the fall-through of block_418. block_918, block_924, and block_930 all need to get to block_422, so "goto" blocks are required for them to jump there. What is unusual about this test is that addEdgeAndFixEverything might not have expected the loop entry block (block_422) to be the fall-through of the original block (block_418) while multiple cloned blocks also need to jump to it. That might be why it didn't handle this case properly. Running verifyCFG clearly shows that the edges between these blocks are messed up due to the missing goto blocks. This is not a common case in our existing test suites. I also found that addExitEdgeAndFixEverything already handles a similar case by adding goto blocks, which is what the fix #6693 does for addEdgeAndFixEverything.
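
The goto-block idea can be sketched on a simplified layout model. This is not the code from the fix: LaidOutBlock and addGotoBlocks are hypothetical names, and the numbers given to the new goto blocks are made up for illustration. Every block that must reach the target but is not physically followed by it gets a new goto block inserted right after it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical entry in a linear block layout.
struct LaidOutBlock {
    int  number;      // block number
    bool isGoto;      // true if the block is only an unconditional jump
    int  gotoTarget;  // jump destination, meaningful only when isGoto is true
};

// For every block in `mustReach` that is not immediately followed by `target`,
// insert a goto block after it that jumps to `target`. New blocks are numbered
// from `nextFreeNumber`. Returns how many goto blocks were created.
int addGotoBlocks(std::vector<LaidOutBlock> &layout,
                  const std::vector<int> &mustReach,
                  int target,
                  int nextFreeNumber) {
    int created = 0;
    for (int from : mustReach) {
        for (std::size_t i = 0; i < layout.size(); ++i) {
            if (layout[i].number != from) {
                continue;
            }
            bool fallsInto = i + 1 < layout.size() && layout[i + 1].number == target;
            if (!fallsInto) {
                auto pos = layout.begin() + static_cast<std::ptrdiff_t>(i + 1);
                layout.insert(pos, LaidOutBlock{nextFreeNumber++, true, target});
                ++created;
            }
            break;  // each block number appears once in the layout
        }
    }
    return created;
}
```

On a layout of 413, 918, 924, 930, 418, 422, 421 where 918, 924, 930, and 418 must all reach 422, three goto blocks are created (after 918, 924, and 930), and 418 is left alone because it already falls through to 422.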

         BLOCK CLONER: Newly created block_918 is a clone of original block_418
...
         BLOCK CLONER: Newly created block_924 is a clone of original block_418
...
         BLOCK CLONER: Newly created block_930 is a clone of original block_418
Before GLU

n3397n    BBStart <block_418> (freq 0) (in loop 422)                                          [0x7fc904f51590] bci=[-1,268,508] rc=0 vc=15950 vn=- li=- udi=- nc=0
n3399n    ificmpeq --> block_206 BBStart at n1679n ()                                         [0x7fc904f51630] bci=[-1,213,508] rc=0 vc=15950 vn=- li=- udi=- nc=2 flg=0x20
n3400n      iload  <auto slot 12>[#437  Auto] [flags 0x3 0x0 ]                                [0x7fc904f51680] bci=[-1,210,508] rc=1 vc=15950 vn=- li=- udi=366 nc=0
n3401n      iconst 1                                                                          [0x7fc904f516d0] bci=[-1,212,508] rc=1 vc=15950 vn=- li=- udi=- nc=0
n3398n    BBEnd </block_418> =====                                                            [0x7fc904f515e0] bci=[-1,268,508] rc=0 vc=15950 vn=- li=- udi=- nc=0

n3425n    BBStart <block_422> (freq 7099) (in loop 422)