eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

JDK 8/11/17 crash when handling divided by zero from a long casted from int #17066

Closed CptGit closed 11 months ago

CptGit commented 1 year ago

System / OS / Java Runtime Information

Java version

$ java -version
openjdk version "17.0.6" 2023-01-17
IBM Semeru Runtime Open Edition 17.0.6.0 (build 17.0.6+10)
Eclipse OpenJ9 VM 17.0.6.0 (build openj9-0.36.0, JRE 17 Linux amd64-64-Bit Compressed References 20230117_397 (JIT enabled, AOT enabled)
OpenJ9   - e68fb241f
OMR      - f491bbf6f
JCL      - 927b34f84c8 based on jdk-17.0.6+10)

Operating system details

$ cat /etc/*release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.5 LTS"
NAME="Ubuntu"
VERSION="20.04.5 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.5 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
$ uname -a
Linux zzq-ThinkPad-T470 5.4.0-144-generic #161-Ubuntu SMP Fri Feb 3 14:49:04 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Description

JVM crashed when running the following program. The bug affects 8u362, 11.0.18.0 and 17.0.6.0.

Steps to reproduce

The following steps show how to reproduce the bug on JDK 17.0.6.0 in an Ubuntu Linux environment.

Compile

$ javac C.java

Execute

$ java C
JVMCDRT000E Unable to locate JIT stack map - aborting VM
JVMCDRT001E Method: C.m(II)J (000000000017C888)
JVMCDRT002E Failing PC: 00007FCF8FFE7742 (offset 0000000000000022), metaData = 00007FCF8E32CAF8
05:39:06.077 0x150300j9codertvm(j9ji.110    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/build-scripts/jobs/jdk17u/jdk17u-linux-x64-openj9/workspace/build/src/openj9/runtime/codert_vm/jswalk.c:538: ((0 ))
JVMDUMP039I Processing dump event "traceassert", detail "" at 2023/03/30 00:39:06 - please wait.
JVMDUMP032I JVM requested System dump using '/home/zzq/core.20230330.003906.1848239.0001.dmp' in response to an event
JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.1848260.

JVMDUMP012E Error in System dump: The core file created by child process with pid = 1848260 was not found. Expected to find core file with name "/home/zzq/core"
JVMDUMP032I JVM requested Java dump using '/home/zzq/javacore.20230330.003906.1848239.0002.txt' in response to an event
JVMDUMP012E Error in Java dump: /home/zzq/javacore.20230330.003906.1848239.0002.txt
JVMDUMP032I JVM requested Snap dump using '/home/zzq/Snap.20230330.003906.1848239.0003.trc' in response to an event
JVMDUMP010I Snap dump written to /home/zzq/Snap.20230330.003906.1848239.0003.trc
JVMDUMP013I Processed dump event "traceassert", detail "".

Source code for an executable test case

// C.java
public class C {

    public static long m(int x, int y) {
        return (long) y / x;
    }

    public static void main(String[] args) {
        int count = 0;
        for (int i = 0; i < 10000; ++i) {
            try {
                m(0, 1);
            } catch (ArithmeticException e) {
                count += 1;
            }
        }
        System.out.println(count); // expect 10000
    }
}

Workaround

Disable JIT.

$ java -Xnojit C
10000

Diagnostic files

See attached. diagnostic-files.zip

pshipton commented 1 year ago

I can reproduce it. @0xdaryl @hzongaro

hzongaro commented 1 year ago

Looks to be a problem in Tree Simplification. A DIVCHK is being removed. Before:

n1n       BBStart <block_2>                                                                   [0x7f28da9d9210] bci=[-1,0,4] rc=0 vc=0 vn=- li=- udi=- nc=0
n8n       DIVCHK [#27]                                                                        [0x7f28da9d9440] bci=[-1,4,4] rc=0 vc=0 vn=- li=- udi=- nc=1
n7n         ldiv                                                                              [0x7f28da9d93f0] bci=[-1,4,4] rc=2 vc=0 vn=- li=- udi=- nc=2
n4n           i2l                                                                             [0x7f28da9d9300] bci=[-1,1,4] rc=1 vc=0 vn=- li=- udi=- nc=1
n3n             iload  <parm 1 I>[#352  Parm] [flags 0x40000103 0x0 ]                         [0x7f28da9d92b0] bci=[-1,0,4] rc=1 vc=0 vn=- li=- udi=- nc=0
n6n           i2l                                                                             [0x7f28da9d93a0] bci=[-1,3,4] rc=1 vc=0 vn=- li=- udi=- nc=1
n5n             iload  <parm 0 I>[#351  Parm] [flags 0x40000103 0x0 ]                         [0x7f28da9d9350] bci=[-1,2,4] rc=1 vc=0 vn=- li=- udi=- nc=0
n9n       lreturn                                                                             [0x7f28da9d9490] bci=[-1,5,4] rc=0 vc=0 vn=- li=- udi=- nc=1
n7n         ==>ldiv
n2n       BBEnd </block_2>                                                                    [0x7f28da9d9260] bci=[-1,5,4] rc=0 vc=0 vn=- li=- udi=- nc=0

After:

[     1]  21.1    O^O TREE SIMPLIFICATION: Reduced ldiv [00007F28DA9D93F0] of two i2l children to i2l of idiv

n1n       BBStart <block_2>                                                                   [0x7f28da9d9210] bci=[-1,0,4] rc=0 vc=14 vn=- li=- udi=- nc=0
n8n       treetop                                                                             [0x7f28da9d9440] bci=[-1,4,4] rc=0 vc=14 vn=- li=- udi=- nc=1
n10n        idiv                                                                              [0x7f28da9d94e0] bci=[-1,0,4] rc=2 vc=0 vn=- li=- udi=- nc=2
n3n           iload  <parm 1 I>[#352  Parm] [flags 0x40000103 0x0 ]                           [0x7f28da9d92b0] bci=[-1,0,4] rc=1 vc=14 vn=- li=- udi=- nc=0
n5n           iload  <parm 0 I>[#351  Parm] [flags 0x40000103 0x0 ]                           [0x7f28da9d9350] bci=[-1,2,4] rc=1 vc=14 vn=- li=- udi=- nc=0
n9n       lreturn                                                                             [0x7f28da9d9490] bci=[-1,5,4] rc=0 vc=14 vn=- li=- udi=- nc=1
n7n         i2l                                                                               [0x7f28da9d93f0] bci=[-1,4,4] rc=1 vc=14 vn=- li=1 udi=- nc=1
n10n          ==>idiv
n2n       BBEnd </block_2>                                                                    [0x7f28da9d9260] bci=[-1,5,4] rc=0 vc=14 vn=- li=- udi=- nc=0
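
To make the reduction above concrete at the Java source level, here is a minimal sketch of the two shapes: a long division of two widened ints (the `ldiv` of two `i2l` children) and an int division whose result is widened (the `i2l` of `idiv`). The class and method names are hypothetical and are not taken from the OpenJ9 sources. It only illustrates that both shapes must throw `ArithmeticException` for a zero divisor, which is the behaviour the `DIVCHK` guards; it says nothing about how the simplifier itself is implemented.

```java
import java.util.function.LongSupplier;

public class DivReductionSketch {

    // Original shape: widen both ints, then divide as longs (ldiv of two i2l).
    static long wideDivide(int y, int x) {
        return (long) y / (long) x;
    }

    // Reduced shape: divide as ints, then widen the result (i2l of idiv).
    static long narrowDivide(int y, int x) {
        return (long) (y / x);
    }

    // Returns true if evaluating op throws ArithmeticException.
    static boolean throwsArithmetic(LongSupplier op) {
        try {
            op.getAsLong();
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Sample operands only; this is not an exhaustive equivalence check.
        System.out.println(wideDivide(7, 2) == narrowDivide(7, 2));
        // Both shapes must still raise the exception for a zero divisor.
        System.out.println(throwsArithmetic(() -> wideDivide(1, 0)));
        System.out.println(throwsArithmetic(() -> narrowDivide(1, 0)));
    }
}
```

Because the exception is required in both shapes, a transformation that rewrites the division also has to keep the check attached to the new division node.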

gacholio commented 1 year ago

Is divide by zero not handled as a signal on X?

hzongaro commented 1 year ago

> Is divide by zero not handled as a signal on X?

Yes, but I believe the JIT messes up the stack map because it thinks that the division won't signal in this case. With the DIVCHK, it produces this:

 Offset info:
    stackmap location: 00007FB7617FF131
    map range: starting at [00007FB7639D505A]
      lowOffset: 00000026
      byteCodeInfo: <_callerIndex=-1, byteCodeIndex=4>, _isSameReceiver=0, _doNotProfile=0
      registerSaveDescription: starting at [617FF137] { 00000000 }
      registers: 00000000       { }
      stack map:        { }

    stackmap location: 00007FB7617FF140
    map range: starting at [00007FB7639D5067]
      lowOffset: 00000033
      byteCodeInfo: <_callerIndex=-1, byteCodeIndex=0>, _isSameReceiver=0, _doNotProfile=0
      registerSaveDescription: starting at [617FF146] { 00000000 }
      registers: 7FFE0000       { 17:st(0) 18:st(1) 19:st(2) 20:st(3) 21:st(4) 22:st(5) 23:st(6) 24:st(7) 25:mm0 26:mm1 27:mm2 28:mm3 29:mm4 30:mm5 }
      stack map:        { }
   ...
0x7fb7639d505a 00000026 [0x7fb762a48c80] 48 f7 f9                           idiv        rcx             # IDIV8AccReg
    ...
0x7fb7639d5067 00000033 [0x7fb762a4b620] e8 34 0c 07 1c                     call        jitStackOverflow

while without the DIVCHK, it produces this:

Offset info:
    stackmap location: 00007FF1D35FF131
    map range: starting at [00007FF1FD9D5064]
      lowOffset: 00000030
      byteCodeInfo: <_callerIndex=-1, byteCodeIndex=0>, _isSameReceiver=0, _doNotProfile=0
      registerSaveDescription: starting at [D35FF137] { 00000000 }
      registers: 7FFE0000       { 17:st(0) 18:st(1) 19:st(2) 20:st(3) 21:st(4) 22:st(5) 23:st(6) 24:st(7) 25:mm0 26:mm1 27:mm2 28:mm3 29:mm4 30:mm5 }
      stack map:        { }

   ...

0x7ff1fd9d5053 0000001f [0x7ff1fca67260] f7 7c 24 18                        idiv        dword ptr [rsp+0x18]
   ...
0x7ff1fd9d5064 00000030 [0x7ff1fca69c00] e8 37 0c e7 1c                     call        jitStackOverflow
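
The difference between the two listings can be modelled with a small lookup sketch. It assumes, purely for illustration, that a stack map covers code from its `lowOffset` up to the next map's `lowOffset`, and that the elided portions of the listings contain no further maps. The class, method, and lookup strategy are hypothetical and do not reproduce the actual code in `jswalk.c`. With the `DIVCHK`, the `idiv` at offset 0x26 has a map starting at 0x26; without it, the `idiv` at offset 0x1f precedes the only listed map at 0x30, so the lookup finds nothing.

```java
import java.util.Map;
import java.util.TreeMap;

public class StackMapLookupSketch {

    // Hypothetical model: pick the map with the greatest lowOffset <= pcOffset.
    // Returns null when no map covers the given offset.
    static String findMap(TreeMap<Integer, String> maps, int pcOffset) {
        Map.Entry<Integer, String> entry = maps.floorEntry(pcOffset);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        // Offsets copied from the listing produced with the DIVCHK.
        TreeMap<Integer, String> withDivchk = new TreeMap<>();
        withDivchk.put(0x26, "byteCodeIndex=4");
        withDivchk.put(0x33, "byteCodeIndex=0");

        // Offsets copied from the listing produced without the DIVCHK.
        TreeMap<Integer, String> withoutDivchk = new TreeMap<>();
        withoutDivchk.put(0x30, "byteCodeIndex=0");

        // idiv at 0x26 is covered by a map.
        System.out.println(findMap(withDivchk, 0x26));
        // idiv at 0x1f has no map at or before it.
        System.out.println(findMap(withoutDivchk, 0x1f));
    }
}
```

Under these assumptions the first lookup returns the `byteCodeIndex=4` entry and the second returns `null`, which corresponds to the "Unable to locate JIT stack map" failure in the original report.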

hzongaro commented 1 year ago

By the way, Peter @pshipton, I don't think this needs to be included in the 0.38 milestone. It doesn't appear to be a problem that was introduced recently. I can reproduce it with a 0.27.0 build:

$ ./jdk8u302-b08/bin/java -version
openjdk version "1.8.0_302"
IBM Semeru Runtime Open Edition (build 1.8.0_302-b08)
Eclipse OpenJ9 VM (build openj9-0.27.0, JRE 1.8.0 Linux amd64-64-Bit Compressed References 20210728_193 (JIT enabled, AOT enabled)
OpenJ9   - 1851b0074
OMR      - 9db1c870d
JCL      - de702c3174 based on jdk8u302-b08)
$ ./jdk8u302-b08/bin/java -Xjit:count=1,disableAsyncCompilation Issue17066
JVMCDRT000E Unable to locate JIT stack map - aborting VM
JVMCDRT001E Method: Issue17066.m(II)J (0000000000157388)
JVMCDRT002E Failing PC: 00007F1E52E9D91A (offset 000000000000001A), metaData = 00007F1E51060DF8
18:09:04.566 0x10bc00j9codertvm(j9ji.110    *   ** ASSERTION FAILED ** at /home/jenkins/workspace/build-scripts/jobs/jdk8u/jdk8u-linux-x64-openj9/workspace/build/src/openj9/runtime/codert_vm/jswalk.c:538: ((0 ))
...

pshipton commented 1 year ago

It seems overly simple to reproduce. If the fix is simple and low risk perhaps we can put it in 0.38.

0xdaryl commented 1 year ago

While @hzongaro understands the problem, the fix touches a few places in the simplifier dealing with DIVCHK removal. We need to be careful here and make sure we handle each situation correctly.

A PR should be ready for 0.40, so moving to that release.

0xdaryl commented 1 year ago

@hzongaro : should this still be targeted to 0.40? Please advise.

hzongaro commented 1 year ago

@0xdaryl, I have a fix ready, but I'm still working on writing unit tests to accompany it. Given that this is a long-standing problem, it's probably OK to defer it.

hzongaro commented 1 year ago

A revised fix for this problem is open for review. Given the complexity, I think it would be safer to defer it to the 0.43 release, rather than rushing it into the 0.41 release.