eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 721 forks source link

OpenJDK NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null #15230

Closed pshipton closed 2 years ago

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/207 jdk_lang_1 -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:-UseCompressedOops java/lang/String/UnicodeCasingTest.java

01:48:57  Testing on ti_ER locale....
01:48:57  java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
01:48:57    at java.base/java.lang.Character.digit(Character.java:10661)
01:48:57    at java.base/java.lang.Character.digit(Character.java:10609)
01:48:57    at java.base/java.lang.Integer.parseInt(Integer.java:666)
01:48:57    at UnicodeCasingTest.convert(UnicodeCasingTest.java:210)
01:48:57    at UnicodeCasingTest.test(UnicodeCasingTest.java:150)
01:48:57    at UnicodeCasingTest.test(UnicodeCasingTest.java:96)
01:48:57    at UnicodeCasingTest.main(UnicodeCasingTest.java:64)
01:48:57    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
01:48:57    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
01:48:57    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
01:48:57    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
01:48:57    at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312)
01:48:57    at java.base/java.lang.Thread.run(Thread.java:862)

java/lang/StringBuilder/Comparison.java

01:48:57  STDERR:
01:48:57  java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
01:48:57    at java.base/java.lang.Character.digit(Character.java:10661)
01:48:57    at java.base/java.lang.Character.digit(Character.java:10609)
01:48:57    at java.base/java.lang.Integer.parseInt(Integer.java:666)
01:48:57    at java.base/java.lang.Integer.parseInt(Integer.java:786)
01:48:57    at org.testng.reporters.XMLReporterConfig$StackTraceLevels.parse(XMLReporterConfig.java:250)
01:48:57    at org.testng.reporters.XMLReporterConfig.<init>(XMLReporterConfig.java:96)
01:48:57    at org.testng.reporters.XMLReporter.<init>(XMLReporter.java:26)
01:48:57    at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:92)
01:48:57    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
01:48:57    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
01:48:57    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
01:48:57    at java.base/java.lang.reflect.Method.invoke(Method.java:568)
01:48:57    at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312)
01:48:57    at java.base/java.lang.Thread.run(Thread.java:862)
pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_aarch64_linux_OpenJDK19/4 jdk_lang_1 java/lang/String/UnicodeCasingTest.java

15:23:46  java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
15:23:46    at java.base/java.lang.Character.digit(Character.java:10935)
15:23:46    at java.base/java.lang.Character.digit(Character.java:10883)
15:23:46    at java.base/java.lang.Integer.parseInt(Integer.java:663)
15:23:46    at UnicodeCasingTest.convert(UnicodeCasingTest.java:210)
15:23:46    at UnicodeCasingTest.test(UnicodeCasingTest.java:150)
15:23:46    at UnicodeCasingTest.test(UnicodeCasingTest.java:96)
15:23:46    at UnicodeCasingTest.main(UnicodeCasingTest.java:64)

java/lang/StringBuilder/Comparison.java

15:23:46  java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
15:23:46    at java.base/java.lang.Character.digit(Character.java:10935)
15:23:46    at java.base/java.lang.Character.digit(Character.java:10883)
15:23:46    at java.base/java.lang.Integer.parseInt(Integer.java:663)
15:23:46    at java.base/java.lang.Integer.parseInt(Integer.java:781)
15:23:46    at org.testng.reporters.XMLReporterConfig$StackTraceLevels.parse(XMLReporterConfig.java:250)
15:23:46    at org.testng.reporters.XMLReporterConfig.<init>(XMLReporterConfig.java:96)
15:23:46    at org.testng.reporters.XMLReporter.<init>(XMLReporter.java:26)
15:23:46    at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:92)
15:23:46    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
15:23:46    at java.base/java.lang.reflect.Method.invoke(Method.java:578)
15:23:46    at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312)
15:23:46    at java.base/java.lang.Thread.run(Thread.java:1573)
15:23:46  
15:23:46  JavaTest Message: Test threw exception: java.lang.NullPointerException
15:23:46  JavaTest Message: shutting down test
knn-k commented 2 years ago

I tried to reproduce the failure locally with JDK 17, but no luck in 50 runs.

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/226 jdk_lang_1 java/lang/String/UnicodeCasingTest.java

19:54:35  Testing on lrc_IR_#Arab locale....
19:54:35  java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
19:54:35    at java.base/java.lang.Character.digit(Character.java:10661)
19:54:35    at java.base/java.lang.Character.digit(Character.java:10609)
19:54:35    at java.base/java.lang.Integer.parseInt(Integer.java:666)
19:54:35    at UnicodeCasingTest.convert(UnicodeCasingTest.java:210)
19:54:35    at UnicodeCasingTest.test(UnicodeCasingTest.java:150)
19:54:35    at UnicodeCasingTest.test(UnicodeCasingTest.java:96)
19:54:35    at UnicodeCasingTest.main(UnicodeCasingTest.java:64)
knn-k commented 2 years ago

30x Grinder jobs with UnicodeCasingTest.java:

knn-k commented 2 years ago

Two more Grinder jobs (30x each) with UnicodeCasingTest.java:

knn-k commented 2 years ago

Reproduced in a 30x Grinder job: https://openj9-jenkins.osuosl.org/job/Grinder/1131/ (jdk_lang_1)

This failure seems to happen only in the large heap mode. All the reported failures above are from jdk_lang_1 jobs. I ran the test in the compressed refs mode (jdk_lang_0) as https://openj9-jenkins.osuosl.org/job/Grinder/1134/, and it did not fail.

I tried to get a JIT trace file for java.lang.Character.digit(), but my attempt has been unsuccessful.

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk18_j9_sanity.openjdk_aarch64_linux_Nightly/105 jdk_lang_0 -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+UseCompressedOops java/lang/String/UnicodeCasingTest.java

21:08:08  Testing on el_CY locale....
21:08:08  java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
21:08:08    at java.base/java.lang.Character.digit(Character.java:10658)
21:08:08    at java.base/java.lang.Character.digit(Character.java:10606)
21:08:08    at java.base/java.lang.Integer.parseInt(Integer.java:666)
21:08:08    at UnicodeCasingTest.convert(UnicodeCasingTest.java:210)
21:08:08    at UnicodeCasingTest.test(UnicodeCasingTest.java:150)
21:08:08    at UnicodeCasingTest.test(UnicodeCasingTest.java:96)
21:08:08    at UnicodeCasingTest.main(UnicodeCasingTest.java:64)
pshipton commented 2 years ago

This latest failure is with -XX:+UseCompressedOops

knn-k commented 2 years ago

I tried running with -Xdump option (dump upon NullPointerException), but the failure was not reproduced. https://openj9-jenkins.osuosl.org/job/Grinder/1140/ https://openj9-jenkins.osuosl.org/job/Grinder/1141/

knn-k commented 2 years ago

Failure reproduced in https://openj9-jenkins.osuosl.org/job/Grinder/1167/:

[2022-07-21T07:08:58.595Z] JVMDUMP039I Processing dump event "systhrow", detail "java/lang/NullPointerException" at 2022/07/21 02:51:58 - please wait.
[2022-07-21T07:08:58.595Z] JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Grinder/aqa-tests/TKG/output_16583858471448/jdk_lang_1/work/scratch/2/core.20220721.025158.3097098.0001.dmp' in response to an event
[2022-07-21T07:08:58.595Z] JVMDUMP010I System dump written to /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_16583858471448/jdk_lang_1/work/scratch/2/core.20220721.025158.3097098.0001.dmp
[2022-07-21T07:08:58.595Z] JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Grinder/aqa-tests/TKG/output_16583858471448/jdk_lang_1/work/scratch/2/javacore.20220721.025158.3097098.0002.txt' in response to an event
[2022-07-21T07:08:58.595Z] JVMDUMP010I Java dump written to /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_16583858471448/jdk_lang_1/work/scratch/2/javacore.20220721.025158.3097098.0002.txt
[2022-07-21T07:08:58.595Z] JVMDUMP032I JVM requested JIT dump using '/home/jenkins/workspace/Grinder/aqa-tests/TKG/output_16583858471448/jdk_lang_1/work/scratch/2/jitdump.20220721.025158.3097098.0003.dmp' in response to an event
[2022-07-21T07:08:58.595Z] JVMDUMP051I JIT dump occurred in 'AgentVMThread' thread 0x0000FFFF7C713700
[2022-07-21T07:08:58.595Z] JVMDUMP053I JIT dump is recompiling java/lang/Integer.parseInt(Ljava/lang/String;I)I

Diag files are available at https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Grinder/1167/

knn-k commented 2 years ago

Findings from the Grinder job above:

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk18_j9_sanity.openjdk_aarch64_linux_Release/19 java/lang/String/UnicodeCasingTest.java

07:11:49  Testing on ga_IE locale....
07:11:49  java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
07:11:49    at java.base/java.lang.Character.digit(Character.java:10658)
07:11:49    at java.base/java.lang.Character.digit(Character.java:10606)
07:11:49    at java.base/java.lang.Integer.parseInt(Integer.java:666)
07:11:49    at UnicodeCasingTest.convert(UnicodeCasingTest.java:210)
07:11:49    at UnicodeCasingTest.test(UnicodeCasingTest.java:150)
07:11:49    at UnicodeCasingTest.test(UnicodeCasingTest.java:96)
07:11:49    at UnicodeCasingTest.main(UnicodeCasingTest.java:64)
knn-k commented 2 years ago

I finally got a JIT trace file of parseInt() in https://openj9-jenkins.osuosl.org/job/Grinder/1179/.

knn-k commented 2 years ago

I got a JIT trace file of Integer.parseInt() from https://openj9-jenkins.osuosl.org/job/Grinder/1179/. This failure seems to occur only when the method is compiled at the scorching level. That should be the reason of its intermittent behavior.

knn-k commented 2 years ago

I got more JIT trace files in https://openj9-jenkins.osuosl.org/job/Grinder/1198/.

knn-k commented 2 years ago

This looks like an issue with the JIT optimization. Who can look at it?

Take a look at the following files in ./aqa-tests/TKG/output_16589343036023/jdk_lang_1/work/java/lang/String/UnicodeCasingTest/ in the artifacts of https://openj9-jenkins.osuosl.org/job/Grinder/1198/.

The JIT trace file contains the output for three compilations, at levels "warm", "hot", and "scorching".

The NPE is thrown at the ldrimmx instruction below in the "scorching" compilation, trying to load VFT with the register w27 being a NULL:

 \\ java/lang/Integer.parseInt(Ljava/lang/String;I)I
 \\  190 JBinvokestatic 56 java/lang/Character.digit(CI)I
 \\       2 JBinvokestatic 88 java/lang/Character.digit(II)I
 \\          6 JBinvokevirtual 89 java/lang/CharacterData.digit(II)I

    0xffff5f9fa548 00000144 [    0xfffedf2782b0] f9400360 6     ldrimmx     w0, [w27, 0]        # SymRef  <vft-symbol>[#263  Shadow] [flags 0x18607 0x0 ]
knn-k commented 2 years ago

The ldrimmx instruction above is for node n4141n in the following IL tree:

n4143n    ifacmpne --> block_423 BBStart at n4451n (inlineHierarchyGuard )                    [    0xffff24dafeb0] bci=[19,0,197] rc=0 vc=10 vn=- li=192 udi=- nc=3 flg=0x4020
n4141n      aloadi  <vft-symbol>[#263  Shadow] [flags 0x18607 0x0 ]                           [    0xffff24dafe10] bci=[17,6,10658] rc=4 vc=10 vn=- li=192 udi=- nc=1
n4400n        ==>aRegLoad
n4142n      aconst 0xffff982ff700 (java/lang/CharacterDataLatin1.class) (classPointerConstant X!=0 sharedMemory )  [    0xffff24dafe60] bci=[19,0,197] rc=4 vc=10 vn=- li=192 udi=- nc=0 flg=0x10004

Node n4400n loads the reference from <temp slot 5>[#566 Auto]. The stack slot is initialized as NULL at the method entry of the generated code. The value in the slot is not changed before the NPE is thrown at n4141n.

n4400n        aRegLoad x27   <temp slot 5>[#566  Auto] [flags 0x20000007 0x0 ] (X>=0 SeenRealReference sharedMemory )  [    0xffff250e4f10] bci=[17,6,10658] rc=3 vc=10 vn=- li=19 udi=- nc=0 flg=0x8100

Note that x27 and w27 represent the same register here. It is confusing, but please ignore it now (See Issue https://github.com/eclipse/omr/issues/3137).

The localValuePropagation phase (id=179) changes the following IL node n4030n into n4143n above.

n4030n    ifacmpne --> block_225 BBStart at n2782n (inlineProfiledGuard )                     [    0xffff24dadb60] bci=[19,0,197] rc=0 vc=4070 vn=- li=- udi=- nc=2 flg=0x1020
n4028n      ==>aloadi
n4031n      aconst 0xffff982ff700 (java/lang/CharacterDataLatin1.class) (classPointerConstant X!=0 sharedMemory )  [    0xffff24dadbb0] bci=[19,0,197] rc=1 vc=4070 vn=- li=- udi=- nc=0 flg=0x10004

n4029n, the child node of n4028n, loads the object reference from <temp slot 3>[#561 Auto] instead of <temp slot 5>[#566 Auto]. Is the transformation of the ifacmpne node as expected?

Another question: Why is this failure observed only on AArch64 Linux?

knn-k commented 2 years ago

The transformation of the guard node above seems to be related to the following message in localValuePropagation.

[  4893] E^AReplacing the old guard 0000FFFF24DAFEB0 with the shiny new overridden guard 0000FFFF24F7ED98 at treetop 0000FFFF9DCA26F4

By the way, the addresses in this message is wrong. I opened PR https://github.com/eclipse/omr/pull/6628 to fix it.

pshipton commented 2 years ago

@0xdaryl there is a question in an earlier comment that seems to be directed to you.

This looks like an issue with the JIT optimization. Who can look at it?

knn-k commented 2 years ago

I ran a 50x Grinder job with Java 11 in https://openj9-jenkins.osuosl.org/job/Grinder/1200/, and it reported no failures. It is reproducible only with Java 17 and newer.

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/245 jdk_lang_1 java/lang/String/UnicodeCasingTest.java

19:50:51  Testing on bs__#Latn locale....
19:50:51  java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
19:50:51    at java.base/java.lang.Character.digit(Character.java:10661)
19:50:51    at java.base/java.lang.Character.digit(Character.java:10609)
19:50:51    at java.base/java.lang.Integer.parseInt(Integer.java:666)
19:50:51    at UnicodeCasingTest.convert(UnicodeCasingTest.java:210)
19:50:51    at UnicodeCasingTest.test(UnicodeCasingTest.java:150)
19:50:51    at UnicodeCasingTest.test(UnicodeCasingTest.java:96)
19:50:51    at UnicodeCasingTest.main(UnicodeCasingTest.java:64)

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/246 jdk_lang_0 java/lang/String/UnicodeCasingTest.java

19:44:58  Testing on en_AT locale....
19:44:58  java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
19:44:58    at java.base/java.lang.Character.digit(Character.java:10661)
19:44:58    at java.base/java.lang.Character.digit(Character.java:10609)
19:44:58    at java.base/java.lang.Integer.parseInt(Integer.java:666)
19:44:58    at UnicodeCasingTest.convert(UnicodeCasingTest.java:210)
19:44:58    at UnicodeCasingTest.test(UnicodeCasingTest.java:150)
19:44:58    at UnicodeCasingTest.test(UnicodeCasingTest.java:96)
19:44:58    at UnicodeCasingTest.main(UnicodeCasingTest.java:64)

https://openj9-jenkins.osuosl.org/job/Test_openjdk18_j9_sanity.openjdk_aarch64_linux_Nightly/115 test jdk_lang_0 java/lang/String/UnicodeCasingTest.java

21:11:19  Testing on fi_FI locale....
21:11:19  java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
21:11:19    at java.base/java.lang.Character.digit(Character.java:10658)
21:11:19    at java.base/java.lang.Character.digit(Character.java:10606)
21:11:19    at java.base/java.lang.Integer.parseInt(Integer.java:666)
21:11:19    at UnicodeCasingTest.convert(UnicodeCasingTest.java:210)
21:11:19    at UnicodeCasingTest.test(UnicodeCasingTest.java:150)
21:11:19    at UnicodeCasingTest.test(UnicodeCasingTest.java:96)
21:11:19    at UnicodeCasingTest.main(UnicodeCasingTest.java:64)
0xdaryl commented 2 years ago

@hzongaro : could you investigate this from the LVP angle please?

hzongaro commented 2 years ago

@knn-k, I haven't been able to reproduce the failure in "internal" Grinder runs, and I don't have permission to submit Grinder runs on openj9-jenkins.osuosl.org. When you have a few minutes, may I ask you to submit another Grinder run adding the traceLocalVP option — -Xjit:{java/lang/Integer.parseInt(*}(traceFull,traceLocalVP,log=parseInt.txt)?

pshipton commented 2 years ago

Started a grinder Build Status

Again Build Status

knn-k commented 2 years ago

There are failures in the job above. See aqa-tests/TKG/output_16597288137941/jdk_lang_1/work/java/lang/String/UnicodeCasingTest/parseInt.txt.1.1185776.29255.20220805.155415.1185776 in the artifact of https://openj9-jenkins.osuosl.org/job/Grinder_iteration_3/108/ for the trace file.

Before local value propagation (id=180):

n3928n    NULLCHK on n1702n [#32]                                                             [    0xffff77aabb80] bci=[15,1,10661] rc=0 vc=4126 vn=- li=- udi=- nc=1
n3929n      aloadi  <vft-symbol>[#263  Shadow] [flags 0x18607 0x0 ] (X!=0 sharedMemory )      [    0xffff77aabbd0] bci=[15,1,10661] rc=2 vc=4126 vn=- li=- udi=- nc=1 flg=0x4
n1702n        ==>aload
n3931n    ifacmpne --> block_208 BBStart at n2759n (inlineProfiledGuard )                     [    0xffff77aabc70] bci=[17,0,197] rc=0 vc=4126 vn=- li=- udi=- nc=2 flg=0x1020
n3929n      ==>aloadi
n3932n      aconst 0xffffb8314400 (java/lang/CharacterDataLatin1.class) (classPointerConstant X!=0 sharedMemory )  [    0xffff77aabcc0] bci=[17,0,197] rc=1 vc=4126 vn=- li=- udi=- nc=0 flg=0x10004

After local value propagation:

n3928n    NULLCHK on n1702n [#32]                                                             [    0xffff77aabb80] bci=[15,1,10661] rc=0 vc=4180 vn=- li=- udi=- nc=1
n3929n      aloadi  <vft-symbol>[#263  Shadow] [flags 0x18607 0x0 ] (X!=0 sharedMemory )      [    0xffff77aabbd0] bci=[15,1,10661] rc=1 vc=4180 vn=- li=- udi=- nc=1 flg=0x4
n1702n        ==>aload
n4043n    ifacmpne --> block_208 BBStart at n2759n (inlineHierarchyGuard )                    [    0xffff77aadf70] bci=[17,0,197] rc=0 vc=0 vn=- li=- udi=- nc=2 flg=0x4020
n4041n      aloadi  <vft-symbol>[#263  Shadow] [flags 0x18607 0x0 ]                           [    0xffff77aaded0] bci=[15,6,10661] rc=1 vc=0 vn=- li=- udi=- nc=1
n4040n        aload  <temp slot 5>[#560  Auto] [flags 0x20000007 0x0 ]                        [    0xffff77aade80] bci=[15,6,10661] rc=1 vc=0 vn=- li=- udi=- nc=0
n4042n      aconst 0xffffb8314400 (java/lang/CharacterDataLatin1.class) (classPointerConstant sharedMemory )  [    0xffff77aadf20] bci=[17,0,197] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x10000

n4040n from <temp slot 5> is a NULL here.

I see the following message with the address 0xffff77aadf70 (n4043n) in the trace file.

P2O: Considering a candidate at 0000FFFF77AABC70. Object Constraint : class Ljava/lang/CharacterDataLatin1; (non-NULL) {J9ClassObject} , Guard constraint : fixed class Ljava/lang/CharacterDataLatin1; (non-NULL) {J9ClassObject}
P2O: oldGuard 0000FFFF7726D520 newGuard 0000FFFF77AADF70 currentTree 0000FFFF77BB4508
knn-k commented 2 years ago

The IL trees above are from the "scorching" compilation.

The "hot" compilation in the same trace file does not change the inlineProfiledGuard into an inlineHierarchyGuard like the "scorching" compilation does. The inlineProfiledGuard node reuses the VFT loaded by the previous NULLCHK:

n1833n    NULLCHK on n3625n [#32]                                                             [    0xffff77272cb0] bci=[15,1,10661] rc=0 vc=7257 vn=- li=132 udi=- nc=1
n1816n      aloadi  <vft-symbol>[#263  Shadow] [flags 0x18607 0x0 ] (X!=0 sharedMemory )      [    0xffff77272760] bci=[15,1,10661] rc=2 vc=7257 vn=- li=132 udi=- nc=1 flg=0x4
n3625n        ==>aRegLoad
n1818n    ifacmpne --> block_307 BBStart at n3638n (inlineProfiledGuard )                     [    0xffff77272800] bci=[17,0,197] rc=0 vc=7257 vn=- li=132 udi=- nc=3 flg=0x1020
n1816n      ==>aloadi
n1817n      aconst 0xffffb8314400 (java/lang/CharacterDataLatin1.class) (classPointerConstant X!=0 sharedMemory )  [    0xffff772727b0] bci=[17,0,197] rc=1 vc=7257 vn=- li=132 udi=- nc=0 flg=0x10004
knn-k commented 2 years ago

Another "scorching" trace file is available in the artifact of https://openj9-jenkins.osuosl.org/job/Grinder_iteration_1/139/ as aqa-tests/TKG/output_16597331655943/jdk_lang_1/work/java/lang/String/UnicodeCasingTest/parseInt.txt.1.1426511.33813.20220805.211013.1426511 .

hzongaro commented 2 years ago

Thank you for all of your analysis of the failure, Konno-san @knn-k! It looks like constrainIfcmpeqne is replacing the inlineProfiledGuard with an inlineHierarchyGuard.

In the process of constructing the new guard, addDelayedConvertedGuard loads from the symbol that is used as the second operand of the call node - but the definition for that symbol only appears in the call block that's the target of the branch.

n2759n    BBStart <block_208> (freq 1) (cold) (in loop 368)                                   [    0xffff775e4e20] bci=[15,6,10661] rc=0 vc=4112 vn=- li=- udi=- nc=0
n2761n    astore  <temp slot 5>[#560  Auto] [flags 0x20000007 0x0 ] (X!=0 privatizedInlinerArg sharedMemory )  [    0xffff775e4ec0] bci=[15,1,10661] rc=0 vc=4112 vn=- li=- udi=47 nc=1 flg=0x2004
n2762n      aload  <temp slot 3>[#555  Auto] [flags 0x7 0x0 ] (X!=0 sharedMemory )            [    0xffff775e4f10] bci=[15,1,10661] rc=1 vc=4112 vn=- li=- udi=159 nc=0 flg=0x4
n2763n    treetop ()                                                                          [    0xffff775e4f60] bci=[15,6,10661] rc=0 vc=4112 vn=- li=- udi=- nc=1 flg=0x8
n2764n      icalli  java/lang/CharacterData.digit(II)I[#539  virtual Method -184] (Abstract class) [flags 0x500 0x0 ] (virtualCallNodeForAGuardedInlinedCall )  [    0xffff775e4fb0] bci=[15,6,10661] rc=2 vc=4112 vn=- li=- udi=- nc=4 flg=0x828
n2765n        aloadi  <vft-symbol>[#263  Shadow] [flags 0x18607 0x0 ] (X!=0 sharedMemory )    [    0xffff775e5000] bci=[15,6,10661] rc=1 vc=4112 vn=- li=- udi=- nc=1 flg=0x4
n2766n          aload  <temp slot 5>[#560  Auto] [flags 0x20000007 0x0 ] (X!=0 X>=0 sharedMemory )  [    0xffff775e5050] bci=[15,6,10661] rc=1 vc=4112 vn=- li=- udi=160 nc=0 flg=0x104
n2767n        aload  <temp slot 5>[#560  Auto] [flags 0x20000007 0x0 ] (X!=0 sharedMemory )   [    0xffff775e50a0] bci=[15,6,10661] rc=1 vc=4112 vn=- li=- udi=161 nc=0 flg=0x4
n2768n        iload  <temp slot 13>[#526  Auto] [flags 0x3 0x0 ] (X>=0 cannotOverflow )       [    0xffff775e50f0] bci=[15,6,10661] rc=1 vc=4112 vn=- li=- udi=162 nc=0 flg=0x1100
n2769n        iload  radix<parm 1 I>[#336  Parm] [flags 0x40000103 0x0 ] (X>=0 cannotOverflow )  [    0xffff775e5140] bci=[15,6,10661] rc=1 vc=4112 vn=- li=- udi=163 nc=0 flg=0x1100
n2770n    istore  <temp slot 4>[#562  Auto] [flags 0x3 0x0 ] ()                               [    0xffff775e5190] bci=[15,6,10661] rc=0 vc=4112 vn=- li=- udi=48 nc=1 flg=0x8
n2764n      ==>icalli
n2760n    BBEnd </block_208> (cold) =====                                                     [    0xffff775e4e70] bci=[15,6,10661] rc=0 vc=4112 vn=- li=- udi=- nc=0
hzongaro commented 2 years ago

The "hot" compilation in the same trace file does not change the inlineProfiledGuard into an inlineHierarchyGuard like the "scorching" compilation does.

Hmmm. It's not clear why the problem only appears at scorching. In the scorching log file, this output appears while processing block_142:

P2O: Considering a candidate at 0000FFFF77AABC70. Object Constraint : class Ljava/lang/CharacterDataLatin1; (non-NULL) {J9ClassObject} , Guard constraint : fixed class Ljava/lang/CharacterDataLatin1; (non-NULL) {J9ClassObject}
P2O: oldGuard 0000FFFF7726D520 newGuard 0000FFFF77AADF70 currentTree 0000FFFF77BB4508

P2O stands for "Profiled to Overridden". The second trace message is emitted by addDelayedConvertedGuard when the new guard is set up for a delayed transformation.

In the hot compile, the second trace message doesn't appear when processing the various inlineProfiledGuards. There's a long sequence of conditions considered by constrainIfcmpeqne in deciding whether to change the guard. I'll try to understand what might prevent the transformation at hot.

hzongaro commented 2 years ago

Hmmm. It's not clear why the problem only appears at scorching

Now I see the reason for the difference between hot and scorching. At hot, one of the blocks that contains a profiled guard looks like this:

n1741n    BBStart <block_132> (freq 8601) (in loop 19)                                        [    0xffff77270ff0] bci=[15,1,10661] rc=0 vc=5209 vn=- li=-2 udi=- nc=0
n1833n    NULLCHK on n1745n [#32]                                                             [    0xffff77272cb0] bci=[15,1,10661] rc=0 vc=5209 vn=- li=-2 udi=- nc=1
n1816n      aloadi  <vft-symbol>[#263  Shadow] [flags 0x18607 0x0 ] (X!=0 sharedMemory )      [    0xffff77272760] bci=[15,1,10661] rc=2 vc=5209 vn=- li=-2 udi=- nc=1 flg=0x4
n1745n        aload  <temp slot 3>[#555  Auto] [flags 0x7 0x0 ] (X>=0 sharedMemory )          [    0xffff77271130] bci=[16,135,75] rc=1 vc=5209 vn=- li=-1 udi=78 nc=0 flg=0x100
n1818n    ifacmpne --> block_207 BBStart at n2752n (inlineProfiledGuard )                     [    0xffff77272800] bci=[17,0,197] rc=0 vc=5209 vn=- li=-2 udi=- nc=2 flg=0x1020
n1816n      ==>aloadi
n1817n      aconst 0xffffb8314400 (java/lang/CharacterDataLatin1.class) (classPointerConstant X!=0 sharedMemory )  [    0xffff772727b0] bci=[17,0,197] rc=1 vc=5209 vn=- li=-1 udi=- nc=0 flg=0x10004
n1811n    BBEnd </block_132>                                                                  [    0xffff772725d0] bci=[17,4,197] rc=0 vc=5208 vn=- li=-2 udi=- nc=0

The constraint for the aloadi looks like this:

   n1816n aloadi gets new constraint: value 1816 is  {HeapObject,JavaLangClassObject,J9ClassObject}

At scorching it looks like this

n1699n    BBStart <block_142> (freq 8526) (extension of previous block) (in loop 19)          [    0xffff772702d0] bci=[15,1,10661] rc=0 vc=4126 vn=- li=- udi=- nc=0
n1740n    astore  <temp slot 3>[#555  Auto] [flags 0x7 0x0 ]                                  [    0xffff77270fa0] bci=[16,7,73] rc=0 vc=4126 vn=- li=- udi=16 nc=1
n1702n      aload  java/lang/CharacterDataLatin1.instance Ljava/lang/CharacterDataLatin1;[#546  final Static] [flags 0x20307 0x0 ]  [    0xffff772703c0] bci=[16,7,73] rc=2 vc=4126 vn=- li=- udi=- nc=0
n3928n    NULLCHK on n1702n [#32]                                                             [    0xffff77aabb80] bci=[15,1,10661] rc=0 vc=4126 vn=- li=- udi=- nc=1
n3929n      aloadi  <vft-symbol>[#263  Shadow] [flags 0x18607 0x0 ] (X!=0 sharedMemory )      [    0xffff77aabbd0] bci=[15,1,10661] rc=2 vc=4126 vn=- li=- udi=- nc=1 flg=0x4
n1702n        ==>aload
n3931n    ifacmpne --> block_208 BBStart at n2759n (inlineProfiledGuard )                     [    0xffff77aabc70] bci=[17,0,197] rc=0 vc=4126 vn=- li=- udi=- nc=2 flg=0x1020
n3929n      ==>aloadi
n3932n      aconst 0xffffb8314400 (java/lang/CharacterDataLatin1.class) (classPointerConstant X!=0 sharedMemory )  [    0xffff77aabcc0] bci=[17,0,197] rc=1 vc=4126 vn=- li=- udi=- nc=0 flg=0x10004
n3933n    BBEnd </block_142>                                                                  

and the contraint for the aloadi is

   n3929n aloadi gets new constraint: value 3929 is class Ljava/lang/CharacterDataLatin1; (non-NULL) {J9ClassObject}

Notice that the store to #555 appears within block_142 at scorching. My understanding is that Local Value Propagation only considers value constraints within blocks, so at hot, it's unaware that #555 contains an instance of CharacterDataLatin1, preventing the guard from being rewritten.

hzongaro commented 2 years ago

Devin @jdmpapin mentioned off-line that he is working on some changes that affect addDelayedConvertedGuard that will likely fix this bug.

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_aarch64_linux_OpenJDK19/12 jdk_lang_0 java/lang/String/UnicodeCasingTest.java

23:52:59  Testing on ff_GH_#Latn locale....
23:52:59  java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
23:52:59    at java.base/java.lang.Character.digit(Character.java:10935)
23:52:59    at java.base/java.lang.Character.digit(Character.java:10883)
23:52:59    at java.base/java.lang.Integer.parseInt(Integer.java:663)
23:52:59    at UnicodeCasingTest.convert(UnicodeCasingTest.java:210)
23:52:59    at UnicodeCasingTest.test(UnicodeCasingTest.java:150)
23:52:59    at UnicodeCasingTest.test(UnicodeCasingTest.java:96)
23:52:59    at UnicodeCasingTest.main(UnicodeCasingTest.java:64)
knn-k commented 2 years ago

@jdmpapin I see some PRs related to guards in OMR that are merged recently. Do you think any one of them affects the optimization in this issue?

jdmpapin commented 2 years ago

Sorry, not quite yet. I have a couple more guard-related PRs to do before the one that fixes this issue

knn-k commented 2 years ago

OK, thanks. I am going to run Grinder jobs when a fix for this issue is available.

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_aarch64_linux_Nightly/9

0xdaryl commented 2 years ago

@jdmpapin : will the current direction of your guard changes still affect this code?

Moving out of 0.35 to 0.36 in any case.

jdmpapin commented 2 years ago

Yes, eclipse/omr#6712 and #15923 are the last prerequisites for my planned change to this code. These are needed for reasons described in the former (and those reasons also apply to the current guard upgrade code)

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/286

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/287

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_aarch64_linux_Nightly/16/

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_aarch64_linux_Nightly/17 java/lang/StringBuilder/Comparison.java java/lang/String/UnicodeCasingTest.java

pshipton commented 2 years ago

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_OpenJDK17/23/ java/lang/String/UnicodeCasingTest.java.UnicodeCasingTest java/lang/ThreadLocal/ReplaceStaleEntry.java.ReplaceStaleEntry java/lang/ThreadLocal/ThreadLocalSupplierTest.java.ThreadLocalSupplierTest jdk/internal/jimage/JImageReadTest.java.JImageReadTest jdk/lambda/ArrayCtorRefTest.java.ArrayCtorRefTest jdk/lambda/MethodReferenceTestCallerSensitive.java.MethodReferenceTestCallerSensitive jdk/lambda/MethodReferenceTestInstanceMethod.java.MethodReferenceTestInstanceMethod jdk/lambda/TestInnerCtorRef.java.TestInnerCtorRef jdk/lambda/TestPrivateCtorRef.java.TestPrivateCtorRef jdk/modules/etc/DefaultModules.java.DefaultModules

knn-k commented 2 years ago

It is interesting that many tests failed with Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null, in addition to known String/UnicodeCasingTest.java and StringBuilder/Comparison.java.

jdmpapin commented 2 years ago

eclipse/omr#6752

knn-k commented 2 years ago

I ran the following Grinder jobs on AArch64 Linux with https://github.com/eclipse/omr/pull/6752, and got no failures.

jdmpapin commented 2 years ago

Fix is merged

pshipton commented 2 years ago

It missed the OMR acceptance builds today, the next one is ~1pm on Monday~ 5am on Friday.

pshipton commented 2 years ago

Oops, thinking it was Friday already, the next acceptance is Fri 5am.

pshipton commented 2 years ago

Hopefully the last build where we'll see this problem before OMR promotes with the fix.

https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/295/