Closed pshipton closed 2 years ago
https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_aarch64_linux_OpenJDK19/4 jdk_lang_1 java/lang/String/UnicodeCasingTest.java
15:23:46 java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
15:23:46 at java.base/java.lang.Character.digit(Character.java:10935)
15:23:46 at java.base/java.lang.Character.digit(Character.java:10883)
15:23:46 at java.base/java.lang.Integer.parseInt(Integer.java:663)
15:23:46 at UnicodeCasingTest.convert(UnicodeCasingTest.java:210)
15:23:46 at UnicodeCasingTest.test(UnicodeCasingTest.java:150)
15:23:46 at UnicodeCasingTest.test(UnicodeCasingTest.java:96)
15:23:46 at UnicodeCasingTest.main(UnicodeCasingTest.java:64)
java/lang/StringBuilder/Comparison.java
15:23:46 java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
15:23:46 at java.base/java.lang.Character.digit(Character.java:10935)
15:23:46 at java.base/java.lang.Character.digit(Character.java:10883)
15:23:46 at java.base/java.lang.Integer.parseInt(Integer.java:663)
15:23:46 at java.base/java.lang.Integer.parseInt(Integer.java:781)
15:23:46 at org.testng.reporters.XMLReporterConfig$StackTraceLevels.parse(XMLReporterConfig.java:250)
15:23:46 at org.testng.reporters.XMLReporterConfig.<init>(XMLReporterConfig.java:96)
15:23:46 at org.testng.reporters.XMLReporter.<init>(XMLReporter.java:26)
15:23:46 at com.sun.javatest.regtest.agent.TestNGRunner.main(TestNGRunner.java:92)
15:23:46 at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
15:23:46 at java.base/java.lang.reflect.Method.invoke(Method.java:578)
15:23:46 at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:312)
15:23:46 at java.base/java.lang.Thread.run(Thread.java:1573)
15:23:46
15:23:46 JavaTest Message: Test threw exception: java.lang.NullPointerException
15:23:46 JavaTest Message: shutting down test
I tried to reproduce the failure locally with JDK 17, but no luck in 50 runs.
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/226 jdk_lang_1 java/lang/String/UnicodeCasingTest.java
19:54:35 Testing on lrc_IR_#Arab locale....
19:54:35 java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
19:54:35 at java.base/java.lang.Character.digit(Character.java:10661)
19:54:35 at java.base/java.lang.Character.digit(Character.java:10609)
19:54:35 at java.base/java.lang.Integer.parseInt(Integer.java:666)
19:54:35 at UnicodeCasingTest.convert(UnicodeCasingTest.java:210)
19:54:35 at UnicodeCasingTest.test(UnicodeCasingTest.java:150)
19:54:35 at UnicodeCasingTest.test(UnicodeCasingTest.java:96)
19:54:35 at UnicodeCasingTest.main(UnicodeCasingTest.java:64)
30x Grinder jobs with UnicodeCasingTest.java:
Two more Grinder jobs (30x each) with UnicodeCasingTest.java:
Reproduced in a 30x Grinder job: https://openj9-jenkins.osuosl.org/job/Grinder/1131/ (jdk_lang_1)
This failure seems to happen only in the large heap mode. All the reported failures above are from jdk_lang_1 jobs. I ran the test in the compressed refs mode (jdk_lang_0) as https://openj9-jenkins.osuosl.org/job/Grinder/1134/, and it did not fail.
I tried to get a JIT trace file for java.lang.Character.digit()
, but my attempt has been unsuccessful.
https://openj9-jenkins.osuosl.org/job/Test_openjdk18_j9_sanity.openjdk_aarch64_linux_Nightly/105
jdk_lang_0 -Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:+UseCompressedOops
java/lang/String/UnicodeCasingTest.java
21:08:08 Testing on el_CY locale....
21:08:08 java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
21:08:08 at java.base/java.lang.Character.digit(Character.java:10658)
21:08:08 at java.base/java.lang.Character.digit(Character.java:10606)
21:08:08 at java.base/java.lang.Integer.parseInt(Integer.java:666)
21:08:08 at UnicodeCasingTest.convert(UnicodeCasingTest.java:210)
21:08:08 at UnicodeCasingTest.test(UnicodeCasingTest.java:150)
21:08:08 at UnicodeCasingTest.test(UnicodeCasingTest.java:96)
21:08:08 at UnicodeCasingTest.main(UnicodeCasingTest.java:64)
This latest failure is with -XX:+UseCompressedOops
I tried running with -Xdump option (dump upon NullPointerException), but the failure was not reproduced. https://openj9-jenkins.osuosl.org/job/Grinder/1140/ https://openj9-jenkins.osuosl.org/job/Grinder/1141/
Failure reproduced in https://openj9-jenkins.osuosl.org/job/Grinder/1167/:
[2022-07-21T07:08:58.595Z] JVMDUMP039I Processing dump event "systhrow", detail "java/lang/NullPointerException" at 2022/07/21 02:51:58 - please wait.
[2022-07-21T07:08:58.595Z] JVMDUMP032I JVM requested System dump using '/home/jenkins/workspace/Grinder/aqa-tests/TKG/output_16583858471448/jdk_lang_1/work/scratch/2/core.20220721.025158.3097098.0001.dmp' in response to an event
[2022-07-21T07:08:58.595Z] JVMDUMP010I System dump written to /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_16583858471448/jdk_lang_1/work/scratch/2/core.20220721.025158.3097098.0001.dmp
[2022-07-21T07:08:58.595Z] JVMDUMP032I JVM requested Java dump using '/home/jenkins/workspace/Grinder/aqa-tests/TKG/output_16583858471448/jdk_lang_1/work/scratch/2/javacore.20220721.025158.3097098.0002.txt' in response to an event
[2022-07-21T07:08:58.595Z] JVMDUMP010I Java dump written to /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_16583858471448/jdk_lang_1/work/scratch/2/javacore.20220721.025158.3097098.0002.txt
[2022-07-21T07:08:58.595Z] JVMDUMP032I JVM requested JIT dump using '/home/jenkins/workspace/Grinder/aqa-tests/TKG/output_16583858471448/jdk_lang_1/work/scratch/2/jitdump.20220721.025158.3097098.0003.dmp' in response to an event
[2022-07-21T07:08:58.595Z] JVMDUMP051I JIT dump occurred in 'AgentVMThread' thread 0x0000FFFF7C713700
[2022-07-21T07:08:58.595Z] JVMDUMP053I JIT dump is recompiling java/lang/Integer.parseInt(Ljava/lang/String;I)I
Diag files are available at https://openj9-artifactory.osuosl.org/artifactory/ci-openj9/Test/Grinder/1167/
Findings from the Grinder job above:
java/lang/Integer::parseInt(Ljava/lang/String;I)I
.ldr x0, [x27]
at address 0xffff4285dcb4, where x27
is zero at this point.x27
is loaded from a stack slot [x27, #128]
. The stack slot is initialized as zero by str xzr, [x20, #128]
at the entry of the parseInt()
code.parseInt()
, but it has been unsuccessful so far.https://openj9-jenkins.osuosl.org/job/Test_openjdk18_j9_sanity.openjdk_aarch64_linux_Release/19 java/lang/String/UnicodeCasingTest.java
07:11:49 Testing on ga_IE locale....
07:11:49 java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
07:11:49 at java.base/java.lang.Character.digit(Character.java:10658)
07:11:49 at java.base/java.lang.Character.digit(Character.java:10606)
07:11:49 at java.base/java.lang.Integer.parseInt(Integer.java:666)
07:11:49 at UnicodeCasingTest.convert(UnicodeCasingTest.java:210)
07:11:49 at UnicodeCasingTest.test(UnicodeCasingTest.java:150)
07:11:49 at UnicodeCasingTest.test(UnicodeCasingTest.java:96)
07:11:49 at UnicodeCasingTest.main(UnicodeCasingTest.java:64)
I finally got a JIT trace file of parseInt()
in https://openj9-jenkins.osuosl.org/job/Grinder/1179/.
I got a JIT trace file of Integer.parseInt()
from https://openj9-jenkins.osuosl.org/job/Grinder/1179/.
This failure seems to occur only when the method is compiled at the scorching level. That should be the reason of its intermittent behavior.
I got more JIT trace files in https://openj9-jenkins.osuosl.org/job/Grinder/1198/.
This looks like an issue with the JIT optimization. Who can look at it?
Take a look at the following files in ./aqa-tests/TKG/output_16589343036023/jdk_lang_1/work/java/lang/String/UnicodeCasingTest/ in the artifacts of https://openj9-jenkins.osuosl.org/job/Grinder/1198/.
The JIT trace file contains the output for three compilations, at levels "warm", "hot", and "scorching".
The NPE is thrown at the ldrimmx
instruction below in the "scorching" compilation, trying to load VFT with the register w27
being a NULL:
\\ java/lang/Integer.parseInt(Ljava/lang/String;I)I
\\ 190 JBinvokestatic 56 java/lang/Character.digit(CI)I
\\ 2 JBinvokestatic 88 java/lang/Character.digit(II)I
\\ 6 JBinvokevirtual 89 java/lang/CharacterData.digit(II)I
0xffff5f9fa548 00000144 [ 0xfffedf2782b0] f9400360 6 ldrimmx w0, [w27, 0] # SymRef <vft-symbol>[#263 Shadow] [flags 0x18607 0x0 ]
The ldrimmx
instruction above is for node n4141n
in the following IL tree:
n4143n ifacmpne --> block_423 BBStart at n4451n (inlineHierarchyGuard ) [ 0xffff24dafeb0] bci=[19,0,197] rc=0 vc=10 vn=- li=192 udi=- nc=3 flg=0x4020
n4141n aloadi <vft-symbol>[#263 Shadow] [flags 0x18607 0x0 ] [ 0xffff24dafe10] bci=[17,6,10658] rc=4 vc=10 vn=- li=192 udi=- nc=1
n4400n ==>aRegLoad
n4142n aconst 0xffff982ff700 (java/lang/CharacterDataLatin1.class) (classPointerConstant X!=0 sharedMemory ) [ 0xffff24dafe60] bci=[19,0,197] rc=4 vc=10 vn=- li=192 udi=- nc=0 flg=0x10004
Node n4400n
loads the reference from <temp slot 5>[#566 Auto]
. The stack slot is initialized as NULL at the method entry of the generated code. The value in the slot is not changed before the NPE is thrown at n4141n
.
n4400n aRegLoad x27 <temp slot 5>[#566 Auto] [flags 0x20000007 0x0 ] (X>=0 SeenRealReference sharedMemory ) [ 0xffff250e4f10] bci=[17,6,10658] rc=3 vc=10 vn=- li=19 udi=- nc=0 flg=0x8100
Note that x27
and w27
represent the same register here. It is confusing, but please ignore it now (See Issue https://github.com/eclipse/omr/issues/3137).
The localValuePropagation phase (id=179) changes the following IL node n4030n
into n4143n
above.
n4030n ifacmpne --> block_225 BBStart at n2782n (inlineProfiledGuard ) [ 0xffff24dadb60] bci=[19,0,197] rc=0 vc=4070 vn=- li=- udi=- nc=2 flg=0x1020
n4028n ==>aloadi
n4031n aconst 0xffff982ff700 (java/lang/CharacterDataLatin1.class) (classPointerConstant X!=0 sharedMemory ) [ 0xffff24dadbb0] bci=[19,0,197] rc=1 vc=4070 vn=- li=- udi=- nc=0 flg=0x10004
n4029n
, the child node of n4028n
, loads the object reference from <temp slot 3>[#561 Auto]
instead of <temp slot 5>[#566 Auto]
.
Is the transformation of the ifacmpne
node as expected?
Another question: Why is this failure observed only on AArch64 Linux?
The transformation of the guard node above seems to be related to the following message in localValuePropagation.
[ 4893] E^AReplacing the old guard 0000FFFF24DAFEB0 with the shiny new overridden guard 0000FFFF24F7ED98 at treetop 0000FFFF9DCA26F4
By the way, the addresses in this message is wrong. I opened PR https://github.com/eclipse/omr/pull/6628 to fix it.
@0xdaryl there is a question in an earlier comment that seems to be directed to you.
This looks like an issue with the JIT optimization. Who can look at it?
I ran a 50x Grinder job with Java 11 in https://openj9-jenkins.osuosl.org/job/Grinder/1200/, and it reported no failures. It is reproducible only with Java 17 and newer.
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/245 jdk_lang_1 java/lang/String/UnicodeCasingTest.java
19:50:51 Testing on bs__#Latn locale....
19:50:51 java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
19:50:51 at java.base/java.lang.Character.digit(Character.java:10661)
19:50:51 at java.base/java.lang.Character.digit(Character.java:10609)
19:50:51 at java.base/java.lang.Integer.parseInt(Integer.java:666)
19:50:51 at UnicodeCasingTest.convert(UnicodeCasingTest.java:210)
19:50:51 at UnicodeCasingTest.test(UnicodeCasingTest.java:150)
19:50:51 at UnicodeCasingTest.test(UnicodeCasingTest.java:96)
19:50:51 at UnicodeCasingTest.main(UnicodeCasingTest.java:64)
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/246 jdk_lang_0 java/lang/String/UnicodeCasingTest.java
19:44:58 Testing on en_AT locale....
19:44:58 java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
19:44:58 at java.base/java.lang.Character.digit(Character.java:10661)
19:44:58 at java.base/java.lang.Character.digit(Character.java:10609)
19:44:58 at java.base/java.lang.Integer.parseInt(Integer.java:666)
19:44:58 at UnicodeCasingTest.convert(UnicodeCasingTest.java:210)
19:44:58 at UnicodeCasingTest.test(UnicodeCasingTest.java:150)
19:44:58 at UnicodeCasingTest.test(UnicodeCasingTest.java:96)
19:44:58 at UnicodeCasingTest.main(UnicodeCasingTest.java:64)
https://openj9-jenkins.osuosl.org/job/Test_openjdk18_j9_sanity.openjdk_aarch64_linux_Nightly/115 test jdk_lang_0 java/lang/String/UnicodeCasingTest.java
21:11:19 Testing on fi_FI locale....
21:11:19 java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
21:11:19 at java.base/java.lang.Character.digit(Character.java:10658)
21:11:19 at java.base/java.lang.Character.digit(Character.java:10606)
21:11:19 at java.base/java.lang.Integer.parseInt(Integer.java:666)
21:11:19 at UnicodeCasingTest.convert(UnicodeCasingTest.java:210)
21:11:19 at UnicodeCasingTest.test(UnicodeCasingTest.java:150)
21:11:19 at UnicodeCasingTest.test(UnicodeCasingTest.java:96)
21:11:19 at UnicodeCasingTest.main(UnicodeCasingTest.java:64)
@hzongaro : could you investigate this from the LVP angle please?
@knn-k, I haven't been able to reproduce the failure in "internal" Grinder runs, and I don't have permission to submit Grinder runs on openj9-jenkins.osuosl.org. When you have a few minutes, may I ask you to submit another Grinder run adding the traceLocalVP
option — -Xjit:{java/lang/Integer.parseInt(*}(traceFull,traceLocalVP,log=parseInt.txt)
?
There are failures in the job above. See aqa-tests/TKG/output_16597288137941/jdk_lang_1/work/java/lang/String/UnicodeCasingTest/parseInt.txt.1.1185776.29255.20220805.155415.1185776 in the artifact of https://openj9-jenkins.osuosl.org/job/Grinder_iteration_3/108/ for the trace file.
Before local value propagation (id=180):
n3928n NULLCHK on n1702n [#32] [ 0xffff77aabb80] bci=[15,1,10661] rc=0 vc=4126 vn=- li=- udi=- nc=1
n3929n aloadi <vft-symbol>[#263 Shadow] [flags 0x18607 0x0 ] (X!=0 sharedMemory ) [ 0xffff77aabbd0] bci=[15,1,10661] rc=2 vc=4126 vn=- li=- udi=- nc=1 flg=0x4
n1702n ==>aload
n3931n ifacmpne --> block_208 BBStart at n2759n (inlineProfiledGuard ) [ 0xffff77aabc70] bci=[17,0,197] rc=0 vc=4126 vn=- li=- udi=- nc=2 flg=0x1020
n3929n ==>aloadi
n3932n aconst 0xffffb8314400 (java/lang/CharacterDataLatin1.class) (classPointerConstant X!=0 sharedMemory ) [ 0xffff77aabcc0] bci=[17,0,197] rc=1 vc=4126 vn=- li=- udi=- nc=0 flg=0x10004
After local value propagation:
n3928n NULLCHK on n1702n [#32] [ 0xffff77aabb80] bci=[15,1,10661] rc=0 vc=4180 vn=- li=- udi=- nc=1
n3929n aloadi <vft-symbol>[#263 Shadow] [flags 0x18607 0x0 ] (X!=0 sharedMemory ) [ 0xffff77aabbd0] bci=[15,1,10661] rc=1 vc=4180 vn=- li=- udi=- nc=1 flg=0x4
n1702n ==>aload
n4043n ifacmpne --> block_208 BBStart at n2759n (inlineHierarchyGuard ) [ 0xffff77aadf70] bci=[17,0,197] rc=0 vc=0 vn=- li=- udi=- nc=2 flg=0x4020
n4041n aloadi <vft-symbol>[#263 Shadow] [flags 0x18607 0x0 ] [ 0xffff77aaded0] bci=[15,6,10661] rc=1 vc=0 vn=- li=- udi=- nc=1
n4040n aload <temp slot 5>[#560 Auto] [flags 0x20000007 0x0 ] [ 0xffff77aade80] bci=[15,6,10661] rc=1 vc=0 vn=- li=- udi=- nc=0
n4042n aconst 0xffffb8314400 (java/lang/CharacterDataLatin1.class) (classPointerConstant sharedMemory ) [ 0xffff77aadf20] bci=[17,0,197] rc=1 vc=0 vn=- li=- udi=- nc=0 flg=0x10000
n4040n
from <temp slot 5>
is a NULL here.
I see the following message with the address 0xffff77aadf70 (n4043n
) in the trace file.
P2O: Considering a candidate at 0000FFFF77AABC70. Object Constraint : class Ljava/lang/CharacterDataLatin1; (non-NULL) {J9ClassObject} , Guard constraint : fixed class Ljava/lang/CharacterDataLatin1; (non-NULL) {J9ClassObject}
P2O: oldGuard 0000FFFF7726D520 newGuard 0000FFFF77AADF70 currentTree 0000FFFF77BB4508
The IL trees above are from the "scorching" compilation.
The "hot" compilation in the same trace file does not change the inlineProfiledGuard into an inlineHierarchyGuard like the "scorching" compilation does. The inlineProfiledGuard node reuses the VFT loaded by the previous NULLCHK:
n1833n NULLCHK on n3625n [#32] [ 0xffff77272cb0] bci=[15,1,10661] rc=0 vc=7257 vn=- li=132 udi=- nc=1
n1816n aloadi <vft-symbol>[#263 Shadow] [flags 0x18607 0x0 ] (X!=0 sharedMemory ) [ 0xffff77272760] bci=[15,1,10661] rc=2 vc=7257 vn=- li=132 udi=- nc=1 flg=0x4
n3625n ==>aRegLoad
n1818n ifacmpne --> block_307 BBStart at n3638n (inlineProfiledGuard ) [ 0xffff77272800] bci=[17,0,197] rc=0 vc=7257 vn=- li=132 udi=- nc=3 flg=0x1020
n1816n ==>aloadi
n1817n aconst 0xffffb8314400 (java/lang/CharacterDataLatin1.class) (classPointerConstant X!=0 sharedMemory ) [ 0xffff772727b0] bci=[17,0,197] rc=1 vc=7257 vn=- li=132 udi=- nc=0 flg=0x10004
Another "scorching" trace file is available in the artifact of https://openj9-jenkins.osuosl.org/job/Grinder_iteration_1/139/ as aqa-tests/TKG/output_16597331655943/jdk_lang_1/work/java/lang/String/UnicodeCasingTest/parseInt.txt.1.1426511.33813.20220805.211013.1426511 .
Thank you for all of your analysis of the failure, Konno-san @knn-k! It looks like constrainIfcmpeqne
is replacing the inlineProfiledGuard with an inlineHierarchyGuard.
In the process of constructing the new guard, addDelayedConvertedGuard
loads from the symbol that is used as the second operand of the call node - but the definition for that symbol only appears in the call block that's the target of the branch.
n2759n BBStart <block_208> (freq 1) (cold) (in loop 368) [ 0xffff775e4e20] bci=[15,6,10661] rc=0 vc=4112 vn=- li=- udi=- nc=0
n2761n astore <temp slot 5>[#560 Auto] [flags 0x20000007 0x0 ] (X!=0 privatizedInlinerArg sharedMemory ) [ 0xffff775e4ec0] bci=[15,1,10661] rc=0 vc=4112 vn=- li=- udi=47 nc=1 flg=0x2004
n2762n aload <temp slot 3>[#555 Auto] [flags 0x7 0x0 ] (X!=0 sharedMemory ) [ 0xffff775e4f10] bci=[15,1,10661] rc=1 vc=4112 vn=- li=- udi=159 nc=0 flg=0x4
n2763n treetop () [ 0xffff775e4f60] bci=[15,6,10661] rc=0 vc=4112 vn=- li=- udi=- nc=1 flg=0x8
n2764n icalli java/lang/CharacterData.digit(II)I[#539 virtual Method -184] (Abstract class) [flags 0x500 0x0 ] (virtualCallNodeForAGuardedInlinedCall ) [ 0xffff775e4fb0] bci=[15,6,10661] rc=2 vc=4112 vn=- li=- udi=- nc=4 flg=0x828
n2765n aloadi <vft-symbol>[#263 Shadow] [flags 0x18607 0x0 ] (X!=0 sharedMemory ) [ 0xffff775e5000] bci=[15,6,10661] rc=1 vc=4112 vn=- li=- udi=- nc=1 flg=0x4
n2766n aload <temp slot 5>[#560 Auto] [flags 0x20000007 0x0 ] (X!=0 X>=0 sharedMemory ) [ 0xffff775e5050] bci=[15,6,10661] rc=1 vc=4112 vn=- li=- udi=160 nc=0 flg=0x104
n2767n aload <temp slot 5>[#560 Auto] [flags 0x20000007 0x0 ] (X!=0 sharedMemory ) [ 0xffff775e50a0] bci=[15,6,10661] rc=1 vc=4112 vn=- li=- udi=161 nc=0 flg=0x4
n2768n iload <temp slot 13>[#526 Auto] [flags 0x3 0x0 ] (X>=0 cannotOverflow ) [ 0xffff775e50f0] bci=[15,6,10661] rc=1 vc=4112 vn=- li=- udi=162 nc=0 flg=0x1100
n2769n iload radix<parm 1 I>[#336 Parm] [flags 0x40000103 0x0 ] (X>=0 cannotOverflow ) [ 0xffff775e5140] bci=[15,6,10661] rc=1 vc=4112 vn=- li=- udi=163 nc=0 flg=0x1100
n2770n istore <temp slot 4>[#562 Auto] [flags 0x3 0x0 ] () [ 0xffff775e5190] bci=[15,6,10661] rc=0 vc=4112 vn=- li=- udi=48 nc=1 flg=0x8
n2764n ==>icalli
n2760n BBEnd </block_208> (cold) ===== [ 0xffff775e4e70] bci=[15,6,10661] rc=0 vc=4112 vn=- li=- udi=- nc=0
The "hot" compilation in the same trace file does not change the inlineProfiledGuard into an inlineHierarchyGuard like the "scorching" compilation does.
Hmmm. It's not clear why the problem only appears at scorching. In the scorching log file, this output appears while processing block_142:
P2O: Considering a candidate at 0000FFFF77AABC70. Object Constraint : class Ljava/lang/CharacterDataLatin1; (non-NULL) {J9ClassObject} , Guard constraint : fixed class Ljava/lang/CharacterDataLatin1; (non-NULL) {J9ClassObject}
P2O: oldGuard 0000FFFF7726D520 newGuard 0000FFFF77AADF70 currentTree 0000FFFF77BB4508
P2O stands for "Profiled to Overridden". The second trace message is emitted by addDelayedConvertedGuard
when the new guard is set up for a delayed transformation.
In the hot compile, the second trace message doesn't appear when processing the various inlineProfiledGuards. There's a long sequence of conditions considered by constrainIfcmpeqne
in deciding whether to change the guard. I'll try to understand what might prevent the transformation at hot.
Hmmm. It's not clear why the problem only appears at scorching
Now I see the reason for the difference between hot and scorching. At hot, one of the blocks that contains a profiled guard looks like this:
n1741n BBStart <block_132> (freq 8601) (in loop 19) [ 0xffff77270ff0] bci=[15,1,10661] rc=0 vc=5209 vn=- li=-2 udi=- nc=0
n1833n NULLCHK on n1745n [#32] [ 0xffff77272cb0] bci=[15,1,10661] rc=0 vc=5209 vn=- li=-2 udi=- nc=1
n1816n aloadi <vft-symbol>[#263 Shadow] [flags 0x18607 0x0 ] (X!=0 sharedMemory ) [ 0xffff77272760] bci=[15,1,10661] rc=2 vc=5209 vn=- li=-2 udi=- nc=1 flg=0x4
n1745n aload <temp slot 3>[#555 Auto] [flags 0x7 0x0 ] (X>=0 sharedMemory ) [ 0xffff77271130] bci=[16,135,75] rc=1 vc=5209 vn=- li=-1 udi=78 nc=0 flg=0x100
n1818n ifacmpne --> block_207 BBStart at n2752n (inlineProfiledGuard ) [ 0xffff77272800] bci=[17,0,197] rc=0 vc=5209 vn=- li=-2 udi=- nc=2 flg=0x1020
n1816n ==>aloadi
n1817n aconst 0xffffb8314400 (java/lang/CharacterDataLatin1.class) (classPointerConstant X!=0 sharedMemory ) [ 0xffff772727b0] bci=[17,0,197] rc=1 vc=5209 vn=- li=-1 udi=- nc=0 flg=0x10004
n1811n BBEnd </block_132> [ 0xffff772725d0] bci=[17,4,197] rc=0 vc=5208 vn=- li=-2 udi=- nc=0
The constraint for the aloadi
looks like this:
n1816n aloadi gets new constraint: value 1816 is {HeapObject,JavaLangClassObject,J9ClassObject}
At scorching it looks like this
n1699n BBStart <block_142> (freq 8526) (extension of previous block) (in loop 19) [ 0xffff772702d0] bci=[15,1,10661] rc=0 vc=4126 vn=- li=- udi=- nc=0
n1740n astore <temp slot 3>[#555 Auto] [flags 0x7 0x0 ] [ 0xffff77270fa0] bci=[16,7,73] rc=0 vc=4126 vn=- li=- udi=16 nc=1
n1702n aload java/lang/CharacterDataLatin1.instance Ljava/lang/CharacterDataLatin1;[#546 final Static] [flags 0x20307 0x0 ] [ 0xffff772703c0] bci=[16,7,73] rc=2 vc=4126 vn=- li=- udi=- nc=0
n3928n NULLCHK on n1702n [#32] [ 0xffff77aabb80] bci=[15,1,10661] rc=0 vc=4126 vn=- li=- udi=- nc=1
n3929n aloadi <vft-symbol>[#263 Shadow] [flags 0x18607 0x0 ] (X!=0 sharedMemory ) [ 0xffff77aabbd0] bci=[15,1,10661] rc=2 vc=4126 vn=- li=- udi=- nc=1 flg=0x4
n1702n ==>aload
n3931n ifacmpne --> block_208 BBStart at n2759n (inlineProfiledGuard ) [ 0xffff77aabc70] bci=[17,0,197] rc=0 vc=4126 vn=- li=- udi=- nc=2 flg=0x1020
n3929n ==>aloadi
n3932n aconst 0xffffb8314400 (java/lang/CharacterDataLatin1.class) (classPointerConstant X!=0 sharedMemory ) [ 0xffff77aabcc0] bci=[17,0,197] rc=1 vc=4126 vn=- li=- udi=- nc=0 flg=0x10004
n3933n BBEnd </block_142>
and the contraint for the aloadi
is
n3929n aloadi gets new constraint: value 3929 is class Ljava/lang/CharacterDataLatin1; (non-NULL) {J9ClassObject}
Notice that the store to #555
appears within block_142 at scorching. My understanding is that Local Value Propagation only considers value constraints within blocks, so at hot, it's unaware that #555
contains an instance of CharacterDataLatin1
, preventing the guard from being rewritten.
Devin @jdmpapin mentioned off-line that he is working on some changes that affect addDelayedConvertedGuard that will likely fix this bug.
https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_aarch64_linux_OpenJDK19/12 jdk_lang_0 java/lang/String/UnicodeCasingTest.java
23:52:59 Testing on ff_GH_#Latn locale....
23:52:59 java.lang.NullPointerException: Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
23:52:59 at java.base/java.lang.Character.digit(Character.java:10935)
23:52:59 at java.base/java.lang.Character.digit(Character.java:10883)
23:52:59 at java.base/java.lang.Integer.parseInt(Integer.java:663)
23:52:59 at UnicodeCasingTest.convert(UnicodeCasingTest.java:210)
23:52:59 at UnicodeCasingTest.test(UnicodeCasingTest.java:150)
23:52:59 at UnicodeCasingTest.test(UnicodeCasingTest.java:96)
23:52:59 at UnicodeCasingTest.main(UnicodeCasingTest.java:64)
@jdmpapin I see some PRs related to guards in OMR that are merged recently. Do you think any one of them affects the optimization in this issue?
Sorry, not quite yet. I have a couple more guard-related PRs to do before the one that fixes this issue
OK, thanks. I am going to run Grinder jobs when a fix for this issue is available.
@jdmpapin : will the current direction of your guard changes still affect this code?
Moving out of 0.35 to 0.36 in any case.
Yes, eclipse/omr#6712 and #15923 are the last prerequisites for my planned change to this code. These are needed for reasons described in the former (and those reasons also apply to the current guard upgrade code)
https://openj9-jenkins.osuosl.org/job/Test_openjdk19_j9_sanity.openjdk_aarch64_linux_Nightly/17 java/lang/StringBuilder/Comparison.java java/lang/String/UnicodeCasingTest.java
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_OpenJDK17/23/ java/lang/String/UnicodeCasingTest.java.UnicodeCasingTest java/lang/ThreadLocal/ReplaceStaleEntry.java.ReplaceStaleEntry java/lang/ThreadLocal/ThreadLocalSupplierTest.java.ThreadLocalSupplierTest jdk/internal/jimage/JImageReadTest.java.JImageReadTest jdk/lambda/ArrayCtorRefTest.java.ArrayCtorRefTest jdk/lambda/MethodReferenceTestCallerSensitive.java.MethodReferenceTestCallerSensitive jdk/lambda/MethodReferenceTestInstanceMethod.java.MethodReferenceTestInstanceMethod jdk/lambda/TestInnerCtorRef.java.TestInnerCtorRef jdk/lambda/TestPrivateCtorRef.java.TestPrivateCtorRef jdk/modules/etc/DefaultModules.java.DefaultModules
It is interesting that many tests failed with Cannot invoke "java.lang.CharacterData.of(int)" because "codePoint" is null
, in addition to known String/UnicodeCasingTest.java
and StringBuilder/Comparison.java
.
eclipse/omr#6752
I ran the following Grinder jobs on AArch64 Linux with https://github.com/eclipse/omr/pull/6752, and got no failures.
Fix is merged
It missed the OMR acceptance builds today, the next one is ~1pm on Monday~ 5am on Friday.
Oops, thinking it was Friday already, the next acceptance is Fri 5am.
Hopefully the last build where we'll see this problem before OMR promotes with the fix.
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/295/
https://openj9-jenkins.osuosl.org/job/Test_openjdk17_j9_sanity.openjdk_aarch64_linux_Nightly/207 jdk_lang_1
-Xdump:system:none -Xdump:heap:none -Xdump:system:events=gpf+abort+traceassert+corruptcache -XX:-JITServerTechPreviewMessage -XX:-UseCompressedOops
java/lang/String/UnicodeCasingTest.javajava/lang/StringBuilder/Comparison.java