Closed mateuszrzeszutek closed 2 months ago
@0xdaryl fyi
java -Xjit:vmState=0x00053cff
vmState [0x53cff]: {J9VMSTATE_JIT} {coldBlockMarker}
@mateuszrzeszutek, is the test system able to capture a core file? Failing that, is there any way I could reproduce the problem in a stand-alone way? And if not, I'm wondering if you might be able to capture the bytecode for java/util/Collections.unmodifiableCollection, which I'm guessing is being instrumented in the test environment.
I'm wondering if you might be able to capture the bytecode for java/util/Collections.unmodifiableCollection, which I'm guessing is being instrumented in the test environment
Ignore that - I guess it's not being instrumented
@mateuszrzeszutek, if capturing a core file is not easy to do, are you easily able to rerun the affected test with the following additional JIT compiler options? That might help shed more light on the problem.
-Xjit:{java/util/Collections.unmodifiableCollection*}(traceBC,traceILGen,log=issue15730.log),verbose,vlog=issue15730.vlog
We won't get to the bottom of this for 0.35. Moving to 0.36.
Hello @mateuszrzeszutek. I was wondering whether you have had any success in attempting to capture any trace logs or core files. They would be a great help in trying to resolve this problem.
Hey, sorry for being quiet. I was on vacation/sick leave for some time and this flew completely under my radar. I'll try to gather the files you requested tomorrow.
Hi @mateuszrzeszutek. Have you had any luck producing core files or log files?
Hi, @mateuszrzeszutek. I just wanted to follow up to see whether you were able to produce any core files or log files for this problem. It will be difficult to figure out what might be going wrong without them.
Moving to 0.38 release.
Hi, @mateuszrzeszutek. I wanted to follow up again to see whether you're still seeing this problem and whether you might be able to gather any core or log files to help us investigate the cause of the crash.
Since I haven't heard from the originator of this problem recently, I spent a little bit of time digging into the jitdump file again, jitdump.20220816.133744.23602.0004.dmp.txt
In the version of IBM Semeru that's involved, the bytecode for java/util/Collections.unmodifiableCollection looks like this:
public static <T> java.util.Collection<T> unmodifiableCollection(java.util.Collection<? extends T>);
Code:
0: aload_0
1: invokeinterface #157, 1 // InterfaceMethod java/util/Collection.getClass:()Ljava/lang/Class;
6: ldc #161 // class java/util/Collections$UnmodifiableCollection
8: if_acmpne 13
11: aload_0
12: areturn
13: new #161 // class java/util/Collections$UnmodifiableCollection
16: dup
17: aload_0
18: invokespecial #163 // Method java/util/Collections$UnmodifiableCollection."<init>":(Ljava/util/Collection;)V
21: areturn
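For readers who find source easier to follow than bytecode, here is a rough sketch of what the listing above does, written as stand-alone Java. The class names UnmodifiableSketch and Wrapper are hypothetical stand-ins (Wrapper plays the role of java/util/Collections$UnmodifiableCollection); this is an illustration of the control flow, not the JDK's actual source.

```java
import java.util.AbstractCollection;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

public class UnmodifiableSketch {
    // Hypothetical stand-in for java.util.Collections$UnmodifiableCollection.
    // AbstractCollection.add and Iterator.remove throw UnsupportedOperationException
    // by default, so this wrapper is read-only.
    static final class Wrapper<E> extends AbstractCollection<E> {
        private final Collection<? extends E> c;

        Wrapper(Collection<? extends E> c) { this.c = c; }

        @Override public int size() { return c.size(); }

        @Override public Iterator<E> iterator() {
            Iterator<? extends E> it = c.iterator();
            return new Iterator<E>() {
                @Override public boolean hasNext() { return it.hasNext(); }
                @Override public E next() { return it.next(); }
            };
        }
    }

    @SuppressWarnings("unchecked")
    static <T> Collection<T> unmodifiableCollection(Collection<? extends T> c) {
        // Offsets 0-8 in the listing: load c, call getClass(), compare with the wrapper class.
        if (c.getClass() == Wrapper.class) {
            return (Collection<T>) c;   // offsets 11-12: already wrapped, return as-is
        }
        return new Wrapper<>(c);        // offsets 13-21: allocate and return a new wrapper
    }

    public static void main(String[] args) {
        List<String> base = new ArrayList<>(List.of("a", "b"));
        Collection<String> once = unmodifiableCollection(base);
        Collection<String> twice = unmodifiableCollection(once);
        System.out.println(once == twice); // the second call should hand back the same wrapper
    }
}
```

The point of interest for this issue is the first branch: the getClass() call is made on a receiver whose static type is the interface java.util.Collection, which is what the invokeinterface at offset 1 reflects.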
The failure seems to occur in an AOT compilation. In the IL that's shown in the jitdump, we see this.
n3n BBStart <block_2> [0x7fb9399bc4b0] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n11n NULLCHK on n5n [#32] [0x7fb9399bc730] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n10n PassThrough [0x7fb9399bc6e0] bci=[-1,3,-] rc=1 vc=0 vn=- li=- udi=- nc=1
n5n aload c<parm 0 Ljava/util/Collection;>[#356 Parm] [flags 0x40000107 0x0 ] [0x7fb9399bc550] bci=[-1,0,-] rc=4 vc=0 vn=- li=- udi=- nc=0
n18n ZEROCHK [#54] [0x7fb9399bc960] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n16n iload <temp slot 4>[#364 Auto] [flags 0x3 0x0 ] [0x7fb9399bc8c0] bci=[-1,3,-] rc=2 vc=0 vn=- li=- udi=- nc=0
n63n istore <temp slot 3>[#363 Auto] [flags 0x3 0x0 ] [0x7fb939a38370] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n16n ==>iload
n7n treetop [0x7fb9399bc5f0] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n6n aloadi <vft-symbol>[#285 Shadow] [flags 0x18607 0x0 ] [0x7fb9399bc5a0] bci=[-1,3,-] rc=2 vc=0 vn=- li=- udi=- nc=1
n5n ==>aload
n15n treetop [0x7fb9399bc870] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n9n aloadi <javaLangClassFromClass>[#275 Shadow +48] [flags 0x607 0x0 ] [0x7fb9399bc690] bci=[-1,3,-] rc=2 vc=0 vn=- li=- udi=- nc=1
n6n ==>aloadi
n62n astore <temp slot 2>[#362 Auto] [flags 0x7 0x0 ] [0x7fb939a38320] bci=[-1,0,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n9n ==>aloadi
n59n treetop [0x7fb939a38230] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
n5n ==>aload
n71n ifacmpeq --> block_10 BBStart at n68n () [0x7fb939a385f0] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=2 flg=0x20
n5n ==>aload
n70n aconst NULL [0x7fb939a385a0] bci=[-1,3,-] rc=1 vc=0 vn=- li=- udi=- nc=0
n61n BBEnd </block_2> ===== [0x7fb939a382d0] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n60n BBStart <block_8> [0x7fb939a38280] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=0
n78n ResolveCHK [#290] [0x7fb939a38820] bci=[-1,3,-] rc=0 vc=0 vn=- li=- udi=- nc=1
I believe the ZEROCHK at n18n is supposed to test the result of instanceof, and throw an IncompatibleClassChangeError if the object is not an instance of java.util.Collection (though getClass is actually a final method on java.lang.Object). However, it looks like the ZEROCHK is operating on the value of a variable -- #364 -- that is never set.
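As a quick sanity check on the parenthetical above, the following sketch uses standard reflection to confirm that getClass is declared on java.lang.Object and is final. The class name GetClassCheck and the helper isFinalObjectMethod are hypothetical names chosen for this illustration; it says nothing about how the JIT itself makes this determination.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class GetClassCheck {
    // Returns true if the named no-argument public method is declared
    // directly on java.lang.Object and carries the final modifier.
    static boolean isFinalObjectMethod(String name) {
        try {
            Method m = Object.class.getMethod(name);
            return m.getDeclaringClass() == Object.class
                    && Modifier.isFinal(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method on Object: " + name, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("getClass final on Object: " + isFinalObjectMethod("getClass"));
        System.out.println("toString final on Object: " + isFinalObjectMethod("toString"));
    }
}
```

Because getClass cannot be overridden, a call to it through an interface-typed receiver can only ever resolve to the Object implementation, which is consistent with the "final in object" wording in the AOT failure message quoted later in this comment.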
Putting that aside for now, IL generation is generating inline code for the call to getClass, but a ResolveCHK still appears, presumably with a NULL child in place of the missing call to getClass.
I spent some time trying to reproduce the failure, but I'm hitting this AOT error:
storeValidationRecordIfNecessary:
constantPool 00000000000A4C50 cpIndex 0
reloKind 50 isStatic 0
method 000000000009D070 from class 000000000009D800 java/util/Collections
definingClass 000000000003D800
definingClass name java/lang/Object
Created new AOT class info 00007FEF285B7330
Compilation Failed Because: Method symbol reference is final in object
In the original jitdump, this AOT message appears for the recompilation:
AOT support of annotations temporarily disabled
ImproperInterfaceMethodFromCPRecord
_method=0x000000000003BD48
_beholder=0x00000000000A3F00
className=java/util/Collections
_cpIndex=60
kind=97
id=29
I will spend some more time trying to piece together how we might come to generate that IL.
Moving to 0.40.
I'm going to move this to the Backlog. I strongly suspect the problem still exists, but we haven't been able to reproduce it, and the originator hasn't reported any further occurrences of the problem, which hampers investigation.
Fixed by #19604 I think
As @eminence mentioned, this was likely fixed by pull request #19604. Closing.
Java -version output
Summary of problem
The OpenTelemetry Java Instrumentation builds have recently started failing intermittently during JIT compilation in the Java 18 OpenJ9 build.
Diagnostic files
You can find the diagnostic files attached to the GHA build: https://github.com/open-telemetry/opentelemetry-java-instrumentation/actions/runs/2867509938 in the javacore-test-18 artifact.
In case you need more examples, this has happened several times in our daily build job: https://github.com/open-telemetry/opentelemetry-java-instrumentation/actions/runs/2865361522, https://github.com/open-telemetry/opentelemetry-java-instrumentation/actions/runs/2858576759