eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 720 forks source link

WAS crash compiling com/ibm/ws/event/internal/EventEngineImpl.publishEvent() #19758

Closed tajila closed 4 months ago

tajila commented 4 months ago
[WARNING ] CWWKV0008W: Dynamic Routing service sent 307 response to the client webserver1432188 because client expired.
Unhandled exception
Type=Illegal instruction vmState=0x00000000
J9Generic_Signal_Number=00000048 Signal_Number=00000004 Error_Value=00000000 Signal_Code=00000001
Handler1=000003FFBB849410 Handler2=000003FFBB7319D8
gpr0=0000000000000016 gpr1=0000000000000000 gpr2=000000000000000E gpr3=0000000000000004
gpr4=000003FF902A28E0 gpr5=000003FF740ABD80 gpr6=DF2BC1C200000004 gpr7=000003FF90854098
gpr8=00000000496515F8 gpr9=0000000000000000 gpr10=0000000000BBA200 gpr11=0000000049651708
gpr12=000000000D43DDF0 gpr13=00000000024CB400 gpr14=00000000024CBFE0 gpr15=000003FFB807D6A0
psw=000003FF902A2916 mask=0705100180000000 fpc=00080000 bea=000003FFBBA0FB20
fpr0 000003ff740006a0 (f: 1946158720.000000, d: 2.171763e-311)
fpr1 0000000000000007 (f: 7.000000, d: 3.458460e-323)
fpr2 000003ff740cdaa0 (f: 1946999424.000000, d: 2.171764e-311)
fpr3 000003ff740ad108 (f: 1946865920.000000, d: 2.171764e-311)
fpr4 000003ff740acc22 (f: 1946864640.000000, d: 2.171764e-311)
fpr5 f8e2a1ed1bed74e2 (f: 468546784.000000, d: -2.015941e+274)
fpr6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr7 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr8 000003ff740abc58 (f: 1946860672.000000, d: 2.171764e-311)
fpr9 0000000000041000 (f: 266240.000000, d: 1.315400e-318)
fpr10 000003ff8407c7c0 (f: 2215102464.000000, d: 2.171896e-311)
fpr11 000002aa299e03f0 (f: 698221568.000000, d: 1.447546e-311)
fpr12 000003ffb807f900 (f: 3087530240.000000, d: 2.172327e-311)
fpr13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr14 000003ffb803f000 (f: 3087265792.000000, d: 2.172327e-311)
fpr15 000002aa299d04d0 (f: 698156224.000000, d: 1.447546e-311)

Compiled_method=com/ibm/ws/event/internal/EventEngineImpl.publishEvent(Lcom/ibm/websphere/event/Event;Z)Lcom/ibm/ws/event/internal/EventImpl;
Target=2_90_20240116_630 (Linux 5.14.0-427.16.1.el9_4.s390x)
CPU=s390x (2 logical CPUs) (0x1e51fd000 RAM)
----------- Stack Backtrace -----------
 (0x000003FF902A2916 [<unknown>+0x0])
---------------------------------------
JVMDUMP039I Processing dump event "gpf", detail "" at 2024/05/30 09:27:17 - please wait.
JVMDUMP032I JVM requested System dump using '/home/nest/usr/servers/cc/core.20240530.092717.432498.0001.dmp' in response to an event
JVMPORT030W /proc/sys/kernel/core_pattern setting "|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" specifies that the core dump is to be piped to an external program.  Attempting to rename either core or core.583424.  Review the manual for the external program to find where the core dump is written and ensure the program does not truncate it.

JVMPORT049I The core file created by child process with pid = 583424 was not found. Review the documentation for the /proc/sys/kernel/core_pattern program "|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h" to find where the core file is written and ensure that program does not truncate it.
tajila commented 4 months ago

@hzongaro ^^

pshipton commented 4 months ago

@r30shah fyi

r30shah commented 4 months ago

@TobiAjila Do we have core-dump saved up from this failure ? Also if this is easily reproducible, what did you run ?

tajila commented 4 months ago

Working on getting diagnostics

tajila commented 4 months ago

Diagnostics_19758.zip

JIT dump, unfortunately no system core yet

stevenschader commented 4 months ago

I"m uploading the coredumps from the s390x host -> https://ibm.box.com/s/pwkc72b0uyx1o6j2nfed5q12y1un47an

r30shah commented 4 months ago

~Hi @stevenschader Do you by any chance have console log like the one Tobi have shared in the description ? ~

NVM, I was able to get that from core-dump,

> !gpinfo
Failing Thread: !j9vmthread 0x10e2b00
Failing Thread ID: 0x218f (8591)
gpInfo:
J9Generic_Signal_Number=00000048 Signal_Number=00000004 Error_Value=00000000 Signal_Code=00000001
Handler1=000003FF9DD495D8 Handler2=000003FF9E0B1AB8
gpr0=0000000000000019 gpr1=0000000000000000 gpr2=000000000000000E gpr3=000003FF9DFC0DB8
gpr4=000003FF7017762A gpr5=000003FF301598E8 gpr6=DF4CE2D3242EA40C gpr7=000003FF86D93DD0
gpr8=000003FF6C6FDF50 gpr9=0000000000000000 gpr10=000003FF9D5F0110 gpr11=000003FF6C6FE560
gpr12=000003FF6C6FDA20 gpr13=00000000010E2B00 gpr14=00000000010E36E0 gpr15=000003FF6C6FD6A0
psw=000003FF70177638 mask=0705100180000000 fpc=00080000 bea=000003FF9DF1A934
fpr0 000003ff300002d0 (f: 805307072.000000, d: 2.171200e-311)
fpr1 0000000000000007 (f: 7.000000, d: 3.458460e-323)
fpr2 000003ff9801e7b0 (f: 2550261760.000000, d: 2.172062e-311)
fpr3 000003ff300035c8 (f: 805320128.000000, d: 2.171200e-311)
fpr4 000003ff30002fce (f: 805318592.000000, d: 2.171200e-311)
fpr5 5f37a15765b223e9 (f: 1706173440.000000, d: 4.834438e+150)
fpr6 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr7 fc1f0007c926bad7 (f: 3374758656.000000, d: -7.552647e+289)
fpr8 000003ff30159818 (f: 806721536.000000, d: 2.171200e-311)
fpr9 0000000000041000 (f: 266240.000000, d: 1.315400e-318)
fpr10 000003ff5c0a6248 (f: 1544184448.000000, d: 2.171565e-311)
fpr11 000002aa40e9c070 (f: 1089060992.000000, d: 1.447739e-311)
fpr12 000003ff6c6ff900 (f: 1819277568.000000, d: 2.171701e-311)
fpr13 0000000000000000 (f: 0.000000, d: 0.000000e+00)
fpr14 000003ff6c6bf000 (f: 1819013120.000000, d: 2.171700e-311)
fpr15 000002aa40ebb570 (f: 1089189248.000000, d: 1.447739e-311)

Compiled_method=com/ibm/ws/threading/internal/SchedulingHelper.run()V
r30shah commented 4 months ago

Thanks a lot @stevenschader , I checked out the core-dump uploaded on the box folder and based on the above failing gpInfo, Illegal instruction it is complaining about is 0xb2ec0011 . That instruction opcode is for ETND instruction and last month, @Spencer-Comin Fixed this bug in https://github.com/eclipse-openj9/openj9/pull/19495.

stevenschader commented 4 months ago

@r30shah do you know if that fix is in the level of JVM i'm running:

[nest@wslnx055 ~]$ wlp/java/jdk-17.0.11+9/bin/java -version
openjdk version "17.0.11" 2024-04-16
IBM Semeru Runtime Open Edition 17.0.11.0 (build 17.0.11+9)
Eclipse OpenJ9 VM 17.0.11.0 (build openj9-0.44.0, JRE 17 Linux s390x-64-Bit Compressed References 20240416_715 (JIT enabled, AOT enabled)
OpenJ9   - b0699311c7
OMR      - 254af5a04
JCL      - 5d7d758b682 based on jdk-17.0.11+9)
r30shah commented 4 months ago

Hi, @stevenschader I should have mentioned that earlier, the fix will be available in 0.46 release of OpenJ9 , while you are currently on 0.44. So it does not have the fix.

pshipton commented 4 months ago

Here is a preliminary 0.46 build you can try. https://ibm.box.com/s/z2fgt1kcm0bxzq04hrvz0qov0q1jyhi5

stevenschader commented 4 months ago

@pshipton testing the java driver on the target s30x host worked as expected.... Thanks.

Instana was able to get it's hooks in and the liberty jvm survived...

instana logs:

2024-06-27T12:31:17.618-04:00 | INFO  | 4c3-1f4b-4a3c-ae2d-6c6822000dd3) | Java             | com.instana.sensor-java - 1.2.129 | Activated Sensor for JVM 148948
2024-06-27T12:31:17.620-04:00 | INFO  | dbc-5c1b-4875-adcd-32500f5a822f) | Process          | com.instana.sensor-process - 1.1.48 | Activated Sensor for PID 148948
2024-06-27T12:31:17.620-04:00 | INFO  | instana-executor-thread-2-7      | Java             | com.instana.sensor-java - 1.2.129 | Attaching to JVM with PID 148948
2024-06-27T12:31:17.621-04:00 | INFO  | instana-executor-thread-2-7      | aagentLoaderImpl | com.instana.agent - 1.1.725 | Performing initial attach to JVM with PID 148948
2024-06-27T12:31:17.621-04:00 | INFO  | a7e-cf02-493f-8ea6-4ff0a56c81fb) | JavaTrace        | com.instana.sensor-java-trace - 1.2.470 | Activated Sensor for JVM 148948
2024-06-27T12:31:17.622-04:00 | INFO  | instana-executor-thread-2-18     | JavaTrace        | com.instana.sensor-java-trace - 1.2.470 | Attaching to JVM with PID 148948
2024-06-27T12:31:18.803-04:00 | INFO  | nstana-socket-client-thread--6-4 | tClientIdDecoder | com.instana.agent - 1.1.725 | 148948:com.instana.agent.main.javaagentloader connection established
2024-06-27T12:31:18.894-04:00 | INFO  | instana-executor-thread-2-7      | aagentLoaderImpl | com.instana.agent - 1.1.725 | Initial attach to JVM with PID 148948 successful
2024-06-27T12:31:18.929-04:00 | INFO  | nstana-socket-client-thread--6-1 | tClientIdDecoder | com.instana.agent - 1.1.725 | 148948:com.instana.plugin.java:commands connection established
2024-06-27T12:31:18.930-04:00 | INFO  | nstana-socket-client-thread--6-2 | tClientIdDecoder | com.instana.agent - 1.1.725 | 148948:com.instana.plugin.java:data connection established
2024-06-27T12:31:18.947-04:00 | INFO  | nstana-socket-client-thread--6-3 | tClientIdDecoder | com.instana.agent - 1.1.725 | 148948:com.instana.plugin.javatrace:commands connection established
2024-06-27T12:31:18.947-04:00 | INFO  | instana-executor-thread-2-18     | JavaTrace        | com.instana.sensor-java-trace - 1.2.470 | The JVM 148948 is running another agent via the '-javaagent:/home/nest/wlp/bin/tools/ws-javaagent.jar' startup parameter.
2024-06-27T12:31:18.948-04:00 | INFO  | nstana-socket-client-thread--6-4 | tClientIdDecoder | com.instana.agent - 1.1.725 | 148948:com.instana.plugin.javatrace:data connection established
2024-06-27T12:31:19.914-04:00 | INFO  | nstana-socket-client-thread--6-2 | LoggerEndpoint   | com.instana.agent - 1.1.725 | JVM (148948) - Spent 947 ms for instrumentation [prometheus]
2024-06-27T12:31:26.476-04:00 | INFO  | 094-333d-40f7-8e0e-891cccdbace6) | WebSphereLiberty | com.instana.sensor-websphere-liberty - 1.1.6 | Activated Sensor for JVM 148948
2024-06-27T12:31:26.476-04:00 | INFO  | instana-executor-thread-2-14     | WebSphereLiberty | com.instana.sensor-websphere-liberty - 1.1.6 | Attaching to JVM with PID 148948
2024-06-27T12:31:26.561-04:00 | INFO  | nstana-socket-client-thread--6-3 | tClientIdDecoder | com.instana.agent - 1.1.725 | 148948:com.instana.plugin.websphereliberty:commands connection established
2024-06-27T12:31:26.565-04:00 | INFO  | nstana-socket-client-thread--6-4 | tClientIdDecoder | com.instana.agent - 1.1.725 | 148948:com.instana.plugin.websphereliberty:data connection established
2024-06-27T12:31:32.800-04:00 | INFO  | nstana-socket-client-thread--6-1 | LoggerEndpoint   | com.instana.agent - 1.1.725 | JVM (148948) - Tracing is enabled for the following JDBC drivers: DB2
2024-06-27T12:31:43.334-04:00 | INFO  | nstana-socket-client-thread--6-2 | LoggerEndpoint   | com.instana.agent - 1.1.725 | JVM (148948) - Spent 23328 ms and 0 KiB for instrumentation [trace] (Used: 0 KiB, Max: unlimited) (JVM uptime: 37088 ms)
2024-06-27T12:31:43.773-04:00 | INFO  | nstana-socket-client-thread--6-2 | LoggerEndpoint   | com.instana.agent - 1.1.725 | JVM (148948) - Spent 439 ms for instrumentation [micrometer]
pshipton commented 4 months ago

Since this is fixed and confirmed on 0.46 I will go ahead and close it.