eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.
Other
3.28k stars 721 forks source link

mac TestIBMJlmRemoteMemoryAuth_SE80_0 Process LT1 has ended unexpectedly #7397

Closed pshipton closed 2 years ago

pshipton commented 5 years ago

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_sanity.system_x86-64_mac_Nightly/150 TestIBMJlmRemoteMemoryAuth_SE80_0 TestIBMJlmRemoteMemoryNoAuth_SE80_0

STF 23:59:29.669 - +------ Step 5 - Wait for the processes to complete
STF 23:59:29.669 - | Wait for processes to meet expectations
STF 23:59:29.669 - |   Processes: [LT1, CL1]
STF 23:59:29.669 - |
STF 23:59:29.669 - Monitoring processes: CL1 LT1
CL1 j> 2019/09/21 23:59:29.745 ServerURL=service:jmx:rmi:///jndi/rmi://localhost:1234/jmxrmi
CL1 j> 2019/09/21 23:59:34.849 Monitored VM not ready at Sep 21, 2019 11:59:34 PM (attempt 0).
CL1 j> 2019/09/21 23:59:34.850 Wait 5 secs and trying again...
CL1 j> 2019/09/21 23:59:35.988 Connection established!
CL1 j> 2019/09/21 23:59:36.610 Starting to write data
STF 00:04:29.087 - Heartbeat: Process CL1 is still running
STF 00:04:29.595 - **FAILED** Process LT1 has ended unexpectedly

1.LT1.stdout shows the following, and stderr is empty.

00:08:47.800 - Load test completed
00:08:47.801 -   Ran     : 300000
00:08:47.801 -   Passed  : 300000
00:08:47.801 -   Failed  : 0
00:08:47.801 -   Result  : PASSED

There are no useful diagnostics to explain what happened.

pshipton commented 5 years ago

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_sanity.system_x86-64_mac_Nightly/165

pshipton commented 5 years ago

@Mesbah-Alam

pshipton commented 5 years ago

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_sanity.system_x86-64_mac_Nightly/169 TestIBMJlmRemoteMemoryAuth_SE80_0

pshipton commented 5 years ago

Closed #7169 as a dup of this.

JasonFengJ9 commented 4 years ago

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_sanity.system_x86-64_windows_Nightly/184/tapResults/

STF 02:00:00.246 - **FAILED** Process CL1 has timed out
STF 02:00:00.246 - Collecting dumps for: CL1
STF 02:00:01.603 - Using procdump.exe to generate .DMP files

STF 02:01:33.795 - Monitoring Report Summary:
STF 02:01:33.795 -   o Process CL1 has timed out
STF 02:01:33.795 -   o Process LT1 is still running as expected
STF 02:01:33.796 - Killing processes: CL1 LT1
STF 02:01:34.670 - Using powershell to kill CL1

STF 02:01:39.483 - **FAILED** execute script failed. Expected return value=0 Actual=1
STF 02:01:39.483 - 

STF 02:01:39.655 - =====================   R E S U L T S   =====================
STF 02:01:39.655 - Stage results:
STF 02:01:39.655 -   setUp:     pass
STF 02:01:39.655 -   execute:  *fail*
STF 02:01:39.655 -   teardown:  pass
STF 02:01:39.655 - 
STF 02:01:39.655 - Overall result: **FAILED**

TestIBMJlmRemoteMemoryNoAuth_SE80_0_FAILED

@pshipton update: This one is different, the stdout shows the LT1 process just stopped before completion (likely killed), there is nothing in the stderr. There are 2.CL1.jvmdump.stdout and kill_powershell.stderr files that indicate a process doesn't exist. No process matching the specified PID can be found. Stop-Process : Cannot find a process with the process identifier 260.

I'm inclined to ignore this failure as some kind of machine problem, i.e. maybe something else was running and the test didn't get enough CPU and timed out. There is no indication of a crash, and no JVM diagnostic files to look at.

02:01:02.309 - Completed 14.0%. Number of tests started=42132 (+5628)
02:01:22.279 - Completed 16.2%. Number of tests started=48532 (+6400)
pshipton commented 4 years ago

https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_sanity.system_x86-64_mac_Nightly/194

@Mesbah-Alam did you ever get a chance to look at this problem?

JasonFengJ9 commented 4 years ago

https://ci.eclipse.org/openj9/job/Test_openjdk13_j9_sanity.system_x86-64_mac_Nightly/143/tapResults/

STF 04:20:56.211 - **FAILED** Process CL1 has timed out
STF 04:20:56.211 - Collecting dumps for: CL1
STF 04:20:56.211 - Sending SIG 3 to the java process to generate a javacore
STF 04:20:56.212 - Running command: kill -3 6680

Similar with https://github.com/eclipse/openj9/issues/7397#issuecomment-549149904

pshipton commented 4 years ago

@JasonFengJ9 the previous failure is https://github.com/eclipse/openj9/issues/7276#issuecomment-573330677 you need to look at the artifacts.

JasonFengJ9 commented 4 years ago

This kind of failure (Process LT1 has ended unexpectedly) occurred in various Java level/platform.

JDK8 (x86-64 Linux) - https://ci.eclipse.org/openj9/job/Test_openjdk8_j9_sanity.system_x86-64_linux_Nightly/452/tapResults/ JDK11 (AIX) - https://ci.eclipse.org/openj9/job/Test_openjdk11_j9_sanity.system_ppc64_aix_Nightly/447/tapResults/ JDK15 (PPCLE Linux) - https://ci.eclipse.org/openj9/job/Test_openjdk15_j9_sanity.system_ppc64le_linux_Nightly/14/tapResults/

lumpfish commented 4 years ago

This is fixed by https://github.com/eclipse/openj9-systemtest/pull/118

JasonFengJ9 commented 3 years ago

Another occurrence at an internal build Test_openjdk11_j9_sanity.system_s390x_linux_xl/5

STF 22:28:47.826 - +------ Step 5 - Wait for the processes to complete
STF 22:28:47.826 - | Wait for processes to meet expectations
STF 22:28:47.826 - |   Processes: [LT1, CL1]
STF 22:28:47.826 - |
STF 22:28:47.826 - Monitoring processes: CL1 LT1
CL1 j> 2021/01/20 22:28:48.015 ServerURL=service:jmx:rmi:///jndi/rmi://localhost:1234/jmxrmi
CL1 j> 2021/01/20 22:28:48.099 Attempting to connect
CL1 j> 2021/01/20 22:28:48.423 Monitored VM not ready at Jan 20, 2021, 10:28:48 PM (attempt 1, elapsed 301ms).
CL1 j> 2021/01/20 22:28:48.423 Waiting 5 secs and trying again...
CL1 j> 2021/01/20 22:28:53.423 Attempting to connect
CL1 j> 2021/01/20 22:28:56.762 Connection established!
CL1 j> 2021/01/20 22:28:57.492 Starting to write data
STF 22:31:27.322 - **FAILED** Process LT1 has ended unexpectedly
STF 22:31:27.326 - Monitoring Report Summary:
STF 22:31:27.330 -   o Process CL1 is still running
STF 22:31:27.330 -   o Process LT1 has ended unexpectedly
STF 22:31:27.399 - Killing processes: CL1 LT1
STF 22:31:27.401 -   o Process CL1 pid 5444 stop()
STF 22:31:28.409 -   o Process CL1 pid 5444 killed
CL1 ..............j> 2021/01/20 22:31:27.468 Exiting as JVM we are connected to has finished
CL1 stderr Exception in thread "main" java.lang.AssertionError
CL1 stderr  at org.junit.Assert.fail(Assert.java:86)
CL1 stderr  at org.junit.Assert.fail(Assert.java:95)
CL1 stderr  at net.openj9.test.jlm.remote.MemoryProfiler.getStatsViaProxy(MemoryProfiler.java:217)
CL1 stderr  at net.openj9.test.jlm.remote.MemoryProfiler.main(MemoryProfiler.java:119)
STF 22:31:28.411 -   o Process LT1 pid 5443 is not running
**FAILED** at step 5 (Wait for the processes to complete). Expected return value=0 Actual=1 at /home/jenkins/workspace/Test_openjdk11_j9_sanity.system_s390x_linux_xl/openjdk-tests/TKG/../TKG/test_output_16112051315889/TestIBMJlmRemoteMemoryAuth_0/20210120-222845-TestIBMJlmRemoteMemoryAuth/execute.pl line 178.
STF 22:31:28.953 - **FAILED** execute script failed. Expected return value=0 Actual=1
STF 22:31:28.969 - 
STF 22:31:28.978 - ====================   T E A R D O W N   ====================
STF 22:31:28.980 - Running teardown: perl /home/jenkins/workspace/Test_openjdk11_j9_sanity.system_s390x_linux_xl/openjdk-tests/TKG/../TKG/test_output_16112051315889/TestIBMJlmRemoteMemoryAuth_0/20210120-222845-TestIBMJlmRemoteMemoryAuth/tearDown.pl
STF 22:31:29.259 - TEARDOWN stage completed
STF 22:31:29.269 - 
STF 22:31:29.269 - =====================   R E S U L T S   =====================
STF 22:31:29.271 - Stage results:
STF 22:31:29.277 -   setUp:     pass
STF 22:31:29.277 -   execute:  *fail*
STF 22:31:29.278 -   teardown:  pass
STF 22:31:29.278 - 
STF 22:31:29.279 - Overall result: **FAILED**

TestIBMJlmRemoteMemoryAuth_0_FAILED
pshipton commented 3 years ago

3.LT1.stdout shows the following, no explanation why it stopped. No JVM diagnostic files.

22:30:29.098 - Completed 5.6%. Number of tests started=184628 (+39053)
22:30:49.102 - Completed 6.7%. Number of tests started=217262 (+32634)
22:31:09.057 - Completed 7.8%. Number of tests started=252425 (+35163)
lumpfish commented 3 years ago

Nothing in LT1.stderr either?

pshipton commented 3 years ago

Nothing in LT1.stderr either?

It's empty.

system_test_output (1).tar.gz

lumpfish commented 3 years ago

Raised https://github.com/AdoptOpenJDK/openjdk-systemtest/issues/402 to get more diagnostics

pshipton commented 2 years ago

Not seen for some time.