eclipse-openj9 / openj9

Eclipse OpenJ9: A Java Virtual Machine for OpenJDK that's optimized for small footprint, fast start-up, and high throughput. Builds on Eclipse OMR (https://github.com/eclipse/omr) and combines with the Extensions for OpenJDK for OpenJ9 repo.

openJcePlusTests_0_FAILED timed out at runCurveMixTest #19200

Open JasonFengJ9 opened 3 months ago

JasonFengJ9 commented 3 months ago

Failure link

From an internal build(sles15x86-rtp-rt2-1):

java version "17.0.11-beta" 2024-04-16
IBM Semeru Runtime Certified Edition 17.0.11+6-202403192324 (build 17.0.11-beta+6-202403192324)
Eclipse OpenJ9 VM 17.0.11+6-202403192324 (build master-7c9937b26, JRE 17 Linux amd64-64-Bit Compressed References 20240319_633 (JIT enabled, AOT enabled)
OpenJ9   - 7c9937b26
OMR      - 1bf2ef421
JCL      - db5ba535ed based on jdk-17.0.11+6)

Rerun in Grinder - Change TARGET to run only the failed test targets.

Optional info

Failure output (captured from console output)

[2024-03-20T00:14:14.132Z] variation: NoOptions
[2024-03-20T00:14:14.132Z] JVM_OPTIONS:  

[2024-03-20T00:14:14.132Z] TESTING:
[2024-03-20T00:14:15.211Z] Buildfile: /home/jenkins/workspace/Test_openjdk17_j9_extended.functional_x86-64_linux_testList_0/jvmtest/functional/OpenJcePlusTests/test.xml

[2024-03-20T00:20:16.559Z]      [test]     [junit] ************************** Starting runCurveMixTest ************************
[2024-03-20T00:20:17.040Z]      [test]     [junit] Alg = X25519
[2024-03-20T00:20:17.040Z]      [test]     [junit]  pbk =302a300506032b656e0321002d4a18369e41a277419cf71258fd68607fab747695c5ad63bcd45db81e15e737
[2024-03-20T00:20:17.040Z]      [test]     [junit]  pbk2 = 302a300506032b656e0321002d4a18369e41a277419cf71258fd68607fab747695c5ad63bcd45db81e15e737
[2024-03-20T00:20:17.040Z]      [test]     [junit] Alg = X448
[2024-03-20T00:20:17.040Z]      [test]     [junit]  pbk =3042300506032b656f0339005ecd39456de5d6af4a6447a433dc0d0958b88720def4fc064bd816d8bebb6b1eeb933639dffa30cab8c66f7d3e98d0ac7438191c92a2d692
[2024-03-20T00:20:17.040Z]      [test]     [junit]  pbk2 = 3042300506032b656f0339005ecd39456de5d6af4a6447a433dc0d0958b88720def4fc064bd816d8bebb6b1eeb933639dffa30cab8c66f7d3e98d0ac7438191c92a2d692
[2024-03-20T01:14:28.919Z] 
[2024-03-20T01:14:28.919Z] BUILD FAILED
[2024-03-20T01:14:28.919Z] /home/jenkins/workspace/Test_openjdk17_j9_extended.functional_x86-64_linux_testList_0/jvmtest/functional/OpenJcePlusTests/test.xml:33: Timeout: killed the sub-process
[2024-03-20T01:14:28.919Z] 
[2024-03-20T01:14:28.919Z] Total time: 60 minutes 1 second
[2024-03-20T01:14:28.919Z] -----------------------------------
[2024-03-20T01:14:28.919Z] openJcePlusTests_0_FAILED

50x internal grinder - https://github.com/eclipse-openj9/openj9/issues/19200#issuecomment-2013795553

pshipton commented 3 months ago

Succeeded 10/10 on cent8x86-rtp-rt8-1
Failed/timeout running runCurveMixTest on sles12x86-rtp-rt1-1 which killed the grinder
Failed 1/10 on ubu22x86-rt-1, dup of https://github.com/eclipse-openj9/openj9/issues/19205
Failed 10/10 on ubu22x86-svl-rt12-1, runCurveMixTest timeouts
Failed 10/10 on sles12x86-svl-rt7-1, 9 runCurveMixTest timeouts, one in OpenJCEPlusFIPS DH private exponent size

@jasonkatonica fyi

pshipton commented 3 months ago

The failures aren't in the 0.44 builds but I assume openJcePlus is the same. Added to the 0.44 milestone for now.

JasonFengJ9 commented 3 months ago

Also seen at JDK11 x86-64_linux(sles15x86-rtp-rt2-1)

llxia commented 3 months ago

The timeout has been increased to 2 hrs (see PR)

pshipton commented 3 months ago

and https://github.com/adoptium/aqa-tests/pull/5175

llxia commented 3 months ago

Just for the record, it seems that sles15x86-rtp-rt2-1.fyre.ibm.com is slow. The test passed and took ~74 mins on this machine. The test usually takes ~20mins on the other machines. For details, see link.

I will close this issue. Please reopen if the problem occurs again.

JasonFengJ9 commented 3 months ago

Seen at JDK21 x86-64_linux(ub-epyc7302p-1s16c-01)

[2024-04-02T01:07:48.596Z] variation: NoOptions
[2024-04-02T01:07:48.596Z] JVM_OPTIONS: 

[2024-04-02T01:12:27.487Z]      [test]     [junit] ************************** Starting runCurveMixTest ************************
[2024-04-02T01:12:27.487Z]      [test]     [junit] Alg = X25519
[2024-04-02T01:12:27.487Z]      [test]     [junit]  pbk =302a300506032b656e0321008e0f02846e21dae556c5aa9593d5eb27dfbc1a9def9917b57f44e5be02daeb52
[2024-04-02T01:12:27.487Z]      [test]     [junit]  pbk2 = 302a300506032b656e0321008e0f02846e21dae556c5aa9593d5eb27dfbc1a9def9917b57f44e5be02daeb52
[2024-04-02T01:12:27.487Z]      [test]     [junit] Alg = X448
[2024-04-02T01:12:27.487Z]      [test]     [junit]  pbk =3042300506032b656f0339000af9a6a36afebde07ee250c59b10cbc1093b3f3f01a44afa91e51465003f1d7cfe96d3b78cfe4b70614102ae4d1ba2eaab62d5dfd1dd3d86
[2024-04-02T01:12:27.487Z]      [test]     [junit]  pbk2 = 3042300506032b656f0339000af9a6a36afebde07ee250c59b10cbc1093b3f3f01a44afa91e51465003f1d7cfe96d3b78cfe4b70614102ae4d1ba2eaab62d5dfd1dd3d86
[2024-04-02T03:07:54.822Z] 
[2024-04-02T03:07:54.822Z] BUILD FAILED
[2024-04-02T03:07:54.822Z] /home/jenkins/workspace/Test_openjdk21_j9_extended.functional_x86-64_linux_testList_0/jvmtest/functional/OpenJcePlusTests/test.xml:33: Timeout: killed the sub-process
[2024-04-02T03:07:54.822Z] 
[2024-04-02T03:07:54.822Z] Total time: 120 minutes 0 seconds
[2024-04-02T03:07:54.822Z] -----------------------------------
[2024-04-02T03:07:54.822Z] openJcePlusTests_0_FAILED

pshipton commented 3 months ago

@llxia another re-occurrence after increasing the timeout.

llxia commented 3 months ago

@jasonkatonica do you have any suggestions? Or should I increase the timeout again?

jasonkatonica commented 3 months ago

We have not typically seen these tests take over 2 hours on the machines we have worked with. Additionally, a pattern is emerging: runCurveMixTest seems to hang on occasion, using up most of the 2 hour time allotment. Most likely there is a real problem here, occurring either when the test runs multi-threaded or simply at random times.

Given the failing logs, I don't believe that increasing the timeout would be helpful at this point; we will need to investigate further.
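
One way to confirm the hang pattern from a console log is to measure the longest period of silence between consecutive timestamped lines. The following is a minimal sketch, not part of the test framework: the function name `longest_silence` and the regular expression are hypothetical, and the sample lines are abbreviated from the first failure output in this issue.

```python
import re
from datetime import datetime, timedelta

# Matches the Jenkins-style prefix seen in the logs above, e.g. [2024-03-20T00:20:17.040Z]
# (assumed format; adjust if the console output uses a different timestamp layout).
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\]")


def longest_silence(lines):
    """Return (gap, last_line_before_gap) for the largest pause between timestamped lines."""
    best_gap, best_line = timedelta(0), None
    prev_time, prev_line = None, None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip blank or untimestamped lines
        current = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        if prev_time is not None and current - prev_time > best_gap:
            best_gap, best_line = current - prev_time, prev_line
        prev_time, prev_line = current, line
    return best_gap, best_line


# Abbreviated sample taken from the JDK17 failure output in this issue.
sample = [
    "[2024-03-20T00:20:16.559Z]      [test]     [junit] Starting runCurveMixTest",
    "[2024-03-20T00:20:17.040Z]      [test]     [junit] Alg = X448",
    "[2024-03-20T01:14:28.919Z] BUILD FAILED",
]

gap, last = longest_silence(sample)
print("Longest silence:", gap)
print("Last line before it:", last)
```

On the sample above, the longest gap is roughly 54 minutes and starts right after the X448 key output, which matches the observation that runCurveMixTest stops producing output until the timeout kills the sub-process.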