adoptium / infrastructure

This repo contains all information about machine maintenance.
Apache License 2.0

Hotspot test serviceability/sa/ClhsdbCDSCore.java hangs on adoptium infra #3745

Open zzambers opened 2 months ago

zzambers commented 2 months ago

This test hangs on Adoptium infrastructure and is killed on timeout (this seems reliable): serviceability/sa/ClhsdbCDSCore.java

I see this both in a dev.openjdk run and when running it in a Grinder.

Output:

Starting ClhsdbCDSCore test
Command line: [/home/jenkins/workspace/Grinder/jdkbinary/j2sdk-image/bin/java -cp /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_17265736059645/hotspot_custom_0/work/classes/0/serviceability/sa/ClhsdbCDSCore.d:/home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk/test/hotspot/jtreg/serviceability/sa:/home/jenkins/workspace/Grinder/aqa-tests/TKG/output_17265736059645/hotspot_custom_0/work/classes/0/test/lib:/home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk/test/lib:/home/jenkins/workspace/Grinder/jvmtest/openjdk/jtreg/lib/javatest.jar:/home/jenkins/workspace/Grinder/jvmtest/openjdk/jtreg/lib/jtreg.jar -ea -esa -Xmx512m -XX:+UseCompressedOops -Xshare:dump -Xlog:cds,cds+hashtables -XX:SharedArchiveFile=./ArchiveForClhsdbCDSCore.jsa ]
[2024-09-17T11:46:52.145720Z] Gathering output for process 25719
[ELAPSED: 447 ms]
[logging stdout to serviceability.sa.ClhsdbCDSCore.java-0000-dump.stdout]
[logging stderr to serviceability.sa.ClhsdbCDSCore.java-0000-dump.stderr]
[STDERR]

[2024-09-17T11:46:52.603422Z] Waiting for completion for process 25719
[2024-09-17T11:46:52.603687Z] Waiting for completion finished for process 25719
Command line: [/home/jenkins/workspace/Grinder/jdkbinary/j2sdk-image/bin/java -cp /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_17265736059645/hotspot_custom_0/work/classes/0/serviceability/sa/ClhsdbCDSCore.d:/home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk/test/hotspot/jtreg/serviceability/sa:/home/jenkins/workspace/Grinder/aqa-tests/TKG/output_17265736059645/hotspot_custom_0/work/classes/0/test/lib:/home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk/test/lib:/home/jenkins/workspace/Grinder/jvmtest/openjdk/jtreg/lib/javatest.jar:/home/jenkins/workspace/Grinder/jvmtest/openjdk/jtreg/lib/jtreg.jar -ea -esa -Xmx512m -XX:+UseCompressedOops -Xmx512m -XX:+UnlockDiagnosticVMOptions -XX:SharedArchiveFile=ArchiveForClhsdbCDSCore.jsa -XX:+CreateCoredumpOnCrash -Xshare:auto -XX:+ProfileInterpreter --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED -XX:-AlwaysPreTouch CrashApp ]
[2024-09-17T11:46:52.610596Z] Gathering output for process 25735
[2024-09-17T11:46:52.611510Z] Waiting for completion for process 25735
[2024-09-17T11:46:52.628039Z] Waiting for completion finished for process 25735
Run test with ulimit -c: unlimited
[2024-09-17T11:46:52.630845Z] Gathering output for process 25738
Timeout signalled after 19200 seconds

Notes: I have tried to reproduce this locally and on our infrastructure, both by invoking jtreg manually and through aqa-tests, but failed to reproduce it. Maybe it is an infrastructure/environment issue? The test first intentionally crashes the VM using the Unsafe class to produce a core file. However, this hangs when run on Adoptium infrastructure. Maybe it is something to do with the core dump settings? I don't know.
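One Linux core dump setting that can differ between distributions is /proc/sys/kernel/core_pattern: if it starts with '|', the kernel pipes the core to a user-space helper program instead of writing a file (Ubuntu installs, for example, commonly pipe cores to apport). Nothing in this thread confirms that this is the cause. The following is only a hypothetical diagnostic sketch, with illustrative class and method names that are not part of the jtreg test, which could be run on a passing (RHEL) and a hanging (Ubuntu) agent to compare their settings.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical diagnostic (not part of ClhsdbCDSCore.java): prints where the
 * Linux kernel sends core dumps, so two agents can be compared.
 */
public class CorePatternCheck {

    // Standard Linux location of the kernel core dump pattern.
    static final Path CORE_PATTERN = Path.of("/proc/sys/kernel/core_pattern");

    /** A pattern starting with '|' pipes the core to a user-space helper. */
    static boolean isPiped(String pattern) {
        return pattern.strip().startsWith("|");
    }

    /** Returns the helper program of a piped pattern, or null for a file pattern. */
    static String handlerOf(String pattern) {
        String p = pattern.strip();
        if (!p.startsWith("|")) {
            return null;
        }
        String rest = p.substring(1).strip();
        int space = rest.indexOf(' ');
        return space < 0 ? rest : rest.substring(0, space);
    }

    public static void main(String[] args) throws IOException {
        if (!Files.isReadable(CORE_PATTERN)) {
            System.out.println("No " + CORE_PATTERN + " (not Linux?)");
            return;
        }
        String pattern = Files.readString(CORE_PATTERN).strip();
        System.out.println("core_pattern: " + pattern);
        if (isPiped(pattern)) {
            System.out.println("Cores are piped to: " + handlerOf(pattern));
        } else {
            System.out.println("Cores are written to files named by the pattern");
        }
    }
}
```

On JDK 11 or later this can be run directly from source with `java CorePatternCheck.java` on each agent.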

zzambers commented 2 months ago

This could be related to JDK-8283410, but on Adoptium infrastructure it seems to affect Linux (not Windows?).

sophia-guo commented 2 months ago

@zzambers I ran ClhsdbCDSCore.java on a different agent and it passed: https://ci.adoptium.net/view/Test_grinder/job/Grinder/10970/ (the failed one is due to no test being selected). So it might be related to the infrastructure, since you can't reproduce it in your environment. Could you please move this issue to the infrastructure repo? Or I can move it if you agree.

zzambers commented 2 months ago

@sophia-guo By moving, do you mean filing the same issue there and closing this one?

sophia-guo commented 2 months ago

There is a "Transfer issue" link on the right side of the issue.

[Screenshot, 2024-09-25 at 9:37:10 AM]

I'm not sure whether it's clickable for you, as it may depend on permissions. I'll just do it.

sxa commented 1 month ago

@sophia-guo Can you get a list of the machines/distributions it passes and fails on? Yours was run on RHEL. Both of zzambers' runs were on an old, out-of-support Ubuntu distribution (although neither was in a container). At the moment I'm not sure we have enough information to take action on this one in the infrastructure repo, since it's not clear what is needed to resolve it.

sxa commented 3 days ago

There are recent dev.hotspot runs that look clean. Was this test removed, and is it still considered a problem?

sxa commented 3 days ago

I tried kicking off some Grinders for testing (based on JDK 11, since that's what the dev.openjdk link in the description pointed at), but got "15:21:35 Error: Cannot find file: /home/jenkins/workspace/Grinder/aqa-tests/TKG/../openjdk/openjdk-jdk/test/jdk/serviceability/sa/ClhsdbCDSCore.java", which suggests that this test may no longer be valid:
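The path in that error is under test/jdk, whereas the command lines in the original report load the test from test/hotspot/jtreg/serviceability/sa. Before concluding that the test was removed, it may be worth checking both test roots in the checkout being used. The helper below is a hypothetical sketch (the class name and the list of roots are assumptions for illustration, not part of aqa-tests or TKG) that reports which root, if any, contains the file.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: locate a jtreg test file under common OpenJDK test roots. */
public class FindJtregTest {

    // Assumed test roots: the original report's classpath uses test/hotspot/jtreg,
    // while the Grinder error above looked under test/jdk.
    static final List<String> TEST_ROOTS = List.of("test/hotspot/jtreg", "test/jdk");

    /** Returns the first existing location of relTest under the checkout, if any. */
    static Optional<Path> find(Path checkout, String relTest) {
        for (String root : TEST_ROOTS) {
            Path candidate = checkout.resolve(root).resolve(relTest);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Path checkout = Path.of(args.length > 0 ? args[0] : ".");
        String test = "serviceability/sa/ClhsdbCDSCore.java";
        System.out.println(find(checkout, test)
                .map(p -> "Found: " + p)
                .orElse("Not found under " + TEST_ROOTS));
    }
}
```

For example, `java FindJtregTest.java /home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk` would check the checkout used in the original report; whether a JDK 11 checkout contains the test is not established in this thread.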