adoptium / infrastructure


Hotspot test serviceability/sa/ClhsdbCDSCore.java hangs on adoptium infra #3745

Open zzambers opened 1 week ago

zzambers commented 1 week ago

This test hangs on Adoptium infra and is killed on timeout (it seems to reproduce reliably): serviceability/sa/ClhsdbCDSCore.java

I can see this both in the dev.openjdk run and when run in Grinder.

Output:

Starting ClhsdbCDSCore test
Command line: [/home/jenkins/workspace/Grinder/jdkbinary/j2sdk-image/bin/java -cp /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_17265736059645/hotspot_custom_0/work/classes/0/serviceability/sa/ClhsdbCDSCore.d:/home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk/test/hotspot/jtreg/serviceability/sa:/home/jenkins/workspace/Grinder/aqa-tests/TKG/output_17265736059645/hotspot_custom_0/work/classes/0/test/lib:/home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk/test/lib:/home/jenkins/workspace/Grinder/jvmtest/openjdk/jtreg/lib/javatest.jar:/home/jenkins/workspace/Grinder/jvmtest/openjdk/jtreg/lib/jtreg.jar -ea -esa -Xmx512m -XX:+UseCompressedOops -Xshare:dump -Xlog:cds,cds+hashtables -XX:SharedArchiveFile=./ArchiveForClhsdbCDSCore.jsa ]
[2024-09-17T11:46:52.145720Z] Gathering output for process 25719
[ELAPSED: 447 ms]
[logging stdout to serviceability.sa.ClhsdbCDSCore.java-0000-dump.stdout]
[logging stderr to serviceability.sa.ClhsdbCDSCore.java-0000-dump.stderr]
[STDERR]

[2024-09-17T11:46:52.603422Z] Waiting for completion for process 25719
[2024-09-17T11:46:52.603687Z] Waiting for completion finished for process 25719
Command line: [/home/jenkins/workspace/Grinder/jdkbinary/j2sdk-image/bin/java -cp /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_17265736059645/hotspot_custom_0/work/classes/0/serviceability/sa/ClhsdbCDSCore.d:/home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk/test/hotspot/jtreg/serviceability/sa:/home/jenkins/workspace/Grinder/aqa-tests/TKG/output_17265736059645/hotspot_custom_0/work/classes/0/test/lib:/home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk/test/lib:/home/jenkins/workspace/Grinder/jvmtest/openjdk/jtreg/lib/javatest.jar:/home/jenkins/workspace/Grinder/jvmtest/openjdk/jtreg/lib/jtreg.jar -ea -esa -Xmx512m -XX:+UseCompressedOops -Xmx512m -XX:+UnlockDiagnosticVMOptions -XX:SharedArchiveFile=ArchiveForClhsdbCDSCore.jsa -XX:+CreateCoredumpOnCrash -Xshare:auto -XX:+ProfileInterpreter --add-exports=java.base/jdk.internal.misc=ALL-UNNAMED -XX:-AlwaysPreTouch CrashApp ]
[2024-09-17T11:46:52.610596Z] Gathering output for process 25735
[2024-09-17T11:46:52.611510Z] Waiting for completion for process 25735
[2024-09-17T11:46:52.628039Z] Waiting for completion finished for process 25735
Run test with ulimit -c: unlimited
[2024-09-17T11:46:52.630845Z] Gathering output for process 25738
Timeout signalled after 19200 seconds

Notes: I have tried to reproduce this locally and on our infra, both by invoking jtreg manually and through aqa-tests, but failed to reproduce it. Maybe it is an infra/environment issue? The test first intentionally crashes the VM using the Unsafe class to produce a core file. However, this hangs when run on Adoptium infra. Maybe something with the core dump settings? I don't know.
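To compare core dump settings between a machine where the test passes and one where it hangs, something like the following could be run on both. This is only a rough sketch for Linux: the class name `CoreDumpSettings` and its helper methods are hypothetical and not part of the test or of aqa-tests. It reads `/proc/sys/kernel/core_pattern` and the "Max core file size" row of `/proc/self/limits`. Whether either setting is actually behind the hang is an open question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical diagnostic helper; not part of ClhsdbCDSCore or aqa-tests. */
public class CoreDumpSettings {

    /** A core_pattern that starts with '|' pipes the dump to a helper program instead of writing a file. */
    static boolean isPiped(String corePattern) {
        return corePattern.strip().startsWith("|");
    }

    /** Returns the soft "Max core file size" limit from the lines of /proc/<pid>/limits, or null if absent. */
    static String softCoreLimit(List<String> limitsLines) {
        final String label = "Max core file size";
        for (String line : limitsLines) {
            if (line.startsWith(label)) {
                // Columns after the label: soft limit, hard limit, units.
                String[] cols = line.substring(label.length()).trim().split("\\s+");
                return cols[0];
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        Path pattern = Path.of("/proc/sys/kernel/core_pattern");
        Path limits = Path.of("/proc/self/limits");
        if (!Files.isReadable(pattern) || !Files.isReadable(limits)) {
            System.out.println("procfs files not readable; this sketch is Linux-only");
            return;
        }
        String corePattern = Files.readString(pattern).strip();
        System.out.println("core_pattern      = " + corePattern);
        System.out.println("piped to a helper = " + isPiped(corePattern));
        System.out.println("soft core limit   = " + softCoreLimit(Files.readAllLines(limits)));
    }
}
```

Note that the reported limit is the one inherited from the shell that launched this program, so it should be started the same way the test is (the log above shows the test being run with `ulimit -c` set to unlimited).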

zzambers commented 1 week ago

This could be related to JDK-8283410, but on Adoptium infra it seems to affect Linux (not Windows?).

sophia-guo commented 4 days ago

@zzambers I ran ClhsdbCDSCore.java on a different agent and it passed: https://ci.adoptium.net/view/Test_grinder/job/Grinder/10970/ (the failed one is due to no test being selected). So it might be infra-related, since you can't reproduce it in your environment. Could you please move it to the infra repo? Or I can move it if you agree.

zzambers commented 3 days ago

@sophia-guo by moving, do you mean filing the same issue there and closing this one?

sophia-guo commented 2 days ago

There is a "Transfer issue" link on the right side of the issue.

I'm not sure if it's clickable for you, as it may depend on permissions. I will just do it myself.