Closed: lumpfish closed this issue 1 year ago.
To run just this failing test: https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=11&JDK_IMPL=hotspot&JDK_VENDOR=adoptopenjdk&BUILD_LIST=openjdk&PLATFORM=s390x_linux_xl&CUSTOM_TARGET=test/jdk/com/sun/jndi/dns/ConfigTests/Timeout.java&TARGET=jdk_custom_1
@lumpfish Using PLATFORM=s390x_linux instead of s390x_linux_xl, I can get this test to pass on all Marist machines. An example run on build-marist-rhel77-s390x-2.
When using PLATFORM=s390x_linux_xl, however, the job has trouble uncompressing the JDK binary:
12:56:48 _ENCODE_FILE_NEW=UNTAGGED curl -OLJSks https://api.adoptopenjdk.net/v3/binary/latest/11/ea/linux/s390x/jdk/hotspot/large/adoptopenjdk
12:56:48 _ENCODE_FILE_NEW=UNTAGGED curl -OLJSks https://api.adoptopenjdk.net/v3/binary/latest/11/ea/linux/s390x/testimage/hotspot/large/adoptopenjdk
12:56:49 Uncompressing file: adoptopenjdk ...
12:56:49
12:56:49 gzip: adoptopenjdk: not in gzip format
12:56:49 tar: This does not look like a tar archive
12:56:49 tar: Exiting with failure status due to previous errors
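The gzip and tar errors mean the file curl saved is not an archive at all. One quick way to see what was actually downloaded is to check for the gzip magic bytes (0x1f 0x8b, per RFC 1952) and, if they are missing, print the start of the file. GzipCheck is a hypothetical helper, not part of the Grinder/TKG scripts; the default file name adoptopenjdk is taken from the log above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of the Grinder/TKG scripts: checks whether a
// downloaded file is really a gzip archive before it is handed to tar.
class GzipCheck {

    // Every gzip member starts with the magic bytes 0x1f 0x8b (RFC 1952).
    static boolean hasGzipMagic(byte[] header) {
        return header.length >= 2
                && (header[0] & 0xff) == 0x1f
                && (header[1] & 0xff) == 0x8b;
    }

    // Reads at most the first n bytes of the file (InputStream.readNBytes(int) is Java 11+).
    static byte[] head(Path file, int n) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return in.readNBytes(n);
        }
    }

    public static void main(String[] args) throws IOException {
        // Default name matches the file curl saved in the log above.
        Path file = Paths.get(args.length > 0 ? args[0] : "adoptopenjdk");
        if (!Files.isRegularFile(file)) {
            System.out.println(file + ": no such file");
            return;
        }
        if (hasGzipMagic(head(file, 2))) {
            System.out.println(file + ": looks like a gzip archive");
        } else {
            // If the server returned an error response, it is likely text, so show its first bytes.
            String start = new String(head(file, 200), StandardCharsets.UTF_8);
            System.out.println(file + ": not gzip, starts with:");
            System.out.println(start);
        }
    }
}
```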
Is it because there isn't an XL version available here? https://adoptopenjdk.net/nightly.html
There is no _xl any more; don't include that suffix when specifying the platform.
I'll look into why the link (which I took from the end of a failing job) contains the _xl. It may be that something needs fixing.
This is the job I took the command line from: https://ci.adoptopenjdk.net/view/Test_grinder/job/Grinder/21/parameters/
The input PLATFORM parameter does not include _xl, so there's an issue somewhere in the code that creates the rerun command.
@lumpfish Yes, it's something that needs fixing. Andrew and I have seen that elsewhere (in fact, we hit it on the "how to investigate build/test failures" call this morning). I think I mentioned it at least in passing to @smlambert somewhere, but can't remember if we created an issue for it (clearly, if we didn't, that was an oversight and we need one). I had another one today that had a _mixed suffix on the rerun link, which also had to be removed.
@lumpfish Can you see if you can reproduce it again? If not, we'll have to put this into the "random network glitch" bucket (I need a label for that...)
I just reran the test and it passed. Did you fix anything? This issue was raised on 2nd March, but my chase-up was on 14th April (https://github.com/AdoptOpenJDK/openjdk-infrastructure/issues/1992#issuecomment-818580435), so if it's a 'glitch', it's a recurring one.
Since March 2, the playbooks have been run on all Linux machines, as per https://github.com/AdoptOpenJDK/openjdk-infrastructure/issues/1990. This may have fixed it.
"This may have fixed it"
Not if the error has been seen in the last week as suggested by https://github.com/AdoptOpenJDK/openjdk-tests/issues/2360 (looks like the same failure).
I've queued up 100 iterations of @lumpfish's test case at https://ci.adoptopenjdk.net/job/Grinder/92/consoleFull (it takes about 3 minutes per iteration, so it will be churning for a while); we can see whether it shows up as intermittent. Tagging https://github.com/AdoptOpenJDK/openjdk-build/issues/1450 as it seems likely this is an intermittent failure.
[EDIT: Yes, it is showing up on that run]
"@lumpfish Using PLATFORM=s390x_linux instead of s390x_linux_xl, I can get this test to pass on all Marist machines. An example on build-marist-rhel77-s390x-2"
@Haroon-Khel Can you compare my grinder with yours? It seems to be failing on every iteration of my run.
I used Simon's comment from 2 days ago to recreate the failing test, while you used the link in the first comment. I think the latter may be a better way to recreate the failing tests, seeing as all of mine passed. The differences between the two are that yours uses OpenJ9 JDK 16 while mine uses HotSpot JDK 11, and yours runs the target jdk_other_1 while mine runs the custom target test/jdk/com/sun/jndi/dns/ConfigTests/Timeout.java.
I don't think the cause is an error in /etc/hosts, as that file on test-marist-ubuntu1604-s390x-2X, the machine you're running your tests on, doesn't look corrupted. I suspect a firewall issue. Is there evidence of this test passing on Marist machines in the past?
The reason we're getting different results is that if the test is resubmitted via the jdk_custom URL, it doesn't actually run the requested test (note the /home/jenkins/workspace/Grinder/openjdk-tests/TKG/../openjdk/openjdk-jdk/test/jdk/java/math/BigInteger/BigIntegerTest.java in the test output below).
16:39:34 ===============================================
16:39:34 Running test jdk_custom_1 ...
16:39:34 ===============================================
16:39:34 jdk_custom_1 Start Time: Thu Apr 15 15:39:33 2021 Epoch Time (ms): 1618501173425
16:39:34 "/home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:destroyAll; "/home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/bin/java" -Xshareclasses:groupAccess,destroyAll; echo "cache cleanup done";
16:39:34
16:39:34 Attempting to destroy all caches in cacheDir /home/jenkins/javasharedresources/
16:39:34
16:39:34 JVMSHRC806I Compressed references persistent shared cache "sharedcc_jenkins" has been destroyed. Use option -Xnocompressedrefs if you want to destroy a non-compressed references cache.
16:39:34 JVMSHRC807I Non-compressed references persistent shared cache "sharedcc_jenkins" has been destroyed. Use option -Xcompressedrefs if you want to destroy a compressed references cache.
16:39:34 JVMSHRC005I No shared class caches available
16:39:34 cache cleanup done
16:39:34 variation: Mode650
16:39:34 JVM_OPTIONS: -XX:-UseCompressedOops
16:39:34 { itercnt=1; \
16:39:34 mkdir -p "/home/jenkins/workspace/Grinder/openjdk-tests/TKG/../TKG/output_16185011733385/jdk_custom_1"; \
16:39:34 cd "/home/jenkins/workspace/Grinder/openjdk-tests/TKG/../TKG/output_16185011733385/jdk_custom_1"; \
16:39:34 "/home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/bin/java" -Xmx512m -jar "/home/jenkins/workspace/Grinder/openjdk-tests/TKG/../../jvmtest/openjdk/jtreg/lib/jtreg.jar" \
16:39:34 -agentvm -a -ea -esa -v:fail,error,time,nopass -retain:fail,error,*.dmp,javacore.*,heapdump.*,*.trc -ignore:quiet -timeoutFactor:8 -xml:verify -concurrency:2 -nativepath:"/home/jenkins/workspace/Grinder/openjdkbinary/openjdk-test-image/jdk/jtreg/native" -vmoptions:"-Xmx512m -XX:-UseCompressedOops " \
16:39:34 -timeoutHandler:jtreg.openj9.CoreDumpTimeoutHandler -timeoutHandlerDir:"/home/jenkins/workspace/Grinder/openjdk-tests/TKG/../TKG/lib/openj9jtregtimeouthandler.jar" \
16:39:34 -w ""/home/jenkins/workspace/Grinder/openjdk-tests/TKG/../TKG/output_16185011733385/jdk_custom_1"/work" \
16:39:34 -r "/home/jenkins/workspace/Grinder/openjdk-tests/TKG/../../jvmtest/openjdk/report" \
16:39:34 -jdk:"/home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image" \
16:39:34 "/home/jenkins/workspace/Grinder/openjdk-tests/TKG/../openjdk/openjdk-jdk/test/jdk/java/math/BigInteger/BigIntegerTest.java"; \
16:39:34 if [ $? -eq 0 ] ; then echo ""; echo "jdk_custom_1""_PASSED"; echo ""; cd /home/jenkins/workspace/Grinder/openjdk-tests/TKG/..; else echo ""; echo "jdk_custom_1""_FAILED"; echo ""; fi; } 2>&1 | tee -a "/home/jenkins/workspace/Grinder/openjdk-tests/TKG/../TKG/output_16185011733385/TestTargetResult";
16:39:35 Directory "/home/jenkins/workspace/Grinder/openjdk-tests/TKG/../TKG/output_16185011733385/jdk_custom_1/work" not found: creating
16:39:35 Directory "/home/jenkins/workspace/Grinder/openjdk-tests/TKG/../../jvmtest/openjdk/report" not found: creating
16:39:37 XML output with verification to /home/jenkins/workspace/Grinder/openjdk-tests/TKG/output_16185011733385/jdk_custom_1/work
16:40:16 Test results: passed: 1
16:40:19 Report written to /home/jenkins/workspace/Grinder/jvmtest/openjdk/report/html/report.html
16:40:19 Results written to /home/jenkins/workspace/Grinder/openjdk-tests/TKG/output_16185011733385/jdk_custom_1/work
16:40:19
16:40:19 jdk_custom_1_PASSED
The problem lies with the generated link again. It inserts TARGET=jdk_custom_1, but if that is specified, the wrong test is executed.
Running the test with
https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=11&JDK_IMPL=hotspot&JDK_VENDOR=adoptopenjdk&BUILD_LIST=openjdk&PLATFORM=s390x_linux&CUSTOM_TARGET=test/jdk/com/sun/jndi/dns/ConfigTests/Timeout.java&TARGET=jdk_custom
works (that is, the test fails).
The rerun link simply captures all params of the last run. It is then up to the user to adjust the parameters accordingly. It exists as a convenience to help prepopulate params. I can remove it if it's causing confusion (you would use the Jenkins Rebuild link in the same way).
"The rerun link simply captures all params of the last run."
By 'last run' do you mean the parameters the job was submitted with? The original test was run with TARGET=jdk_custom, but the generated link contains TARGET=jdk_custom_1, which is the 'variation' target of the failing test. (I am surmising that the CUSTOM_TARGET parameter is not recognised if the TARGET is not jdk_custom.)
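The manual adjustments described in this thread (dropping the _xl or _mixed platform suffix, and using TARGET=jdk_custom rather than a variation target such as jdk_custom_1 when CUSTOM_TARGET is set) can be applied mechanically. RerunLink.normalize is a hypothetical helper, not part of Jenkins or TKG; it assumes each query parameter appears only once and leaves every other value untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Jenkins or TKG: applies the manual fixes
// discussed in this thread to a generated Grinder rerun link.
class RerunLink {

    static String normalize(String url) {
        int q = url.indexOf('?');
        if (q < 0) {
            return url; // no query string, nothing to fix
        }
        // Keep parameter order; assumes each key appears only once.
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : url.substring(q + 1).split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            if (eq < 0) {
                params.put(pair, null); // parameter without a value
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        // Drop the _xl / _mixed suffixes, which are not valid PLATFORM inputs.
        params.computeIfPresent("PLATFORM",
                (key, value) -> value.replaceFirst("_(xl|mixed)$", ""));
        // With CUSTOM_TARGET, use jdk_custom rather than a variation such as jdk_custom_1.
        if (params.containsKey("CUSTOM_TARGET")) {
            params.computeIfPresent("TARGET",
                    (key, value) -> value.matches("jdk_custom_\\d+") ? "jdk_custom" : value);
        }
        StringJoiner query = new StringJoiner("&");
        params.forEach((key, value) -> query.add(value == null ? key : key + "=" + value));
        return url.substring(0, q + 1) + query;
    }

    public static void main(String[] args) {
        String generated = "https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=11"
                + "&JDK_IMPL=hotspot&JDK_VENDOR=adoptopenjdk&BUILD_LIST=openjdk"
                + "&PLATFORM=s390x_linux_xl"
                + "&CUSTOM_TARGET=test/jdk/com/sun/jndi/dns/ConfigTests/Timeout.java"
                + "&TARGET=jdk_custom_1";
        System.out.println(normalize(generated));
    }
}
```

Applied to the link at the top of this thread, this produces the same URL as the working link quoted above (PLATFORM=s390x_linux, TARGET=jdk_custom). A regular variation target such as jdk_other_1 is left alone, since the thread only reports the problem for jdk_custom.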
Raised https://github.com/AdoptOpenJDK/openjdk-tests/issues/2527 for the issue with the incorrect test being run.
The test code can be found at https://github.com/adoptium/jdk11u/blob/master/test/jdk/com/sun/jndi/dns/ConfigTests/Timeout.java
As far as I can tell, the test intentionally tries to connect to a mock/non-existent DNS server at 10.0.0.0:9 and waits for the request to time out. On a passing run, the expected timeout is 7750 ms.
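The 7750 ms figure fits the JNDI DNS provider's retry scheme, in which the UDP timeout doubles on each attempt. Assuming an initial timeout of t_0 = 250 ms and R = 5 attempts (values assumed here because they reproduce 7750 ms; check them against the linked Timeout.java), the cumulative wait is:

```latex
T_{\text{expected}} = \sum_{k=0}^{R-1} t_0 \, 2^{k} = t_0 \left(2^{R} - 1\right) = 250\ \text{ms} \times \left(2^{5} - 1\right) = 250\ \text{ms} \times 31 = 7750\ \text{ms}
```

So a lookup that ends in a timeout noticeably sooner than 7750 ms cannot have waited through every retry.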
Testing on build-marist-rhel77-s390x-1, the test passes at random and then goes back to failing with
javax.naming.CommunicationException: DNS error [Root exception is java.net.NoRouteToHostException: No route to host]; remaining name '' at jdk.naming.dns/com.sun.jndi.dns.DnsClient.query(DnsClient.java:316) at jdk.naming.dns/com.sun.jndi.dns.Resolver.query(Resolver.java:81)
Sometimes the test fails because the request times out too quickly.
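The two failure modes above can be told apart mechanically. The sketch below is hypothetical (the class, enum, and method names are invented for illustration, and it is not the test's own logic): it treats a SocketTimeoutException after at least the expected cumulative timeout as a pass, an earlier SocketTimeoutException as the "times out too quickly" case, and a NoRouteToHostException as the "No route to host" case. probe sends a single dummy datagram to 10.0.0.0:9 on a connected DatagramSocket; on Linux, a connected UDP socket typically reports an ICMP error from the network on a later receive, so this is one way to check from a given machine whether the network answers with an error instead of staying silent. The 250 ms initial timeout and 5 attempts are assumptions that reproduce the 7750 ms figure.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetSocketAddress;
import java.net.NoRouteToHostException;
import java.net.SocketTimeoutException;

// Hypothetical diagnostic, not the test's own logic: classifies how a lookup
// against an unresponsive DNS address ended.
class DnsTimeoutOutcome {

    enum Outcome { PASS, TOO_FAST, NO_ROUTE, OTHER }

    // Cumulative wait when the per-attempt timeout doubles on every attempt.
    static long expectedTimeoutMs(long initialMs, int attempts) {
        return initialMs * ((1L << attempts) - 1);
    }

    static Outcome classify(Throwable rootCause, long elapsedMs, long expectedMs) {
        if (rootCause instanceof NoRouteToHostException) {
            // Typically an ICMP host-unreachable or a local routing error, not silence.
            return Outcome.NO_ROUTE;
        }
        if (rootCause instanceof SocketTimeoutException) {
            // A genuine timeout must have waited through every retry.
            return elapsedMs >= expectedMs ? Outcome.PASS : Outcome.TOO_FAST;
        }
        return Outcome.OTHER;
    }

    // Sends one dummy datagram and waits up to timeoutMs; returns the exception
    // that ended the wait, or null if something actually replied.
    static IOException probe(String host, int port, int timeoutMs) {
        try (DatagramSocket socket = new DatagramSocket()) {
            // On Linux, connecting lets ICMP errors for this peer surface on the socket.
            socket.connect(new InetSocketAddress(host, port));
            socket.setSoTimeout(timeoutMs);
            byte[] payload = new byte[12]; // zero bytes, not a real DNS query
            socket.send(new DatagramPacket(payload, payload.length));
            byte[] reply = new byte[512];
            socket.receive(new DatagramPacket(reply, reply.length));
            return null;
        } catch (IOException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        // 250 ms x 5 attempts is an assumption that reproduces the 7750 ms figure.
        System.out.println("expected total timeout: " + expectedTimeoutMs(250, 5) + " ms");
        IOException result = probe("10.0.0.0", 9, 250);
        System.out.println("single probe of 10.0.0.0:9 ended with: " + result);
    }
}
```

On a machine where the test behaves, the probe would be expected to end in a SocketTimeoutException; a NoRouteToHostException would match the symptom reported on the affected Marist systems.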
@Haroon-Khel @smlambert Will this need some timeout parameter changes on those tests?
Running with 20 iterations at https://ci.adoptopenjdk.net/job/Grinder/1461/
Hmmm, those are giving me: Error: Cannot find file: /home/jenkins/workspace/Grinder/aqa-tests/TKG/../openjdk/openjdk-jdk/test/jdk/test/jdk/com/sun/jndi/dns/ConfigTests/Timeout.java
I'm finding this same type of failure on the new Marist systems being set up at OpenJ9, but only on the Red Hat 7 systems. Ubuntu 20 installations work as expected. The older Red Hat 7 systems from Marist do not have this "No route to host" failure.
https://github.com/eclipse-openj9/openj9/issues/15998
Rerunning this for the sake of fresh grinders
I ran those grinders using https://github.com/adoptium/infrastructure/issues/1992#issuecomment-820560087, which look to be the correct instructions since they failed for Simon. They all pass now.
s390x_linux extended.openjdk target jdk_other fails with
Example failing job: https://ci.adoptopenjdk.net/job/Test_openjdk16_j9_extended.openjdk_s390x_linux/9/consoleFull
It looks like it occurs on both HotSpot and OpenJ9, on all releases and all s390x machines, so I'm guessing it's a machine setup issue on the Marist machines.
To rerun the test: https://ci.adoptopenjdk.net/job/Grinder/parambuild/?JDK_VERSION=16&JDK_IMPL=openj9&JDK_VENDOR=adoptopenjdk&BUILD_LIST=openjdk&PLATFORM=s390x_linux_mixed&TARGET=jdk_other_1
Full output of failing test: