adoptium / aqa-tests

Home of test infrastructure for Adoptium builds
https://adoptium.net/aqavit
Apache License 2.0

jdk_net_0 test failing on AIX during jdk 18+36 release #3499

Open Haroon-Khel opened 2 years ago

Haroon-Khel commented 2 years ago

A jdk_net test from the extended openjdk test suite is failing on AIX due to a timeout error:

com/sun/net/httpserver/simpleserver/StressDirListings.java.StressDirListings

20:04:16  TEST RESULT: Error. Program `/home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 960490ms).

Test job: https://ci.adoptopenjdk.net/job/Test_openjdk18_hs_extended.openjdk_ppc64_aix_testList_0/24/testReport/

TRSS link: https://trss.adoptium.net/output/test?id=623b9dc672761a4c5cf8bf34

Fails in the same way in 2 reruns https://ci.adoptopenjdk.net/job/Grinder/3950/console and https://ci.adoptopenjdk.net/job/Grinder/3949/console
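When comparing reruns, it can help to pull the configured timeout and the elapsed time out of the jtreg result line. This is a minimal sketch, assuming the message keeps the wording shown in the log above; the function name `parse_timeout_line` is hypothetical.

```python
import re

# Matches the wording of the jtreg result line quoted above (assumed stable).
TIMEOUT_RE = re.compile(
    r"timed out \(timeout set to (\d+)ms, "
    r"elapsed time including timeout handling was (\d+)ms\)"
)

def parse_timeout_line(line):
    """Return (timeout_ms, elapsed_ms), or None if the line is not a timeout."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

line = ("TEST RESULT: Error. Program `/home/jenkins/workspace/Grinder/"
        "openjdkbinary/j2sdk-image/bin/java' timed out (timeout set to "
        "960000ms, elapsed time including timeout handling was 960490ms).")
print(parse_timeout_line(line))  # (960000, 960490)
```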

smlambert commented 2 years ago

StressDirListings.jtr.txt

smlambert commented 2 years ago

On the off-chance that this test is failing only on certain machines, and based on this comment, which states that a couple of machines have better IPv6 config (https://github.com/adoptium/infrastructure/issues/2456#issuecomment-1036164741), I am rerunning the java_net target on build-osuosl-aix71-ppc64-1 in https://ci.adoptopenjdk.net/job/Grinder/3987/

Haroon-Khel commented 2 years ago

I'm rerunning jdk_net with jdk17 to see if com/sun/net/httpserver/simpleserver/StressDirListings.java.StressDirListings is a regression https://ci.adoptopenjdk.net/job/Grinder/4146/console

Haroon-Khel commented 2 years ago

In the jdk17 run, a lot of the same infra-related tests failed:

java/net/DatagramSocket/DatagramSocketExample.java.DatagramSocketExample
java/net/DatagramSocket/DatagramSocketMulticasting.java.DatagramSocketMulticasting
java/net/DatagramSocket/SendReceiveMaxSize.java.SendReceiveMaxSize
java/net/MulticastSocket/B6427403.java.B6427403
java/net/MulticastSocket/JoinLeave.java.JoinLeave
java/net/MulticastSocket/MulticastAddresses.java.MulticastAddresses
java/net/MulticastSocket/NoLoopbackPackets.java.NoLoopbackPackets
java/net/MulticastSocket/Promiscuous.java.Promiscuous
java/net/MulticastSocket/SetLoopbackMode.java.SetLoopbackMode
java/net/MulticastSocket/SetOutgoingIf.java.SetOutgoingIf
java/net/MulticastSocket/Test.java.Test

com/sun/net/httpserver/simpleserver/StressDirListings.java.StressDirListings does not appear as a failure, but I also can't find it among the tests that ran.

In fact I don't think this test appears in the jdk17 source code https://github.com/adoptium/jdk17u/tree/master/test/jdk/com/sun/net/httpserver (unless I am looking in the wrong place).

Haroon-Khel commented 2 years ago

I was able to get com/sun/net/httpserver/simpleserver/StressDirListings.java.StressDirListings to pass by running it with -Dtest.timeout.factor=3.0. It's usually set to 1.0.

===============================================
com/sun/net/httpserver/simpleserver/StressDirListings.java
Total tests run: 1, Passes: 1, Failures: 0, Skips: 0
===============================================

The timeout occurs because the test's for loop runs for 11000 iterations, since the TIMES variable is set to 11000.

On our AIX machines the test times out before 600 iterations. I do not recommend excluding this test because we know it passes.
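For a sense of scale, the two figures above give a rough estimate of how long a full run would take on these machines. This back-of-the-envelope sketch assumes every iteration costs about the same and that the 960 s limit from the log applied to the run where fewer than 600 iterations completed; neither is confirmed in this thread, and `projected_runtime_s` is a hypothetical helper.

```python
def projected_runtime_s(timeout_s, iterations_done, total_iterations):
    """Scale the observed time linearly to the full iteration count."""
    return timeout_s * total_iterations / iterations_done

# Fewer than 600 iterations finished within 960 s; the test wants 11000.
estimate = projected_runtime_s(960, 600, 11000)
print(f"{estimate:.0f} s, about {estimate / 3600:.1f} h")
```

An estimate this large does not fit with the test passing under a timeout factor of 3.0, which suggests the per-iteration cost is not constant from run to run.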

@ShelleyLambert I recall it being said that upstream do not test on AIX, so it might be the case that this timeout is a general issue for AIX, not just our machines. What's the procedure for suggesting a fix? Something along the lines of setting TIMES to 500 for AIX?
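The suggestion to lower TIMES on AIX could look roughly like this. The sketch is in Python for illustration only; the real test is Java, where the check would presumably go through the `os.name` system property. The function name is hypothetical, and the value 500 simply follows the suggestion above rather than any upstream change.

```python
DEFAULT_TIMES = 11000  # iteration count the test currently uses
AIX_TIMES = 500        # reduced count floated in the comment above

def iterations_for(os_name):
    """Pick the loop count from an OS name string such as "AIX" or "Linux"."""
    if os_name.strip().lower() == "aix":
        return AIX_TIMES
    return DEFAULT_TIMES

print(iterations_for("AIX"), iterations_for("Linux"))  # 500 11000
```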

Either way, this test case is not blocking since it passes under the right circumstances

smlambert commented 2 years ago

Thanks @Haroon-Khel (please use @smlambert - the other github account was attached to my old employer's email address and I do not see notifications for it).

I think we could change the options to the test harness on our side to increase the JTREG_TIMEOUT_OPTION for that platform (as I see we already do something similar for zOS, https://github.com/adoptium/aqa-tests/blob/master/openjdk/openjdk.mk#L82-L86).
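A per-platform timeout factor could be chosen along these lines. This is only a sketch of the idea, written in Python rather than the makefile syntax of openjdk.mk; the platform keys and factor values are hypothetical placeholders, and `-timeoutFactor` is the jtreg command-line option that scales test timeouts.

```python
# Hypothetical overrides for illustration; the real values live in openjdk.mk.
TIMEOUT_FACTORS = {"zos": 8, "aix": 8}
DEFAULT_FACTOR = 1

def jtreg_timeout_option(platform):
    """Build the jtreg timeout option for a platform name."""
    factor = TIMEOUT_FACTORS.get(platform.lower(), DEFAULT_FACTOR)
    return f"-timeoutFactor:{factor}"

print(jtreg_timeout_option("aix"))    # -timeoutFactor:8
print(jtreg_timeout_option("linux"))  # -timeoutFactor:1
```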

Haroon-Khel commented 2 years ago

Looking at https://ci.adoptopenjdk.net/job/Grinder/4145/consoleFull (search for com/sun/net/httpserver/simpleserver/StressDirListings.java), the timeout factor was 8, due to https://github.com/adoptium/aqa-tests/blob/master/openjdk/openjdk.mk#L82-L86, yet the test still failed:

15:22:49      /home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/bin/java \
15:22:49          -Dtest.vm.opts='-ea -esa -Xmx512m -XX:-UseCompressedOops' \
15:22:49          -Dtest.tool.vm.opts='-J-ea -J-esa -J-Xmx512m -J-XX:-UseCompressedOops' \
15:22:49          -Dtest.compiler.opts= \
15:22:49          -Dtest.java.opts= \
15:22:49          -Dtest.jdk=/home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image \
15:22:49          -Dcompile.jdk=/home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image \
15:22:49          -Dtest.timeout.factor=8.0 \
15:22:49          -Dtest.root=/home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk/test/jdk \
15:22:49          -Dtest.name=com/sun/net/httpserver/simpleserver/StressDirListings.java \
15:22:49          -Dtest.file=/home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk/test/jdk/com/sun/net/httpserver/simpleserver/StressDirListings.java \
15:22:49          -Dtest.src=/home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk/test/jdk/com/sun/net/httpserver/simpleserver \
15:22:49          -Dtest.src.path=/home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk/test/jdk/com/sun/net/httpserver/simpleserver:/home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk/test/lib \
15:22:49          -Dtest.classes=/home/jenkins/workspace/Grinder/aqa-tests/TKG/output_16484661431492/jdk_net_1/work/classes/com/sun/net/httpserver/simpleserver/StressDirListings.d \
15:22:49          -Dtest.class.path=/home/jenkins/workspace/Grinder/aqa-tests/TKG/output_16484661431492/jdk_net_1/work/classes/com/sun/net/httpserver/simpleserver/StressDirListings.d:/home/jenkins/workspace/Grinder/aqa-tests/TKG/output_16484661431492/jdk_net_1/work/classes/test/lib \
15:22:49          -Dtest.class.path.prefix=/home/jenkins/workspace/Grinder/aqa-tests/TKG/output_16484661431492/jdk_net_1/work/classes/com/sun/net/httpserver/simpleserver/StressDirListings.d:/home/jenkins/workspace/Grinder/aqa-tests/openjdk/openjdk-jdk/test/jdk/com/sun/net/httpserver/simpleserver:/home/jenkins/workspace/Grinder/aqa-tests/TKG/output_16484661431492/jdk_net_1/work/classes/test/lib \
15:22:49          -Dtest.modules='jdk.httpserver java.logging' \
15:22:49          --add-modules jdk.httpserver,java.logging \
15:22:49          -ea \
15:22:49          -esa \
15:22:49          -Xmx512m \
15:22:49          -XX:-UseCompressedOops \
15:22:49          com.sun.javatest.regtest.agent.MainWrapper /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_16484661431492/jdk_net_1/work/com/sun/net/httpserver/simpleserver/StressDirListings.d/testng.0.jta com/sun/net/httpserver/simpleserver/StressDirListings.java false StressDirListings
15:22:49  
15:22:49  TEST RESULT: Error. Program `/home/jenkins/workspace/Grinder/openjdkbinary/j2sdk-image/bin/java' timed out (timeout set to 960000ms, elapsed time including timeout handling was 960519ms).

Yet I managed to get the test to run to completion with a timeout factor of 3
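The 960000 ms in the log is consistent with the factor of 8.0 on the command line if the test uses jtreg's default per-test timeout of 120 seconds. That is an assumption here, since the test's own timeout setting is not shown in this thread. A minimal sketch of the arithmetic, with `effective_timeout_ms` as a hypothetical helper:

```python
JTREG_DEFAULT_TIMEOUT_S = 120  # assumed: jtreg default when a test sets no timeout

def effective_timeout_ms(factor, base_s=JTREG_DEFAULT_TIMEOUT_S):
    """Scale the base timeout by the timeout factor and convert to milliseconds."""
    return int(base_s * factor * 1000)

print(effective_timeout_ms(8.0))  # 960000
print(effective_timeout_ms(3.0))  # 360000
```

Under the same assumption, a factor of 3.0 allows only 360000 ms, so a pass with that setting would mean the run finished in under six minutes.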

smlambert commented 2 years ago

> Yet I managed to get the test to run to completion with a timeout factor of 3

@Haroon-Khel - I presume on the same machine? If so, then whatever is causing the delay takes a varying amount of time. Are there other things happening on the machine that would affect execution time?
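One way to look into this is to time the same test command several times on the same machine and compare the spread. This is a minimal sketch, assuming the command can be rerun safely; `time_runs` is a hypothetical helper and the command shown is a placeholder, not the real jtreg invocation.

```python
import statistics
import subprocess
import sys
import time

def time_runs(command, repeats=3):
    """Run command repeatedly and return the wall-clock duration of each run."""
    durations = []
    for _ in range(repeats):
        start = time.monotonic()
        subprocess.run(command, check=False)
        durations.append(time.monotonic() - start)
    return durations

# Placeholder command; substitute the actual test invocation.
durations = time_runs([sys.executable, "-c", "pass"])
print(f"min={min(durations):.2f}s max={max(durations):.2f}s "
      f"mean={statistics.mean(durations):.2f}s")
```

A wide gap between the minimum and maximum on an otherwise idle machine would point at the environment rather than the test itself.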