Open Haroon-Khel opened 1 year ago
jdk_util, jdk_jfr failures seen in Jan 2024 release too (see notes here)
I believe the perf suites are also in this category and should be understood/mitigated so the CI is not dependent upon my ODROID systems.
https://ci.adoptium.net/job/Grinder/9819/tapResults/ test-docker-ubuntu2004-armv7l-3
https://ci.adoptium.net/job/Grinder/9820/tapResults/ test-docker-ubuntu2004-armv7l-2
https://ci.adoptium.net/job/Grinder/9821/tapResults/ test-docker-ubuntu2004-armv7l-6
https://ci.adoptium.net/job/Grinder/9822/tapResults/ test-docker-ubuntu2004-armv7l-5
https://ci.adoptium.net/job/Grinder/9823/tapResults/ test-docker-ubuntu2004-armv7l-4
https://ci.adoptium.net/job/Grinder/9824/tapResults/ test-docker-ubuntu2004-armv7l-1
Looks like jdk_other_2, jdk_security3_2 and jdk_instrument_2 pass on some machines and fail on others. This could be intermittent; I'm rerunning these tests on the same machines to confirm. jdk_net_2, jdk_util_2 and jdk_jfr_2 fail consistently.
The jfr failures are mostly SIGBUS errors
[thread -754977696 also had an error]
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0xf63a91a8, pid=88505, tid=0xd34e8460
#
# JRE version: OpenJDK Runtime Environment (8.0_412-b08) (build 1.8.0_412-b08)
# Java VM: OpenJDK Client VM (25.412-b08 mixed mode linux-aarch32 )
# Problematic frame:
# V [libjvm.so+0x33b1a8] write_checkpoint_header(unsigned char*, long long, long long, bool, unsigned int)+0xe8
#
# Core dump written. Default location: /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_17144060076039/jdk_jfr_2/work/scratch/0/core or core.88505
#
# An error report file with more information is saved as:
# /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_17144060076039/jdk_jfr_2/work/scratch/0/hs_err_pid88505.log
#
# If you would like to submit a bug report, please visit:
# https://github.com/adoptium/adoptium-support/issues
#
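With roughly 300 jdk_jfr failures, it may help to check whether they all crash in the same frame. The following is a minimal sketch, not part of the AQA tooling, that pulls the signal and problematic frame out of an hs_err-style header. The function name and regular expressions are illustrative assumptions based only on the log lines shown above.

```python
import re

# Sample lines copied from the crash header above.
HEADER = """\
# SIGBUS (0x7) at pc=0xf63a91a8, pid=88505, tid=0xd34e8460
# Problematic frame:
# V [libjvm.so+0x33b1a8] write_checkpoint_header(unsigned char*, long long, long long, bool, unsigned int)+0xe8
"""

# Assumed layout: "# <SIGNAL> (<hex>) at pc=<hex>, pid=<int>, ..."
SIGNAL_RE = re.compile(
    r"^#\s+(SIG\w+) \((0x[0-9a-f]+)\) at pc=(0x[0-9a-f]+), pid=(\d+)", re.M
)
# Assumed layout: "# <type letter> [<library>+<offset>] <symbol>"
FRAME_RE = re.compile(r"^#\s+(\w)\s+\[(\S+?)\+(0x[0-9a-f]+)\]\s+(.*)$", re.M)


def summarize_crash(text):
    """Return signal and problematic-frame details, or None if not found."""
    sig = SIGNAL_RE.search(text)
    frame = FRAME_RE.search(text)
    if not sig or not frame:
        return None
    return {
        "signal": sig.group(1),
        "pid": int(sig.group(4)),
        "frame_type": frame.group(1),
        "library": frame.group(2),
        "offset": frame.group(3),
        "symbol": frame.group(4).strip(),
    }


if __name__ == "__main__":
    print(summarize_crash(HEADER))
```

Running this over every hs_err_pid*.log from a Grinder run and counting the distinct (signal, symbol) pairs would show whether the failures share one crash site.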
java/net/Inet6Address/B6206527.java.B6206527 error log
trying LL addr: /fe80:0:0:0:42:acff:fe11:3%eth0
trying LL addr: /fe80:0:0:0:42:acff:fe11:3
java.net.BindException: Cannot assign requested address (Bind failed)
at java.net.PlainSocketImpl.socketBind(Native Method)
at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
at java.net.ServerSocket.bind(ServerSocket.java:390)
at java.net.ServerSocket.bind(ServerSocket.java:344)
at B6206527.main(B6206527.java:57)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
at java.lang.Thread.run(Thread.java:750)
JavaTest Message: Test threw exception: java.net.BindException
JavaTest Message: shutting down test
java/net/ipv6tests/B6521014.java.B6521014
java.net.ConnectException: Network is unreachable (connect failed)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at B6521014.test1(B6521014.java:77)
at B6521014.main(B6521014.java:106)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
at java.lang.Thread.run(Thread.java:750)
JavaTest Message: Test threw exception: java.net.ConnectException
JavaTest Message: shutting down test
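Both failures above are in IPv6 tests. One possible explanation, not confirmed in this thread, is that the container's network offers only limited IPv6 support. The sketch below is a rough probe that could be run inside a container; it only checks the IPv6 loopback address and does not exercise the link-local addresses that B6206527 uses. The function name is hypothetical.

```python
import socket


def ipv6_loopback_usable():
    """Return True if an IPv6 TCP socket can be created and bound to ::1."""
    if not socket.has_ipv6:
        # The Python build itself has no IPv6 support.
        return False
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as s:
            # Port 0 lets the OS pick any free port.
            s.bind(("::1", 0))
        return True
    except OSError:
        # Typical when IPv6 is disabled or unavailable in this environment.
        return False


if __name__ == "__main__":
    print("IPv6 loopback usable:", ipv6_loopback_usable())
```

A False result on the static docker nodes and True on the odroid machines would support the networking theory; True on both would point elsewhere.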
Added an arm32 Debian static docker container to the inventory, https://ci.adoptium.net/computer/test-docker-debian12-armv7l-1/, and am rerunning the failed tests on it: https://ci.adoptium.net/job/Grinder/9835/console
Looking at grinders 9828 to 9833, jdk_other_2 jdk_security3_2 and jdk_instrument_2 fail intermittently.
Of jdk_security3_2's failing tests, a lot are unexpected exits from what looks like a passing test; see https://ci.adoptium.net/job/Grinder/9828/tapResults/ for example
Failed test cases:
TEST: sun/security/ssl/ClientHandshaker/CipherSuiteOrder.java
TEST: sun/security/ssl/SSLSocketImpl/RejectClientRenego.java
Test results: passed: 614; failed: 2
sun/security/ssl/ClientHandshaker/CipherSuiteOrder.java
Unexpected exit from test [exit code: 134]
Standard Output
server enabled suites:
=====================
client enabled suites:
======================
SSL_RSA_WITH_DES_CBC_SHA
SSL_RSA_WITH_RC4_128_MD5
SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA
SSL_RSA_WITH_RC4_128_SHA
SSL_DHE_DSS_WITH_DES_CBC_SHA
SSL_DHE_DSS_WITH_DES_CBC_SHA
SSL_RSA_WITH_RC4_128_MD5
Server read: 80
Cipher suite in use: SSL_RSA_WITH_RC4_128_MD5
client read: 85
Standard Error
STATUS:Passed.
sun/security/ssl/SSLSocketImpl/RejectClientRenego.java
Unexpected exit from test [exit code: 133]
Standard Output
Session: Session(1714476936531|TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA)
Seen handshake completed #1
sending/receiving data, iteration: 0
starting new handshake
Got the expected exception
Got the expected exception
Standard Error
STATUS:Passed.
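The exit codes 134 and 133 are worth decoding. Assuming they follow the common convention of 128 plus the signal number for a process killed by a signal, they correspond to SIGABRT and SIGTRAP, which would mean the JVM died after the test logic had already reported "Passed". A minimal sketch of that decoding, using a hand-written table of Linux signal numbers (the table and function name are illustrative):

```python
# Linux signal numbers as used on x86 and ARM (subset).
LINUX_SIGNALS = {
    4: "SIGILL",
    5: "SIGTRAP",
    6: "SIGABRT",
    7: "SIGBUS",
    9: "SIGKILL",
    11: "SIGSEGV",
}


def describe_exit_code(code):
    """Decode an exit code, assuming the 128 + signal-number convention."""
    if code > 128:
        number = code - 128
        return LINUX_SIGNALS.get(number, f"signal {number}")
    return f"exit status {code}"


if __name__ == "__main__":
    for code in (134, 133):
        print(code, "->", describe_exit_code(code))
```

If that assumption holds, these two failures belong with the native crashes rather than with ordinary test assertion failures.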
As part of the work we're having to do for Ubuntu 24.04 support it would be useful to test whether an Ubuntu 24.04 at OSUOSL can run 32-bit containers without the same problems.
Got an Ubuntu 24.04 arm32 container, https://ci.adoptium.net/computer/test-docker-ubuntu2404-armv7-1/, running on an Ubuntu 24.04 OSUOSL arm64 dockerhost machine https://ci.adoptium.net/computer/dockerhost-osuosl-ubuntu2404-aarch64-1/ (used to be dockerhost-osuosl-ubuntu2204-aarch64-1)
Failures
sanity openjdk
sun/security/krb5/auto/rcache_usemd5.sh
extended openjdk
jdk_beans_2 java/net/Inet6Address/B6206527.java java/net/ipv6tests/B6521014.java sun/security/ssl/SSLSocketImpl/ServerTimeout.java jdk_jfr_2
extended perf
dacapo-xalan_0 (only one extended perf test failure. Perhaps the perf failures on containerised arm32 machines are intermittent?)
sanity functional, special functional and extended functional all failed. Rerunning
sanity special and extended (all functional) are failing to build due to this error
13:16:30 [javac] Compiling 1 source file to /home/jenkins/workspace/Test_openjdk8_hs_special.functional_arm_linux/aqa-tests/functional/MockitoTests/bin
13:16:31 [javac] /home/jenkins/workspace/Test_openjdk8_hs_special.functional_arm_linux/aqa-tests/functional/MockitoTests/src/test/java/MockitoMockTest.java:17: error: cannot access Mockito
13:16:31 [javac] import org.mockito.Mockito;
13:16:31 [javac] ^
13:16:31 [javac] bad class file: /home/jenkins/testDependency/lib/mockito-core.jar(org/mockito/Mockito.class)
13:16:31 [javac] class file has wrong version 55.0, should be 52.0
13:16:31 [javac] Please remove or make sure it appears in the correct subdirectory of the classpath.
The node uses jdk17 for its jenkins agent while these are jdk8 tests, which might have something to do with it
No problem building jdk11 sanity functional tests https://ci.adoptium.net/job/Test_openjdk11_hs_sanity.functional_arm_linux/420/console
Switched the jdk on the node to jdk11, restarted the node. Rebuild of sanity special and extended (all functional) https://ci.adoptium.net/job/AQA_Test_Pipeline/261/console
Re https://github.com/adoptium/infrastructure/issues/3043#issuecomment-2115123748: "class file has wrong version 55.0, should be 52.0" means a Java compiler mismatch. The mockito-core.jar on the node contains Java 11 class files (version 55.0), while the jdk8 javac expects Java 8 class files (version 52.0) or older. (see https://stackoverflow.com/questions/60612488/error-class-file-has-wrong-version-55-0-should-be-52-0-when-building-alfresco)
That being said, MockitoMockTest is currently set to JDK11+ in playlist.xml in the AQA repo.
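For reference, a class file's major version maps to a Java release as major minus 44 (from Java 5 onwards), so 52 is Java 8 and 55 is Java 11. The sketch below shows how to read that from the first eight bytes of a .class file; the function names are illustrative, and the sample header is synthetic rather than taken from mockito-core.jar.

```python
import struct

CLASS_MAGIC = 0xCAFEBABE


def java_release_for_major(major):
    """Map a class-file major version to a Java release (Java 5 and later)."""
    if major < 49:
        raise ValueError(f"major version {major} predates Java 5")
    return major - 44


def class_file_release(header_bytes):
    """Read magic, minor and major (big-endian) from a class-file header."""
    magic, _minor, major = struct.unpack(">IHH", header_bytes[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("not a Java class file")
    return java_release_for_major(major)


if __name__ == "__main__":
    # Synthetic header with major version 55, as reported by javac above.
    sample = struct.pack(">IHH", CLASS_MAGIC, 0, 55)
    print("compiled for Java", class_file_release(sample))
```

Applied to Mockito.class extracted from /home/jenkins/testDependency/lib/mockito-core.jar, this would confirm which release the jar on a given node was built for.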
Two things need to be done:
DYNAMIC_COMPILE=true
Rerunning the consistently failing tests (jdk_net, jdk_util, jdk_jfr) on the newly created test-osuosl-ubuntu2404-aarch64-1
Interesting, only the following jdk8 jdk_net tests fail on test-osuosl-ubuntu2404-aarch64-1 (arm64 not arm32)
TEST: sun/net/www/http/HttpClient/KeepAliveTest.java
TEST: sun/net/www/http/KeepAliveCache/B8291637.java
TEST: sun/net/www/http/KeepAliveCache/KeepAliveProperty.java
TEST: sun/net/www/http/KeepAliveCache/B8293562.java
The jdk_util jdk_jfr tests pass
I've kicked off the sanity run on the U2404/arm32 box with the v1.0.1-release branch to see if the build failure is specific to something in the master branch. It's not immediately obvious why this would be specific to arm32 machines though.
jdk8 jdk_util tests, which consistently fail on the static docker arm32 nodes, pass on test-docker-ubuntu2404-armv7-1
https://ci.adoptium.net/job/Grinder/10156/tapResults/
We're also not seeing the same ipv6 jdk_net failures that we see in https://github.com/adoptium/infrastructure/issues/3043#issuecomment-2085016314
I believe the perf suites are also in this category and should be understood/mitigated so the CI is not dependent upon my ODROID systems.
@sxa Which were the failing perf tests again? https://ci.adoptium.net/job/AQA_Test_Pipeline/280/console (jdk8 v1.0.1-release branch on test-docker-ubuntu2404-armv7-1) finished running. Sanity perf and extended perf both passed
https://ci.adoptium.net/job/Test_openjdk8_hs_sanity.perf_arm_linux/475/ https://ci.adoptium.net/job/Test_openjdk8_hs_extended.perf_arm_linux/137/
Can't remember which versions, but we should perhaps try running those on the Equinix containers and see if they pass there
I kicked off JDK8, 11 and 17 sanity and extended perf tests on the static docker arm32 nodes, but I think because I kicked off too many at once, the earlier test jobs did not get saved, leaving the earlier AQA pipelines looking like this: https://ci.adoptium.net/job/AQA_Test_Pipeline/316/console
[Pipeline] }
Failed in branch Test_openjdk17_hs_extended.perf_arm_linux_6
[Pipeline] }
Failed in branch Test_openjdk11_hs_extended.perf_arm_linux_4
[Pipeline] }
Failed in branch Test_openjdk8_hs_sanity.perf_arm_linux_1
[Pipeline] }
Failed in branch Test_openjdk17_hs_sanity.perf_arm_linux_5
[Pipeline] }
Failed in branch Test_openjdk8_hs_extended.perf_arm_linux_2
[Pipeline] }
Failed in branch Test_openjdk11_hs_sanity.perf_arm_linux_3
[Pipeline] // parallel
[Pipeline] End of Pipeline
But if you look at the last 5 jobs (the only ones available) in
https://ci.adoptium.net/job/Test_openjdk8_hs_sanity.perf_arm_linux/
https://ci.adoptium.net/job/Test_openjdk8_hs_extended.perf_arm_linux/
https://ci.adoptium.net/job/Test_openjdk11_hs_sanity.perf_arm_linux/
https://ci.adoptium.net/job/Test_openjdk11_hs_extended.perf_arm_linux/
https://ci.adoptium.net/job/Test_openjdk17_hs_sanity.perf_arm_linux/
https://ci.adoptium.net/job/Test_openjdk17_hs_extended.perf_arm_linux/
we are seeing them pass on static docker containers, which at the very least reduces our dependency on the odroid machines. https://ci.adoptium.net/job/Test_openjdk8_hs_extended.perf_arm_linux/ has the lowest pass rate, so some further investigation is required there
Among the failing jdk8 extended perf tests, dacapo-xalan_0 fails consistently while renaissance-finagle-http_0 fails intermittently
Rerunning both tests on all arm32 static docker nodes for 10 iterations:
test-docker-debian12-armv7l-1 https://ci.adoptium.net/job/Grinder/10475/console Both tests passed 1/10 times. The only pass for both tests occurred in the same iteration
test-docker-ubuntu2004-armv7l-5 https://ci.adoptium.net/job/Grinder/10476/console dacapo-xalan_0 passed 1/10 times, renaissance-finagle-http_0 passed 10/10 times
test-docker-ubuntu2004-armv7l-4 https://ci.adoptium.net/job/Grinder/10477/console dacapo-xalan_0 passed 1/10 times, renaissance-finagle-http_0 passed 9/10 times
test-docker-ubuntu2004-armv7l-2 https://ci.adoptium.net/job/Grinder/10478/console dacapo-xalan_0 failed 10/10 times, renaissance-finagle-http_0 passed 2/10 times
test-docker-ubuntu2004-armv7l-3 https://ci.adoptium.net/job/Grinder/10479/console dacapo-xalan_0 failed 10/10 times, renaissance-finagle-http_0 passed 1/10 times
test-docker-ubuntu2004-armv7l-1 https://ci.adoptium.net/job/Grinder/10480/console dacapo-xalan_0 passed 1/10 times, renaissance-finagle-http_0 passed 1/10 times
test-docker-ubuntu2004-armv7l-6 https://ci.adoptium.net/job/Grinder/10481/console dacapo-xalan_0 passed 10/10 times, renaissance-finagle-http_0 passed 9/10 times
test-docker-ubuntu2404-armv7-1 https://ci.adoptium.net/job/Grinder/10482/console Both tests failed 1/10 times
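To compare the two benchmarks across nodes, the per-node results above can be tallied into overall pass counts. This is a small sketch using only the numbers reported in the list; it reads "failed 10/10" as 0 passes and "failed 1/10" as 9 passes, which is an interpretation rather than something stated explicitly.

```python
# (dacapo-xalan_0 passes, renaissance-finagle-http_0 passes) out of 10 runs.
RESULTS = {
    "test-docker-debian12-armv7l-1": (1, 1),
    "test-docker-ubuntu2004-armv7l-5": (1, 10),
    "test-docker-ubuntu2004-armv7l-4": (1, 9),
    "test-docker-ubuntu2004-armv7l-2": (0, 2),   # xalan "failed 10/10"
    "test-docker-ubuntu2004-armv7l-3": (0, 1),   # xalan "failed 10/10"
    "test-docker-ubuntu2004-armv7l-1": (1, 1),
    "test-docker-ubuntu2404-armv7-1": (9, 9),    # both "failed 1/10"
    "test-docker-ubuntu2004-armv7l-6": (10, 9),
}
ITERATIONS = 10
TESTS = ("dacapo-xalan_0", "renaissance-finagle-http_0")


def pass_rate(test_index):
    """Return (total passes, total runs) for one test across all nodes."""
    passed = sum(counts[test_index] for counts in RESULTS.values())
    return passed, ITERATIONS * len(RESULTS)


if __name__ == "__main__":
    for index, name in enumerate(TESTS):
        passed, total = pass_rate(index)
        print(f"{name}: {passed}/{total} passed")
```

The spread between nodes is large for both tests, so per-node counts are likely more informative than the totals when deciding which machines to investigate.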
Maybe also test with a JDK11 using the jdk8u material (or see if there is an equivalent test in the jdk11u repo). Also note that the dacapo_xalan benchmark test can be temperamental on other environments. There is a newer version of the tests which we may also be able to try.
I don't think this is a complete list, just an observed list of failures from the recent April release. https://github.com/adoptium/aqa-tests/issues/4518#issuecomment-1525349302
jdk_instrument_2, jdk_security3_2, jdk_other_2:
javax/xml/jaxp/common/8144593/TransformationWarningsTest.java.TransformationWarningsTest
javax/net/ssl/ALPN/SSLServerSocketAlpnTest.java.SSLServerSocketAlpnTest
javax/net/ssl/ALPN/SSLSocketAlpnTest.java.SSLSocketAlpnTest
javax/net/ssl/sanity/interop/ClientJSSEServerJSSE.java.ClientJSSEServerJSSE
sun/security/ssl/GenSSLConfigs/main.java.main
javax/xml/jaxp/common/8144593/ValidationWarningsTest.java.ValidationWarningsTest
jdk_net_2:
com/sun/net/httpserver/Test9.java.Test9
com/sun/net/httpserver/bugs/B6361557.java.B6361557
java/net/ipv6tests/TcpTest.java.TcpTest
jdk_util_2:
java/util/concurrent/BlockingQueue/CancelledProducerConsumerLoops.java.CancelledProducerConsumerLoops
java/util/concurrent/ConcurrentQueues/ConcurrentQueueLoops.java.ConcurrentQueueLoops
java/util/concurrent/ExecutorCompletionService/ExecutorCompletionServiceLoops.java.ExecutorCompletionServiceLoops
java/util/stream/boottest/java/util/stream/NodeTest.java.NodeTest
java/util/stream/test/org/openjdk/tests/java/util/stream/RangeTest.java.RangeTest
java/util/Properties/ConcurrentLoadAndStoreXML.java.ConcurrentLoadAndStoreXML
java/util/stream/boottest/java/util/stream/DoubleNodeTest.java.DoubleNodeTest
java/util/stream/boottest/java/util/stream/IntNodeTest.java.IntNodeTest
java/util/stream/boottest/java/util/stream/FlagOpTest.java.FlagOpTest
java/util/stream/test/org/openjdk/tests/java/util/stream/FilterOpTest.java.FilterOpTest
java/util/stream/test/org/openjdk/tests/java/util/stream/InfiniteStreamWithLimitOpTest.java.InfiniteStreamWithLimitOpTest
java/util/stream/test/org/openjdk/tests/java/util/stream/IntSliceOpTest.java.IntSliceOpTest
java/util/stream/test/org/openjdk/tests/java/util/stream/IntUniqOpTest.java.IntUniqOpTest
java/util/stream/test/org/openjdk/tests/java/util/stream/SequentialOpTest.java.SequentialOpTest
java/util/stream/test/org/openjdk/tests/java/util/stream/StreamBuilderTest.java.StreamBuilderTest
jdk_jfr_2:
~300 failing tests
All of these tests pass on the odroid machines, test-sxa-armv7l-ubuntu2004-odroid-1 and 2, which are not containerised environments