adoptium / infrastructure

Tests failing in containerised arm32 environments JDK8 #3043

Open Haroon-Khel opened 1 year ago

Haroon-Khel commented 1 year ago

I don't think this is a complete list, just the failures observed during the recent April release. https://github.com/adoptium/aqa-tests/issues/4518#issuecomment-1525349302

jdk_instrument_2, jdk_security3_2, jdk_other_2:

javax/xml/jaxp/common/8144593/TransformationWarningsTest.java.TransformationWarningsTest
javax/net/ssl/ALPN/SSLServerSocketAlpnTest.java.SSLServerSocketAlpnTest
javax/net/ssl/ALPN/SSLSocketAlpnTest.java.SSLSocketAlpnTest
javax/net/ssl/sanity/interop/ClientJSSEServerJSSE.java.ClientJSSEServerJSSE
sun/security/ssl/GenSSLConfigs/main.java.main
javax/xml/jaxp/common/8144593/ValidationWarningsTest.java.ValidationWarningsTest

jdk_net_2:

com/sun/net/httpserver/Test9.java.Test9
com/sun/net/httpserver/bugs/B6361557.java.B6361557
java/net/ipv6tests/TcpTest.java.TcpTest

jdk_util_2:

java/util/concurrent/BlockingQueue/CancelledProducerConsumerLoops.java.CancelledProducerConsumerLoops
java/util/concurrent/ConcurrentQueues/ConcurrentQueueLoops.java.ConcurrentQueueLoops
java/util/concurrent/ExecutorCompletionService/ExecutorCompletionServiceLoops.java.ExecutorCompletionServiceLoops
java/util/stream/boottest/java/util/stream/NodeTest.java.NodeTest
java/util/stream/test/org/openjdk/tests/java/util/stream/RangeTest.java.RangeTest
java/util/Properties/ConcurrentLoadAndStoreXML.java.ConcurrentLoadAndStoreXML
java/util/stream/boottest/java/util/stream/DoubleNodeTest.java.DoubleNodeTest
java/util/stream/boottest/java/util/stream/IntNodeTest.java.IntNodeTest
java/util/stream/boottest/java/util/stream/FlagOpTest.java.FlagOpTest
java/util/stream/test/org/openjdk/tests/java/util/stream/FilterOpTest.java.FilterOpTest
java/util/stream/test/org/openjdk/tests/java/util/stream/InfiniteStreamWithLimitOpTest.java.InfiniteStreamWithLimitOpTest
java/util/stream/test/org/openjdk/tests/java/util/stream/IntSliceOpTest.java.IntSliceOpTest
java/util/stream/test/org/openjdk/tests/java/util/stream/IntUniqOpTest.java.IntUniqOpTest
java/util/stream/test/org/openjdk/tests/java/util/stream/SequentialOpTest.java.SequentialOpTest
java/util/stream/test/org/openjdk/tests/java/util/stream/StreamBuilderTest.java.StreamBuilderTest

jdk_jfr_2:

~300 failing tests

All of these tests pass on the odroid machines, test-sxa-armv7l-ubuntu2004-odroid-1 and 2, which are not containerised environments.

smlambert commented 5 months ago

jdk_util, jdk_jfr failures seen in Jan 2024 release too (see notes here)

sxa commented 4 months ago

I believe the perf suites are also in this category and should be understood/mitigated so the CI is not dependent upon my ODROID systems.

Haroon-Khel commented 2 months ago

https://ci.adoptium.net/job/Grinder/9819/tapResults/ test-docker-ubuntu2004-armv7l-3
https://ci.adoptium.net/job/Grinder/9820/tapResults/ test-docker-ubuntu2004-armv7l-2
https://ci.adoptium.net/job/Grinder/9821/tapResults/ test-docker-ubuntu2004-armv7l-6
https://ci.adoptium.net/job/Grinder/9822/tapResults/ test-docker-ubuntu2004-armv7l-5
https://ci.adoptium.net/job/Grinder/9823/tapResults/ test-docker-ubuntu2004-armv7l-4
https://ci.adoptium.net/job/Grinder/9824/tapResults/ test-docker-ubuntu2004-armv7l-1

It looks like jdk_other_2, jdk_security3_2 and jdk_instrument_2 pass on some machines and fail on others. This could be intermittent; I'm rerunning these tests on the same machines to confirm. jdk_net_2, jdk_util_2 and jdk_jfr_2 fail consistently.

The jfr failures are mostly SIGBUS errors

[thread -754977696 also had an error]
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0xf63a91a8, pid=88505, tid=0xd34e8460
#
# JRE version: OpenJDK Runtime Environment (8.0_412-b08) (build 1.8.0_412-b08)
# Java VM: OpenJDK Client VM (25.412-b08 mixed mode linux-aarch32 )
# Problematic frame:
# V  [libjvm.so+0x33b1a8]  write_checkpoint_header(unsigned char*, long long, long long, bool, unsigned int)+0xe8
#
# Core dump written. Default location: /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_17144060076039/jdk_jfr_2/work/scratch/0/core or core.88505
#
# An error report file with more information is saved as:
# /home/jenkins/workspace/Grinder/aqa-tests/TKG/output_17144060076039/jdk_jfr_2/work/scratch/0/hs_err_pid88505.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
#

java/net/Inet6Address/B6206527.java.B6206527 error log

trying LL addr: /fe80:0:0:0:42:acff:fe11:3%eth0
trying LL addr: /fe80:0:0:0:42:acff:fe11:3
java.net.BindException: Cannot assign requested address (Bind failed)
    at java.net.PlainSocketImpl.socketBind(Native Method)
    at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
    at java.net.ServerSocket.bind(ServerSocket.java:390)
    at java.net.ServerSocket.bind(ServerSocket.java:344)
    at B6206527.main(B6206527.java:57)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
    at java.lang.Thread.run(Thread.java:750)

JavaTest Message: Test threw exception: java.net.BindException
JavaTest Message: shutting down test

java/net/ipv6tests/B6521014.java.B6521014

java.net.ConnectException: Network is unreachable (connect failed)
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:607)
    at B6521014.test1(B6521014.java:77)
    at B6521014.main(B6521014.java:106)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:298)
    at java.lang.Thread.run(Thread.java:750)

JavaTest Message: Test threw exception: java.net.ConnectException
JavaTest Message: shutting down test
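
Both stack traces above come from IPv6 socket operations, so a quick first check on a container is whether IPv6 sockets work there at all. The snippet below is a minimal sketch of such a probe, not part of the AQA tooling; the function name `ipv6_loopback_usable` is hypothetical, and it only tests the loopback address `::1`, which is a narrower check than the link-local bind that B6206527 performs.

```python
import socket


def ipv6_loopback_usable() -> bool:
    """Return True if an IPv6 TCP socket can be created and bound to ::1."""
    # has_ipv6 only says Python was built with IPv6 support, not that the
    # container actually has an IPv6-capable network stack.
    if not socket.has_ipv6:
        return False
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock:
            # Port 0 lets the kernel pick any free port.
            sock.bind(("::1", 0))
        return True
    except OSError:
        # Covers "Cannot assign requested address" and similar failures.
        return False


if __name__ == "__main__":
    print("IPv6 loopback usable:", ipv6_loopback_usable())
```

Running this on both a static docker node and an ODROID machine would show whether the two environments differ in basic IPv6 availability, which is only one possible explanation for the failures.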

Haroon-Khel commented 2 months ago

Added an arm32 Debian static docker container to the inventory: https://ci.adoptium.net/computer/test-docker-debian12-armv7l-1/ Rerunning the failed tests on it: https://ci.adoptium.net/job/Grinder/9835/console

Haroon-Khel commented 2 months ago

Looking at grinders 9828 to 9833, jdk_other_2, jdk_security3_2 and jdk_instrument_2 fail intermittently.

Of jdk_security3_2's failing tests, a lot are unexpected exits from what looks like a passing test. See https://ci.adoptium.net/job/Grinder/9828/tapResults/ for example:

Failed test cases: 
TEST: sun/security/ssl/ClientHandshaker/CipherSuiteOrder.java
TEST: sun/security/ssl/SSLSocketImpl/RejectClientRenego.java
Test results: passed: 614; failed: 2 

sun/security/ssl/ClientHandshaker/CipherSuiteOrder.java

Unexpected exit from test [exit code: 134]    
Standard Output
server enabled suites: 
=====================

client enabled suites: 
======================
SSL_RSA_WITH_DES_CBC_SHA
SSL_RSA_WITH_RC4_128_MD5
SSL_DHE_DSS_WITH_3DES_EDE_CBC_SHA
SSL_RSA_WITH_RC4_128_SHA
SSL_DHE_DSS_WITH_DES_CBC_SHA

SSL_DHE_DSS_WITH_DES_CBC_SHA
SSL_RSA_WITH_RC4_128_MD5

Server read: 80
Cipher suite in use: SSL_RSA_WITH_RC4_128_MD5
client read: 85

Standard Error
STATUS:Passed.

sun/security/ssl/SSLSocketImpl/RejectClientRenego.java

Unexpected exit from test [exit code: 133]    
Standard Output
Session: Session(1714476936531|TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA)
Seen handshake completed #1
sending/receiving data, iteration: 0
starting new handshake
Got the expected exception
Got the expected exception

Standard Error
STATUS:Passed.
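
The exit codes 134 and 133 above are both greater than 128, which on Unix-like systems usually means the process was terminated by a signal (exit code = 128 + signal number). The sketch below decodes such codes with Python's standard `signal` module; `describe_exit_code` is a hypothetical helper name, and the assumption that jtreg reports the raw process exit value is mine rather than something stated in this thread.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a process exit code into a signal name where applicable."""
    if code > 128:
        signum = code - 128
        try:
            return signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            return f"unknown signal {signum}"
    return f"exit status {code}"


if __name__ == "__main__":
    for code in (134, 133):
        print(code, describe_exit_code(code))
```

On Linux, 134 decodes to SIGABRT (6) and 133 to SIGTRAP (5), which would suggest the JVM was killed after the test logic had already reported STATUS:Passed.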

sxa commented 1 month ago

As part of the work we're having to do for Ubuntu 24.04 support, it would be useful to test whether an Ubuntu 24.04 machine at OSUOSL can run 32-bit containers without the same problems.

Haroon-Khel commented 1 month ago

Got an Ubuntu 24.04 arm32 container, https://ci.adoptium.net/computer/test-docker-ubuntu2404-armv7-1/, running on an Ubuntu 24.04 OSUOSL arm64 dockerhost machine https://ci.adoptium.net/computer/dockerhost-osuosl-ubuntu2404-aarch64-1/ (used to be dockerhost-osuosl-ubuntu2204-aarch64-1)

https://ci.adoptium.net/job/AQA_Test_Pipeline/258/console

Haroon-Khel commented 1 month ago

Failures

sanity openjdk

sun/security/krb5/auto/rcache_usemd5.sh

extended openjdk

jdk_beans_2
java/net/Inet6Address/B6206527.java
java/net/ipv6tests/B6521014.java
sun/security/ssl/SSLSocketImpl/ServerTimeout.java
jdk_jfr_2

extended perf

dacapo-xalan_0 (the only extended perf test failure. Perhaps the perf failures on containerised arm32 machines are intermittent?)

sanity functional, special functional and extended functional all failed. Rerunning

https://ci.adoptium.net/job/AQA_Test_Pipeline/259/console

Haroon-Khel commented 1 month ago

The sanity, special and extended functional test builds are all failing with this error:

13:16:30      [javac] Compiling 1 source file to /home/jenkins/workspace/Test_openjdk8_hs_special.functional_arm_linux/aqa-tests/functional/MockitoTests/bin
13:16:31      [javac] /home/jenkins/workspace/Test_openjdk8_hs_special.functional_arm_linux/aqa-tests/functional/MockitoTests/src/test/java/MockitoMockTest.java:17: error: cannot access Mockito
13:16:31      [javac] import org.mockito.Mockito;
13:16:31      [javac]                   ^
13:16:31      [javac]   bad class file: /home/jenkins/testDependency/lib/mockito-core.jar(org/mockito/Mockito.class)
13:16:31      [javac]     class file has wrong version 55.0, should be 52.0
13:16:31      [javac]     Please remove or make sure it appears in the correct subdirectory of the classpath.

The node uses jdk17 for its Jenkins agent while these are jdk8 tests, which might have something to do with it.

Haroon-Khel commented 1 month ago

No problem building jdk11 sanity functional tests https://ci.adoptium.net/job/Test_openjdk11_hs_sanity.functional_arm_linux/420/console

Haroon-Khel commented 1 month ago

Switched the jdk on the node to jdk11 and restarted the node. Rebuilding the sanity, special and extended functional tests: https://ci.adoptium.net/job/AQA_Test_Pipeline/261/console

llxia commented 1 month ago

Re https://github.com/adoptium/infrastructure/issues/3043#issuecomment-2115123748: "class file has wrong version 55.0, should be 52.0" means the Java compiler does not match the class file. Version 55 is the Java 11 class file format, while the JDK8 javac only accepts up to version 52 (Java 8). (See https://stackoverflow.com/questions/60612488/error-class-file-has-wrong-version-55-0-should-be-52-0-when-building-alfresco)

That said, MockitoMockTest is currently set to JDK11+ in playlist.xml in the AQA repo.
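
To confirm which Java release a jar was built for, the class file header can be read directly: the first four bytes are the magic number 0xCAFEBABE, followed by the minor and major version as big-endian 16-bit values. This is a minimal sketch; the helper names are hypothetical, and the jar path and entry in the usage note are taken from the javac error earlier in this thread.

```python
import struct
import zipfile


def class_file_version(data: bytes) -> tuple[int, int]:
    """Return (major, minor) parsed from the first 8 bytes of a .class file."""
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major, minor


def java_release(major: int) -> int:
    """Map a class file major version to a Java release (52 -> 8, 55 -> 11)."""
    return major - 44


def jar_class_version(jar_path: str, entry: str) -> tuple[int, int]:
    """Read the class file version of one entry inside a jar."""
    with zipfile.ZipFile(jar_path) as jar:
        with jar.open(entry) as class_file:
            return class_file_version(class_file.read(8))
```

For example, `jar_class_version("/home/jenkins/testDependency/lib/mockito-core.jar", "org/mockito/Mockito.class")` should report major version 55 on the affected node if the javac message is accurate, and `java_release(55)` gives 11.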

Two things need to be done:

Haroon-Khel commented 1 month ago

Rerunning the consistently failing tests (jdk_net, jdk_util, jdk_jfr) on the newly created test-osuosl-ubuntu2404-aarch64-1

https://ci.adoptium.net/job/Grinder/10138/console

Haroon-Khel commented 1 month ago

Interesting, only the following jdk8 jdk_net tests fail on test-osuosl-ubuntu2404-aarch64-1 (arm64 not arm32)

TEST: sun/net/www/http/HttpClient/KeepAliveTest.java
TEST: sun/net/www/http/KeepAliveCache/B8291637.java
TEST: sun/net/www/http/KeepAliveCache/KeepAliveProperty.java
TEST: sun/net/www/http/KeepAliveCache/B8293562.java

The jdk_util and jdk_jfr tests pass.

sxa commented 3 weeks ago

I've kicked off the sanity run on the U2404/arm32 box with the v1.0.1-release branch to see if the build failure is specific to something in the master branch. It's not immediately obvious why this would be specific to arm32 machines though.

Haroon-Khel commented 3 weeks ago

jdk8 jdk_util tests, which consistently fail on the static docker arm32 nodes, pass on test-docker-ubuntu2404-armv7-1

https://ci.adoptium.net/job/Grinder/10156/tapResults/

We're also not seeing the same ipv6 jdk_net failures that we see in https://github.com/adoptium/infrastructure/issues/3043#issuecomment-2085016314

Haroon-Khel commented 2 weeks ago

> I believe the perf suites are also in this category and should be understood/mitigated so the CI is not dependent upon my ODROID systems.

@sxa Which were the failing perf tests again? https://ci.adoptium.net/job/AQA_Test_Pipeline/280/console (jdk8 v1.0.1-release branch on test-docker-ubuntu2404-armv7-1) finished running. Sanity perf and extended perf both passed

https://ci.adoptium.net/job/Test_openjdk8_hs_sanity.perf_arm_linux/475/ https://ci.adoptium.net/job/Test_openjdk8_hs_extended.perf_arm_linux/137/

sxa commented 2 weeks ago

> @sxa Which were the failing perf tests again? https://ci.adoptium.net/job/AQA_Test_Pipeline/280/console (jdk8 v1.0.1-release branch on test-docker-ubuntu2404-armv7-1) finished running. Sanity perf and extended perf both passed

Can't remember which versions, but we should perhaps try running those on the Equinix containers and see if they pass there

Haroon-Khel commented 3 days ago

I kicked off JDK8, 11 and 17 sanity and extended perf tests on the static docker arm32 nodes, but I think because I kicked off too many at once, the earlier test jobs did not get saved, leaving the earlier AQA pipelines looking like this: https://ci.adoptium.net/job/AQA_Test_Pipeline/316/console

[Pipeline] }
Failed in branch Test_openjdk17_hs_extended.perf_arm_linux_6
[Pipeline] }
Failed in branch Test_openjdk11_hs_extended.perf_arm_linux_4
[Pipeline] }
Failed in branch Test_openjdk8_hs_sanity.perf_arm_linux_1
[Pipeline] }
Failed in branch Test_openjdk17_hs_sanity.perf_arm_linux_5
[Pipeline] }
Failed in branch Test_openjdk8_hs_extended.perf_arm_linux_2
[Pipeline] }
Failed in branch Test_openjdk11_hs_sanity.perf_arm_linux_3
[Pipeline] // parallel
[Pipeline] End of Pipeline

However, the last 5 jobs (the only ones available) for each of the following show the tests passing on static docker containers, which at the very least reduces our dependency on the odroid machines:

https://ci.adoptium.net/job/Test_openjdk8_hs_sanity.perf_arm_linux/
https://ci.adoptium.net/job/Test_openjdk8_hs_extended.perf_arm_linux/
https://ci.adoptium.net/job/Test_openjdk11_hs_sanity.perf_arm_linux/
https://ci.adoptium.net/job/Test_openjdk11_hs_extended.perf_arm_linux/
https://ci.adoptium.net/job/Test_openjdk17_hs_sanity.perf_arm_linux/
https://ci.adoptium.net/job/Test_openjdk17_hs_extended.perf_arm_linux/

https://ci.adoptium.net/job/Test_openjdk8_hs_extended.perf_arm_linux/ has the lowest pass rate, so some further investigation is required there.

Among the failing jdk8 extended perf tests, dacapo-xalan_0 fails consistently while renaissance-finagle-http_0 fails intermittently

Rerunning both tests on all arm32 static docker nodes for 10 iterations:

test-docker-debian12-armv7l-1 https://ci.adoptium.net/job/Grinder/10475/console Both tests passed 1/10 times. The only pass for both tests occurred in the same iteration

test-docker-ubuntu2004-armv7l-5 https://ci.adoptium.net/job/Grinder/10476/console dacapo-xalan_0 passed 1/10 times, renaissance-finagle-http_0 passed 10/10 times

test-docker-ubuntu2004-armv7l-4 https://ci.adoptium.net/job/Grinder/10477/console dacapo-xalan_0 passed 1/10 times, renaissance-finagle-http_0 passed 9/10 times

test-docker-ubuntu2004-armv7l-2 https://ci.adoptium.net/job/Grinder/10478/console dacapo-xalan_0 failed 10/10 times, renaissance-finagle-http_0 passed 2/10 times

test-docker-ubuntu2004-armv7l-3 https://ci.adoptium.net/job/Grinder/10479/console dacapo-xalan_0 failed 10/10 times, renaissance-finagle-http_0 passed 1/10 times

test-docker-ubuntu2004-armv7l-1 https://ci.adoptium.net/job/Grinder/10480/console dacapo-xalan_0 passed 1/10 times, renaissance-finagle-http_0 passed 1/10 times

test-docker-ubuntu2004-armv7l-6 https://ci.adoptium.net/job/Grinder/10481/console dacapo-xalan_0 passed 10/10 times, renaissance-finagle-http_0 passed 9/10 times

test-docker-ubuntu2404-armv7-1 https://ci.adoptium.net/job/Grinder/10482/console Both tests failed 1/10 times
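
The per-node results above are easier to compare when expressed as pass counts. The sketch below tallies them; the numbers are transcribed from the comment above, reading "failed 10/10" as 0 passes and "failed 1/10" as 9 passes, which is my interpretation of the wording. The names `RESULTS`, `ITERATIONS` and `totals` are hypothetical.

```python
# (node, dacapo-xalan_0 passes, renaissance-finagle-http_0 passes), each out of 10
RESULTS = [
    ("test-docker-debian12-armv7l-1", 1, 1),
    ("test-docker-ubuntu2004-armv7l-5", 1, 10),
    ("test-docker-ubuntu2004-armv7l-4", 1, 9),
    ("test-docker-ubuntu2004-armv7l-2", 0, 2),
    ("test-docker-ubuntu2004-armv7l-3", 0, 1),
    ("test-docker-ubuntu2004-armv7l-1", 1, 1),
    ("test-docker-ubuntu2004-armv7l-6", 10, 9),
    ("test-docker-ubuntu2404-armv7-1", 9, 9),
]
ITERATIONS = 10


def totals(results):
    """Return total passes for (dacapo-xalan_0, renaissance-finagle-http_0)."""
    xalan = sum(row[1] for row in results)
    finagle = sum(row[2] for row in results)
    return xalan, finagle


if __name__ == "__main__":
    for node, xalan, finagle in RESULTS:
        print(f"{node}: xalan {xalan}/{ITERATIONS}, finagle {finagle}/{ITERATIONS}")
    runs = ITERATIONS * len(RESULTS)
    xalan_total, finagle_total = totals(RESULTS)
    print(f"total: xalan {xalan_total}/{runs}, finagle {finagle_total}/{runs}")
```

Under that reading, dacapo-xalan_0 passed 23 of 80 runs and renaissance-finagle-http_0 passed 42 of 80, with test-docker-ubuntu2004-armv7l-6 and test-docker-ubuntu2404-armv7-1 the only nodes where both tests passed at least 9 times out of 10.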