adoptium / infrastructure


System unavailable: Xvfb seems nonresponsive on test-osuosl-ubuntu1804-ppc64le-2 #3171

Closed adamfarley closed 2 weeks ago

adamfarley commented 1 year ago
Exception: java.lang.IllegalStateException: No display name received from Xvfb within 30 seconds

This has been happening for a while on this platform, though I cannot prove that it was always this machine.

adamfarley commented 10 months ago

Seen twice more:

On test-osuosl-ubuntu2004-ppc64le-1 https://ci.adoptium.net/job/Test_openjdk11_hs_sanity.system_ppc64le_linux/826/console

On test-docker-ubuntu1804-ppc64le-1 https://ci.adoptium.net/job/Test_openjdk11_hs_extended.openjdk_ppc64le_linux_testList_0/118/console

adamfarley commented 1 month ago

Likely a duplicate of https://github.com/adoptium/aqa-tests/issues/4930

smlambert commented 2 weeks ago

On test-docker-ubuntu2204-ppc64le-3 https://ci.adoptium.net/job/Test_openjdk21_hs_extended.openjdk_ppc64le_linux_testList_3/9/

On test-docker-ubuntu2004-ppc64le-1 https://ci.adoptium.net/job/Test_openjdk21_hs_extended.openjdk_ppc64le_linux_testList_0/20/

steelhead31 commented 2 weeks ago

A good number of these failures are on a single dockerhost, dockerhost-osuosl-ubuntu2004-ppc64le-1. This host runs 3 containers: test-docker-fedora39-ppc64le-1, test-docker-ubuntu2004-ppc64le-1 and test-docker-ubuntu2204-ppc64le-3.

steelhead31 commented 2 weeks ago

I'll test by stopping a single container (test-docker-ubuntu2004-ppc64le-1) to see if performance improves.

Rerun in Grinder on test-docker-ubuntu2204-ppc64le-3 with the above container and node stopped: https://ci.adoptium.net/job/Grinder/10788/

steelhead31 commented 2 weeks ago

FYI @smlambert & @adamfarley: I've identified and corrected a permissions issue on test-docker-ubuntu2204-ppc64le-3 (/tmp/.X11-unix was owned by Jenkins, but it must be owned by root). A new Grinder run is here: https://ci.adoptium.net/job/Grinder/10790/ and it seems to be proceeding correctly. Once this run has finished, I'll test it on the other docker containers.
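
For anyone wanting to check other machines for the same condition, here is a minimal sketch of an ownership check. It is not the command that was run for the fix above; the function name `x11_socket_dir_ok` and its parameters are made up for illustration, and it assumes that "owned by root" means owner UID 0.

```python
import os
import stat


def x11_socket_dir_ok(path="/tmp/.X11-unix", expected_uid=0):
    """Return True if `path` is a directory owned by `expected_uid`.

    Hypothetical helper: the default path comes from the comment above,
    and expected_uid=0 assumes the root account has UID 0.
    """
    try:
        info = os.stat(path)
    except OSError:
        # A missing or inaccessible directory counts as a failed check.
        return False
    return stat.S_ISDIR(info.st_mode) and info.st_uid == expected_uid


if __name__ == "__main__":
    # Report the state of the default X11 socket directory on this machine.
    print("ok" if x11_socket_dir_ok() else "needs attention")
```

If the check fails, the ownership would presumably need resetting to root on that container; the exact command used here is not recorded in this thread.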

steelhead31 commented 2 weeks ago

Running the same tests on test-docker-ubuntu2004-ppc64le-1 here: https://ci.adoptium.net/job/Grinder/10793/console - Passed

steelhead31 commented 2 weeks ago

And passed on the fedora container here. I'm going to close this, as I believe the underlying issue has been resolved. Please let me know if this recurs, and on which machine.