adoptium / infrastructure

This repo contains all information about machine maintenance.
Apache License 2.0

sanity.external failures on xlinux on ibmcloud systems #3128

Open sxa opened 1 year ago

sxa commented 1 year ago

Please set the title to indicate the test name and machine name where known.

To make it easy for the infrastructure team to repeat and diagnose, please answer the following questions:

Any other details: This may be related to the OS, since Ubuntu 16.04 is obviously quite old. RHEL7 may have other limitations as the docker host used for these images. Since I'm not sure this has been reported previously, it may be worth checking whether the underlying images used for this have been updated to later OS versions, since these hosts might be incompatible with newer containers.

sxa commented 3 weeks ago

Needs further investigation to determine if the older docker version on those distributions may be the cause of the problem as opposed to it being ibmcloud specific.
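One way to start that investigation is to record the docker daemon version on each affected host and compare it against a chosen threshold. The following is a minimal sketch, not an existing infrastructure script: `version_lt` and `check_docker_version` are hypothetical helper names, the threshold is whatever the caller supplies, and the comparison assumes GNU `sort -V` is available on the host.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if version $1 is strictly older than version $2.
# Assumes `sort -V` (version sort) is available.
version_lt() {
    if [ "$1" = "$2" ]; then
        return 1
    fi
    oldest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$oldest" = "$1" ]
}

# Hypothetical helper: report whether the docker daemon on this host is older
# than the threshold given as $1. The threshold is the caller's assumption,
# not a value established in this issue.
check_docker_version() {
    threshold=$1
    server=$(docker version --format '{{.Server.Version}}' 2>/dev/null) || {
        echo "docker daemon not reachable"
        return 2
    }
    if version_lt "$server" "$threshold"; then
        echo "docker $server is older than $threshold"
    else
        echo "docker $server is at least $threshold"
    fi
}
```

Running something like `check_docker_version 20.10.10` on both the ibmcloud machines and a machine where the tests pass would show whether the failures line up with the docker version rather than with the cloud provider. The value `20.10.10` here is only an illustrative threshold.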

sxa commented 6 days ago

Ubuntu 16.04 may be beyond hope:

jenkins@test-ibmcloud-ubuntu1604-x64-1:~$ docker run -it docker.io/library/eclipse-temurin:11-jdk
[0.007s][warning][os,thread] Failed to start thread "GC Thread#0" - pthread_create failed (EPERM) for attributes: stacksize: 1024k, guardsize: 4k, detached.
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Cannot create worker GC thread. Out of system resources.
# An error report file with more information is saved as:
# //hs_err_pid1.log
jenkins@test-ibmcloud-ubuntu1604-x64-1:~$

sxa commented 6 days ago

On test-azure-ubuntu2404-x64-1 we get this log:

14:49:18       [exec] The test in the build_image() function is openliberty-mp-tck
14:49:18       [exec] #####################################################
14:49:18       [exec] INFO: sudo podman build  --no-cache -t adoptopenjdk-openliberty-mp-tck-test:11-jdk-ubuntu-hotspot-full -f /home/jenkins/workspace/Grinder/jvmtest/external/openliberty-mp-tck/dockerfile/11/jdk/ubuntu/Dockerfile.hotspot.full /home/jenkins/workspace/Grinder/jvmtest/external/
14:49:18       [exec] #####################################################
14:49:18       [exec] sudo: a terminal is required to read the password; either use the -S option to read from standard input or configure an askpass helper
14:49:18       [exec] sudo: a password is required
14:49:18  
14:49:18  BUILD FAILED

sudo podman, if it is being executed on the host system, is not going to work: the jenkins user cannot run sudo without a password, and there is no terminal to prompt on. This system also only has docker, not podman, available.
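For illustration, a runtime selection that prefers podman but falls back to docker when podman is absent could look like the sketch below. This is not the code from aqa-tests; `pick_runtime` is a hypothetical name, and the only behaviour it relies on is the standard `command -v` lookup.

```shell
#!/bin/sh
# Hypothetical helper: print the first container runtime found on PATH,
# preferring podman over docker. Returns non-zero if neither is installed.
pick_runtime() {
    for candidate in podman docker; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no container runtime found" >&2
    return 1
}
```

On a host like this one, where only docker is installed, `pick_runtime` would print `docker`, so a build command assembled from its output would never reference podman.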

sxa commented 6 days ago

The same issue occurs on the RHEL7 machine, which also has docker and not podman:

14:56:30       [exec] INFO: sudo podman build  --no-cache -t adoptopenjdk-openliberty-mp-tck-test:11-jdk-ubuntu-hotspot-full -f /home/jenkins/workspace/Grinder/jvmtest/external/openliberty-mp-tck/dockerfile/11/jdk/ubuntu/Dockerfile.hotspot.full /home/jenkins/workspace/Grinder/jvmtest/external/
14:56:30       [exec] #####################################################
14:56:30       [exec] 
14:56:30       [exec] We trust you have received the usual lecture from the local System
14:56:30       [exec] Administrator. It usually boils down to these three things:
14:56:30       [exec] 
14:56:30       [exec]     #1) Respect the privacy of others.
14:56:30       [exec]     #2) Think before you type.
14:56:30       [exec]     #3) With great power comes great responsibility.
14:56:30       [exec] 
14:56:30       [exec] sudo: no tty present and no askpass program specified
14:56:30  
14:56:30  BUILD FAILED

sxa commented 6 days ago

@smlambert Are these tests currently expected to be in a good state?

smlambert commented 6 days ago

I am not sure they have been triaged since this PR landed https://github.com/adoptium/aqa-tests/pull/5460 (which seems to prefer podman if both are found on machines).

sxa commented 6 days ago

@judovana Could your PR referenced in the previous comment have prevented the external tests from running when only docker is available? It may need some more testing based on what I'm seeing in the runs above.

judovana commented 6 days ago

Hi! I did not touch the decision tree for whether a test should run or not. With my PR, sudo should not be used at all unless the user explicitly forces it. So I guess the original call before the PR was "sudo docker..." and was failing anyway (?). I thought I had adapted all calls except the criu test, but I could have missed some. In the meantime, https://github.com/adoptium/aqa-tests/pull/5553 also landed, and that was an even more intrusive change.
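As a rough sketch of the "no sudo unless forced" behaviour described above (not the actual aqa-tests code), the command prefix could be built like this. `runtime_prefix` and the `EXTERNAL_USE_SUDO` variable are hypothetical names invented for the example. `sudo -n` is the standard non-interactive flag, which makes sudo fail immediately instead of asking for a password when none is configured.

```shell
#!/bin/sh
# Hypothetical helper: build the command prefix for a container runtime call.
# EXTERNAL_USE_SUDO is an assumed opt-in variable, not a real aqa-tests setting.
# With it unset, the runtime is invoked directly, without sudo.
runtime_prefix() {
    runtime=$1
    if [ "${EXTERNAL_USE_SUDO:-false}" = "true" ]; then
        # -n: never prompt; fail at once if a password would be required.
        echo "sudo -n $runtime"
    else
        echo "$runtime"
    fi
}
```

With this shape, a Jenkins job that has no terminal would fail fast with a clear sudo error instead of the password prompt seen in the logs above, and the default path would avoid sudo altogether.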

From reading the PRs, I doubt I have "prevented the external tests from running when only docker is available", but I could have broken them in an unpredicted way.

I have never run the whole set of external tests as one batch, because they were failing even before both https://github.com/adoptium/aqa-tests/pull/5553 and https://github.com/adoptium/aqa-tests/pull/5460. I found several combinations which worked on most JDKs (lucene, jacoco and a few others) and tested on those. I'm going to map which external tests run/pass/fail on various docker/podman/jdk8/11/17/21/ubuntu/fedora/rhel combinations. It will take some time, but the result should be a healthy external subset (https://github.com/adoptium/aqa-tests/issues/5575#issuecomment-2341256974).

sxa commented 6 days ago

Thanks for the update. We can take a look next week ... Time to relax for the weekend, I feel!