adoptium / infrastructure

This repo contains all information about machine maintenance.
Apache License 2.0

missing nonintel docker providing machine #3482

Open judovana opened 8 months ago

judovana commented 8 months ago

There are no nodes with the label 'ci.role.test&&sw.os.linux&&hw.arch.aarch64&&sw.tool.docker' when attempting to run external tests on non-Intel machines.
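
The label in that error is an AND-only Jenkins label expression. A node has to carry all four labels to qualify, so an aarch64 test node that has docker installed but lacks sw.tool.docker would still not match. Below is a minimal Python sketch of that matching rule for '&&' only. The helper name and the label sets are hypothetical, and real Jenkins label expressions also support operators such as '||' and '!', which this sketch ignores.

```python
# Hypothetical helper: checks whether a node's label set satisfies a Jenkins
# label expression that only uses '&&' (AND). Operators such as '||', '!'
# and parentheses are NOT handled by this sketch.

def matches_and_expression(expression: str, node_labels: set[str]) -> bool:
    """Return True if every '&&'-separated label is present on the node."""
    required = {part.strip() for part in expression.split("&&") if part.strip()}
    return required <= node_labels


EXPR = "ci.role.test&&sw.os.linux&&hw.arch.aarch64&&sw.tool.docker"

# Illustrative label sets (hypothetical, not taken from the real inventory).
aarch64_node_without_docker_label = {"ci.role.test", "sw.os.linux", "hw.arch.aarch64"}
aarch64_node_with_docker_label = aarch64_node_without_docker_label | {"sw.tool.docker"}

if __name__ == "__main__":
    print(matches_and_expression(EXPR, aarch64_node_without_docker_label))  # False
    print(matches_and_expression(EXPR, aarch64_node_with_docker_label))     # True
```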

Software product and affected platforms: docker on non-Intel machines, namely aarch64.

Anyone who wants to run external tests, or anything else requiring docker, is effectively limited to Linux on Intel. At least aarch64 Linux machines should also be provisionable.

It would be nice if those new installs provided podman out of the box, with just a docker wrapper.
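
With podman plus a docker wrapper, tooling that calls either CLI name keeps working. As an illustration only, a script could probe for podman first and fall back to docker. The function below is a hypothetical sketch of that idea, not the logic used by the Adoptium test scripts; the lookup function is injectable so the choice can be exercised without touching the real PATH.

```python
import shutil
from typing import Callable, Optional

# Hypothetical sketch: choose a container CLI, preferring podman and falling
# back to docker. On a host where podman ships with a docker wrapper, both
# names resolve, and podman is picked.

def pick_container_engine(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return 'podman' or 'docker' (the first one found), or None if neither is."""
    for engine in ("podman", "docker"):
        if which(engine) is not None:
            return engine
    return None


if __name__ == "__main__":
    print(pick_container_engine())
```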

sxa commented 6 months ago

For aarch64 we should be able to use the newer OSUOSL ones. For other architectures, once we start getting older images replaced with e.g. Ubuntu 24.04, we should be able to enable more of these.

Haroon-Khel commented 3 weeks ago

Checking to see if docker is installed

Machine Docker installed sw.tool.docker label?
build-digitalocean-centos69-x64-2
build-marist-rhel79-s390x-1
build-marist-rhel8-s390x-1
build-osuosl-centos74-ppc64le-1 ❌ Offline since we're building in docker containers
build-osuosl-centos74-ppc64le-2 ❌ Offline since we're building in docker containers
test-osuosl-ubuntu2204-aarch64-1 (in inventory file as a build machine, should be updated)
dockerhost-skytap-ubuntu2004-ppc64le-1 (duplicate entry in inventory.yml)
dockerhost-azure-ubuntu2204-x64-1
dockerhost-azure-ubuntu2404-x64-1
dockerhost-equinix-ubuntu2404-armv8-1
dockerhost-equinix-ubuntu2204-armv8-1
dockerhost-osuosl-ubuntu2404-ppc64le-1
dockerhost-osuosl-ubuntu2404-aarch64-1
dockerhost-marist-ubuntu2404-s390x-1 offline
dockerhost-skytap-ubuntu2204-x64-1
test-azure-ubuntu2404-x64-1
test-aws-rhel76-armv8-1
test-aws-rhel8-x64-1
test-osuosl-centos74-ppc64le-1
test-osuosl-centos74-ppc64le-2
test-osuosl-ubuntu1604-ppc64le-1
test-osuos-ubuntu1604-ppc64le-2 deleted as of
test-osuosl-ubuntu1804-ppc64le-1
test-osuosl-ubuntu1804-ppc64le-2
test-osuosl-ubuntu2004-ppc64le-1
test-osuosl-ubuntu2404-aarch64-1
test-marist-rhel7-s390x-2 (offline in jenkins)
test-marist-rhel8-s390x-2 (podman)
test-marist-sles12-s390x-2 ✅ ? config likely copied from an existing machine without removing label
test-marist-sles15-s390x-2
test-marist-ubuntu2404-s390x-1
test-marist-ubuntu2204-s390x-1
test-rise-ubuntu2310-riscv64-1
test-rise-ubuntu2310-riscv64-2
test-rise-ubuntu2404-riscv64-1 (cannot access and offline in jenkins) ?
test-rise-ubuntu2404-riscv64-2
test-rise-ubuntu2404-riscv64-3
test-rise-ubuntu2404-riscv64-4
test-rise-ubuntu2404-riscv64-5
test-rise-ubuntu2404-riscv64-6 (cannot access and is offline in jenkins) ?
test-rise-ubuntu2404-riscv64-7
test-skytap-ubuntu2004-ppc64le-1
test-ibmcloud-rhel6-x64-1
test-ibmcloud-rhel7-x64-1
test-ibmcloud-ubuntu1604-x64-1
Haroon-Khel commented 3 weeks ago

So the question is: which machines should have the sw.tool.docker label? Some of the machines without it have a plain docker label instead, which would need to be changed to sw.tool.docker. The dockerhost machines have not been set up with the whole unix playbook, only the dockerhost one, although they do (obviously) have docker installed. Is it OK for these to have the sw.tool.docker label?

Haroon-Khel commented 3 weeks ago

Related (and possible duplicate) https://github.com/adoptium/infrastructure/issues/1044

sxa commented 3 weeks ago

dockerhost machines should not be used for test, so they would not typically have sw.tool.docker applied. Any system with ci.role.test that has docker available and working should be a valid candidate for sw.tool.docker.
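
The rule above can be sketched as a small function that computes the labels a node should end up with. This is a hypothetical illustration: it assumes dockerhost machines can be recognised by the "dockerhost-" hostname prefix used in the table above, that "docker works" has already been determined separately, and that the legacy plain docker label should always be dropped in favour of sw.tool.docker.

```python
# Hypothetical sketch of the labelling rule: a node gets sw.tool.docker only
# if it has ci.role.test, docker is working on it, and it is not a dockerhost
# machine (identified here by hostname prefix, an assumption). The legacy
# plain "docker" label is removed in all cases.

DOCKER_LABEL = "sw.tool.docker"
LEGACY_DOCKER_LABEL = "docker"
TEST_ROLE = "ci.role.test"


def desired_labels(hostname: str, labels: set[str], docker_works: bool) -> set[str]:
    """Return the label set a node should have under the rule above."""
    result = set(labels) - {LEGACY_DOCKER_LABEL}
    eligible = (
        not hostname.startswith("dockerhost-")
        and TEST_ROLE in labels
        and docker_works
    )
    if eligible:
        result.add(DOCKER_LABEL)
    else:
        result.discard(DOCKER_LABEL)
    return result


if __name__ == "__main__":
    print(sorted(desired_labels("test-aws-rhel8-x64-1", {"ci.role.test", "docker"}, True)))
```

Keeping the "is docker working" decision as an input, rather than probing inside the function, keeps the rule itself easy to review separately from how each machine is checked.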

Haroon-Khel commented 2 weeks ago

Running some external tests on the machines which have docker installed and the ci.role.test label, and which should have (but currently do not have) the sw.tool.docker label:

test-aws-rhel8-x64-1 https://ci.adoptium.net/job/Test_openjdk11_hs_sanity.external_x86-64_linux/647/console

15:40:29       [exec] DOCKERIMAGE_TAG nightly has been recognized.
15:40:29       [exec] /usr/bin/which: no podman in (/home/jenkins/.local/bin:/home/jenkins/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin)
15:40:29       [exec] Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
15:40:29       [exec] Result: 1

Having trouble connecting to the docker daemon
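
The message suggests that nothing is accepting connections on /var/run/docker.sock, for example because the docker service is installed but not running on that host. A hedged Python sketch of a pre-flight check that a job could run before starting the test is shown below. The function name is hypothetical, the default path is the one from the log, and rootless or remote setups may use a different socket.

```python
import socket

# Hypothetical sketch: check whether anything is accepting connections on the
# docker daemon's Unix socket (Linux only, since it uses AF_UNIX). Any
# OSError (missing socket, refused connection, permission denied, timeout)
# is treated as "not reachable".

DEFAULT_DOCKER_SOCKET = "/var/run/docker.sock"


def docker_socket_reachable(path: str = DEFAULT_DOCKER_SOCKET, timeout: float = 2.0) -> bool:
    """Return True if a connection to the Unix socket at `path` succeeds."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return True
    except OSError:
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    print(docker_socket_reachable())
```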

test-osuosl-ubuntu2004-ppc64le-1 https://ci.adoptium.net/job/Test_openjdk11_hs_sanity.external_ppc64le_linux/13/console

15:45:10       [exec] INFO:  docker build  --no-cache -t adoptopenjdk-camel-test:11-jdk-ubuntu-hotspot-full -f /home/jenkins/workspace/Test_openjdk11_hs_sanity.external_ppc64le_linux/jvmtest/external/camel/dockerfile/11/jdk/ubuntu/Dockerfile.hotspot.full /home/jenkins/workspace/Test_openjdk11_hs_sanity.external_ppc64le_linux/jvmtest/external/
15:45:10       [exec] #####################################################
15:45:10       [exec] Sending build context to Docker daemon  87.04kB
15:45:10       [exec] Step 1/26 : ARG IMAGE=docker.io/library/eclipse-temurin:11-jdk
15:45:10       [exec] Step 2/26 : ARG OS=ubuntu
15:45:10       [exec] Step 3/26 : ARG IMAGE_VERSION=nightly
15:45:10       [exec] Step 4/26 : ARG TAG=11-jdk
15:45:10       [exec] Step 5/26 : FROM $IMAGE
15:45:10       [exec]  ---> 013dea8656df
15:45:10       [exec] Step 6/26 : ENV RESULT_COMMENT="IN CONTAINER(not-as-root/docker)"
15:45:10       [exec] 
15:45:10       [exec]             https://docs.docker.com/go/buildx/
15:45:10       [exec] 
15:45:11       [exec]  ---> Running in 36ecfe07aa6d
15:45:11       [exec] Removing intermediate container 36ecfe07aa6d
15:45:11       [exec]  ---> 58b9e1eb7651
15:45:11       [exec] Step 7/26 : ARG CAMEL_TAG=2.7.0
15:45:11       [exec]  ---> Running in bdc6671d1b44
15:45:11       [exec] Removing intermediate container bdc6671d1b44
15:45:11       [exec]  ---> d0159a1123b3
15:45:11       [exec] Step 8/26 : RUN apt-get update    && apt-get install -qq -y --no-install-recommends software-properties-common    && apt-get install -qq -y --no-install-recommends gnupg     && add-apt-repository ppa:ubuntu-toolchain-r/test   && apt-get update   && apt-get install -y --no-install-recommends git   && rm -rf /var/lib/apt/lists/*
15:45:11       [exec]  ---> Running in b6c72d27dfb9
15:45:15       [exec] The command '/bin/sh -c apt-get update    && apt-get install -qq -y --no-install-recommends software-properties-common    && apt-get install -qq -y --no-install-recommends gnupg     && add-apt-repository ppa:ubuntu-toolchain-r/test   && apt-get update   && apt-get install -y --no-install-recommends git   && rm -rf /var/lib/apt/lists/*' returned a non-zero code: 132

Fails during the docker build, at what looks like a package install
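
The exit code may say more than "the package install failed". Under the usual shell convention, a status above 128 means the process was killed by signal (status - 128), so 132 corresponds to signal 4, SIGILL (illegal instruction). That would hint at a binary inside the container executing an instruction the host CPU does not support, but this is an interpretation of the number, not a confirmed diagnosis. A small Python helper to decode such codes:

```python
import signal

# Sketch: interpret a shell-style exit status using the common convention
# that a status above 128 means "killed by signal (status - 128)". This is a
# general convention, not something specific to the camel Dockerfile.

def describe_exit_status(status: int) -> str:
    """Return a human-readable description of a shell exit status."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"unknown signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(132))  # killed by SIGILL (signal 4)
```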

test-rise-ubuntu2310-riscv64-1 https://ci.adoptium.net/job/Test_openjdk21_hs_sanity.external_riscv64_linux/23/console

15:48:15  prepare_base_image:
15:48:15       [echo] Executing external.sh --prepare --dir criu-functional --tag nightly --version 21 --impl hotspot --base_docker_registry_url 'docker.io' --base_docker_registry_dir 'default' --docker_args -v /home/jenkins/workspace/Test_openjdk21_hs_sanity.external_riscv64_linux/jdkbinary/j2sdk-image:/opt/java/openjdk 
15:48:15       [exec] The test here is criu-functional
15:48:15       [exec] The directory in the external.sh is criu-functional
15:48:15       [exec] DOCKERIMAGE_TAG nightly has been recognized.
15:48:15       [exec] No credential available for container registry, will proceed without login...
15:48:15       [exec] sudo podman pull docker.io/eclipse-temurin:21-jdk
15:48:15       [exec] sudo: a terminal is required to read the password; either use the -S option to read from standard input or configure an askpass helper
15:48:15       [exec] sudo: a password is required
15:48:15  
15:48:15  BUILD FAILED
15:48:15  /home/jenkins/workspace/Test_openjdk21_hs_sanity.external_riscv64_linux/aqa-tests/TKG/scripts/build_test.xml:95: The following error occurred while executing this line:
15:48:15  /home/jenkins/workspace/Test_openjdk21_hs_sanity.external_riscv64_linux/aqa-tests/external/build.xml:52: The following error occurred while executing this line: 

Needs a sudo password to proceed
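
The failure comes from sudo wanting a password with no terminal attached. Until the sudo requirement is removed, a job could detect this up front instead of failing mid-build: sudo's -n (non-interactive) option makes it exit with an error rather than prompt when a password would be needed. The function below is a hypothetical sketch of such a check, not part of the existing test scripts.

```python
import subprocess

# Hypothetical sketch: returns True only if "sudo -n true" succeeds, i.e. the
# current user can run sudo without being prompted for a password. A missing
# sudo binary, a non-zero exit, or a timeout all count as "not available".

def passwordless_sudo_available(timeout: float = 10.0) -> bool:
    """Check non-interactively whether sudo works without a password."""
    try:
        result = subprocess.run(
            ["sudo", "-n", "true"],
            stdin=subprocess.DEVNULL,
            capture_output=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print(passwordless_sudo_available())
```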

Am I running these wrong? ping @smlambert

sxa commented 1 week ago

15:48:15 [exec] sudo podman pull docker.io/eclipse-temurin:21-jdk

Two things:

  1. This is likely the first time we've tried to run these at Adoptium on podman
  2. They should not require sudo access: podman can be run as a normal user if it is configured properly, so I'm not sure why it was done this way, and we should try to remove that requirement.
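
On the "configured properly" point: one commonly cited prerequisite for running podman as a normal user is a subordinate UID range for that user in /etc/subuid (and a GID range in /etc/subgid), with entries of the form name:start:count. The hedged Python sketch below checks only that one prerequisite and only matches entries by user name, not by numeric UID. The function names are hypothetical, and the 65536 minimum is an assumption based on a common default range size rather than a hard podman requirement.

```python
from pathlib import Path

# Hypothetical sketch: check that `user` has a subordinate ID range of at
# least `min_count` IDs in subuid/subgid-style content ("name:start:count").
# Entries keyed by numeric UID are not recognised by this sketch.

def has_subordinate_range(user: str, content: str, min_count: int = 65536) -> bool:
    """Return True if `content` gives `user` a large enough subordinate range."""
    for line in content.splitlines():
        parts = line.strip().split(":")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        name, _start, count = parts
        if name == user and count.isdigit() and int(count) >= min_count:
            return True
    return False


def check_user(user: str, path: str = "/etc/subuid") -> bool:
    """Read a subuid/subgid file and check `user`; an unreadable file counts as False."""
    try:
        return has_subordinate_range(user, Path(path).read_text())
    except OSError:
        return False


if __name__ == "__main__":
    print(check_user("jenkins"), check_user("jenkins", "/etc/subgid"))
```
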
judovana commented 1 week ago

The criu tests are not affected by https://github.com/adoptium/aqa-tests/pull/5460, because there is no way to test them. I would recommend trying a different external test. I'm able to run them all on podman without sudo, or on docker with e.g. runas.