adoptium / infrastructure

This repo contains all information about machine maintenance.
Apache License 2.0

QPC: All platforms are unstable #2121

Open Willsparker opened 3 years ago

Willsparker commented 3 years ago

Ref: https://ci.adoptopenjdk.net/job/QEMUPlaybookCheck/229/

With the latest QPC run, all platforms have failed to some degree. Four have failed because buildJDK.sh is called incorrectly (incorrect as of #1962).

The risc-v platform is still blocked by #1483

And the arm32 platform seems to be running out of space during the playbook execution.

Willsparker commented 3 years ago

qemuPlaybookCheck.sh has been changed to use the correct arguments (willsparker/2123_1). Testing on QPC at: https://ci.adoptopenjdk.net/job/QEMUPlaybookCheck/230/

Willsparker commented 3 years ago

The above PR should fix the buildJDK.sh issues. Once this has been merged, I'll look at running the arm32 box to determine whether there are still issues with the space.

Willsparker commented 3 years ago

The PR to fix the arguments has been merged, and it appears to be working on all platforms. For the s390x and ppc64le architectures, the linux.sh script fails for JDK8, as it is unable to find JDK7 on the machines (Zulu-7 is not installed). I thought that it was meant to default to using JDK8 in that case, but apparently not. This is not the case for JDK11, where the build has started on both machines (though so far I've only managed a QPC run in which ppc64le failed (maybe due to the wrong version of gcc..?) and s390x core dumped (maybe due to a bad jdk-10 install..?)).

I'm going to look into that platform-specific-configuration script, to see if it's meant to set the boot JDK to the build JDK in cases where the boot JDK can't be found. If it isn't, I'll look into installing Zulu-7 on those platforms.

sxa commented 3 years ago

It can always be overridden with JDK7_BOOT_DIR if needed, but I know we've tried to let the autodetection work. My preference is for it not to fall back if possible, since you then run the risk of being unsure, from one build to the next, whether it used a JDK7 or a JDK8 as the boot JDK. Our "real" CentOS/RHEL7 ppc64le and s390x build machines all use a true JDK7.
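To illustrate the "don't fall back" preference, here is a minimal sketch of a guard that stops when no JDK7 is available instead of silently switching boot JDK. Only the JDK7_BOOT_DIR variable name comes from this thread; the helper name `require_boot_jdk` and the check for `bin/java` are assumptions made for illustration, not what the build scripts actually do.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if JDK7_BOOT_DIR points at a usable JDK.
require_boot_jdk() {
    local dir="${JDK7_BOOT_DIR:-}"
    if [ -z "$dir" ]; then
        echo "JDK7_BOOT_DIR is not set" >&2
        return 1
    fi
    # Assumed layout: a JDK has an executable bin/java under its root.
    if [ ! -x "$dir/bin/java" ]; then
        echo "No executable java under $dir/bin" >&2
        return 1
    fi
    echo "Using boot JDK at $dir"
}
```

A wrapper could call `require_boot_jdk || exit 1` before starting a JDK8 build, so that every build either uses the JDK7 it was told about or does not start at all.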

Willsparker commented 3 years ago

Okay! Thanks for the guidance. Looks like I'll have to adapt the Zulu-7 ansible task to include those platforms :-)

EDIT: Turns out no, this can't be done (see: https://adoptium.slack.com/archives/C53GHCXL4/p1619777906294700). Best case is having those platforms use JDK-8 to build

Willsparker commented 3 years ago

With https://github.com/AdoptOpenJDK/openjdk-infrastructure/issues/2176#issuecomment-836369749 , I think the last part of fixing the non RISC-V platforms is to figure out how to extend the build images, as they run out of space halfway through a build.

Willsparker commented 3 years ago

QEMU qcow2 images can be resized easily enough with qemu-img resize $IMG +10G (if you wanted to add 10GB to it). However, the partitions still need to be extended, which will be specific to the platform, and is likely to be a massive pain. Helpfully, I think all of the platforms that fail due to disk issues will fail in the build. Therefore, we could mount it as a new partition and just build on that one if it's that difficult.

EDIT: Ah, hang on, I've already documented all of this. Okay, I'm just going to extend them by 10GB and create a new partition on /home/linux that will have that extra 10GB. This method should be fine for all images that don't have a swap partition. I'll create backups of the current images, just in case :-)
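The backup-then-resize step could be sketched roughly as below. This is only an illustration: the function name `grow_image`, the `.bak` suffix and the DRY_RUN switch are assumptions, and by default the commands are printed rather than executed.

```shell
#!/bin/bash
# Sketch: back up a disk image, then grow it with qemu-img resize.
# Commands are only printed unless DRY_RUN=0 is set.
grow_image() {
    local img="$1" extra="${2:-+10G}" run="echo"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        run=""
    fi
    # Keep a copy of the original image before touching it.
    $run cp "$img" "$img.bak"
    # Grow the virtual disk; partitions inside still need extending afterwards.
    $run qemu-img resize "$img" "$extra"
}
```

Running `grow_image Debian.qcow2` (a made-up file name) prints the two commands; `DRY_RUN=0 grow_image Debian.qcow2 +10G` would actually run them.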

Willsparker commented 3 years ago

List of what I did to extend each of the images. After extending the partitions, I'll re-compress each image with xz -z and move it into /home/jenkins/qemu_base_images/resized_images/, so that we still have backups of the old images.

When logging into the machine, it automatically extended the root partition, which was lovely to see. So that has 25GB on it now.

root@debian:~# lsblk
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda    254:0    0  25G  0 disk 
|-vda1 254:1    0  10G  0 part /
`-vda2 254:2    0   5G  0 part /home/linux
root@debian:~# fdisk /dev/vda
Command (m for help): d
Partition number (1,2, default 2): 2

Partition 2 has been deleted.

Command (m for help): n
Partition number (2-128, default 2): 2
First sector (20971393-52428766, default 20971520): 
Last sector, +/-sectors or +/-size{K,M,G,T,P} (20971520-52428766, default 52428766): 

Created a new partition 2 of type 'Linux filesystem' and of size 15 GiB.
Partition #2 contains a ext4 signature.

Do you want to remove the signature? [Y]es/[N]o: Y

The signature will be removed by a write command.
Command (m for help): w
The partition table has been altered.
Syncing disks.

root@debian:~# lsblk
NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda    254:0    0  25G  0 disk 
|-vda1 254:1    0  10G  0 part /
`-vda2 254:2    0  15G  0 part /home/linux
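As a sanity check on the sector numbers fdisk offers, the range can be converted to a size by hand. A small sketch, assuming 512-byte sectors (the function name `sectors_to_gib` is made up for this example):

```shell
#!/bin/bash
# Convert an inclusive sector range to GiB, rounded to the nearest whole GiB.
# Assumes 512-byte sectors.
sectors_to_gib() {
    local first="$1" last="$2"
    local bytes=$(( (last - first + 1) * 512 ))
    local gib=$(( 1024 * 1024 * 1024 ))
    echo $(( (bytes + gib / 2) / gib ))
}

# Range used for the recreated partition 2 above.
sectors_to_gib 20971520 52428766   # prints 15
```

That matches the "size 15 GiB" fdisk reports for the new partition.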

This one's a bit different; there are a lot more partitions:

$ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda    254:0    0   30G  0 disk 
├─vda1 254:1    0  243M  0 part /boot
├─vda2 254:2    0  5.3G  0 part /
├─vda3 254:3    0    1K  0 part 
├─vda5 254:5    0  502M  0 part [SWAP]
└─vda6 254:6    0   14G  0 part /home

Unfortunately, I can't seem to find a way of extending the root partition without hitting the following error on startup:

[  155.371104] FS-Cache: Netfs 'nfs' registered for caching
Welcome to emergency mode! After logging in, type "journalctl -xb" to view
system logs, "systemctl reboot" to reboot, "systemctl default" to try again
to boot into default mode.
[  155.698568] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
Give root password for maintenance
(or type Control-D to continue): 

I tried removing partitions 2-6, and recreating the / and /home partition, but the same issue occurs. I tried just removing the second and third partition, and it caused the rest of them to disappear as well. I'm going to skip this one for now, in the hopes that somebody else knows how to do this ( @sxa ...? :eyes: )

Willsparker commented 3 years ago

The disk format is raw. I was able to extend it with qemu-img resize <> +10G though, and it is showing up in lsblk:

NAME                             MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                              252:0    0   30G  0 disk 
├─vda1                           252:1    0  512M  0 part /boot/efi
└─vda2                           252:2    0 19.5G  0 part 
  ├─ubuntu--18--arm64--vg-root   253:0    0 18.6G  0 lvm  /
  └─ubuntu--18--arm64--vg-swap_1 253:1    0  976M  0 lvm  [SWAP]

However, whenever I run any fdisk commands, I get GPT PMBR size mismatch (41943039 != 62914559) will be corrected by w(rite)., and the write then fails on exit with fdisk: failed to write disklabel: Invalid argument. For some reason, fdisk can't fix this, but parted can: running parted -l prompts a fix/ignore check. After this, I followed the instructions from this link: I created a new partition with fdisk (/dev/vda3) from the remaining space, created the physical volume with pvcreate /dev/vda3, and extended the volume group with vgextend /dev/ubuntu-18-arm64-vg /dev/vda3. From there, I extended the LV that needed it with lvextend +2568 /dev/ubuntu-18-arm64-vg/root and resized the file system to match with resize2fs /dev/ubuntu-18-arm64-vg/root. The result is:

root@ubuntu-18-arm64:~# lsblk
NAME                             MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                              252:0    0   30G  0 disk 
├─vda1                           252:1    0  512M  0 part /boot/efi
├─vda2                           252:2    0 19.5G  0 part 
│ ├─ubuntu--18--arm64--vg-root   253:0    0 28.6G  0 lvm  /
│ └─ubuntu--18--arm64--vg-swap_1 253:1    0  976M  0 lvm  [SWAP]
└─vda3                           252:3    0   10G  0 part 
  └─ubuntu--18--arm64--vg-root   253:0    0 28.6G  0 lvm  /

Which looks alright to me :+1:
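For reference, the LVM part of that process can be written down as a dry-run sketch that only prints the commands. The function name is made up, and `lvextend -l +100%FREE` is swapped in here as one way of handing all newly added space to the logical volume; it is not necessarily the exact invocation used above.

```shell
#!/bin/bash
# Sketch of the LVM steps: print each command rather than run it.
# Arguments: new partition, volume group name, logical volume name.
extend_lvm_root() {
    local part="$1" vg="$2" lv="$3"
    echo "pvcreate $part"
    echo "vgextend $vg $part"
    # Assumption: give every free extent in the VG to the LV.
    echo "lvextend -l +100%FREE /dev/$vg/$lv"
    echo "resize2fs /dev/$vg/$lv"
}

extend_lvm_root /dev/vda3 ubuntu-18-arm64-vg root
```

Printing first makes it easy to review the device names before anything is run as root inside the guest.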

This one also uses LVM and a raw disk image, so hopefully the same process as before will work. Initial lsblk (after extending the disk) shows:

NAME                 MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                    8:0    0   25G  0 disk 
├─sda1                 8:1    0    7M  0 part 
└─sda2                 8:2    0   15G  0 part 
  ├─linux--vg-root   253:0    0 14.3G  0 lvm  /
  └─linux--vg-swap_1 253:1    0  676M  0 lvm  [SWAP]
sr0                   11:0    1 1024M  0 rom 

I wonder if sr0 there is going to cause issues .. Nope, didn't seem to. Having followed the same instructions as before (and instructions on how to extend the LVM here):

root@linux:~# lsblk
NAME                 MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda                    8:0    0   25G  0 disk 
├─sda1                 8:1    0    7M  0 part 
├─sda2                 8:2    0   15G  0 part 
│ ├─linux--vg-root   253:0    0 24.3G  0 lvm  /
│ └─linux--vg-swap_1 253:1    0  676M  0 lvm  [SWAP]
└─sda3                 8:3    0   10G  0 part 
  └─linux--vg-root   253:0    0 24.3G  0 lvm  /
sr0                   11:0    1 1024M  0 rom 

This one is a QCOW2 image, so the same resize command applies. It is also LVM but, for some reason, didn't require parted -l.

root@ubuntu:~# lsblk
NAME                  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                   252:0    0   25G  0 disk 
└─vda1                252:1    0   15G  0 part 
  ├─ubuntu--vg-root   253:0    0   14G  0 lvm  /
  └─ubuntu--vg-swap_1 253:1    0  964M  0 lvm  [SWAP]

After the same above process:

NAME                  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
vda                   252:0    0   25G  0 disk 
├─vda1                252:1    0   15G  0 part 
│ ├─ubuntu--vg-root   253:0    0 24.1G  0 lvm  /
│ └─ubuntu--vg-swap_1 253:1    0  964M  0 lvm  [SWAP]
└─vda2                252:2    0   10G  0 part 
  └─ubuntu--vg-root   253:0    0 24.1G  0 lvm  /

I'm glad that I (or whoever set up the Ubuntu images) had the foresight to make the partitions LVM :-)

Willsparker commented 3 years ago

With all but the Debian.ARM32 image extended, I'll swap the original images with the resized images to test on the vagrant server. I'm aware that now QPC can run on build-equinix-ubuntu2004-armv8-1, so I'll quickly reconfigure the job to just use the infra-ibmcloud-vagrant* machines for the time being. If all goes well, I'll move the images over :-)

See: QPC#264

EDIT: Looks like the Debian ones failed: debian10/aarch64 didn't boot in time, and something else went wrong with debian11/riscv64. Not too worried about RISCV, as it is currently running fine in QPC#263

Willsparker commented 3 years ago

Nope, I messed up the RISCV one - this has happened 3 times now:

08:46:58 /usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
08:46:59 /usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
08:47:00 mkdir: cannot create directory '.ssh': Permission denied

I would assume in the process of extending the partition, the Linux user no longer owns their own home directory. Redid that, and tested here: QPC#267

It still looks like the Debian10/aarch64 one isn't booting in time. I'll try a local test where I extend the boot time to 180s, to see if this fixes it.
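A longer boot wait could be expressed as a generic retry loop. This is a sketch rather than what qemuPlaybookCheck.sh does; `wait_for` is a made-up helper that retries any command until it succeeds or a timeout passes.

```shell
#!/bin/bash
# Retry a command once per interval until it succeeds or the timeout expires.
# Usage: wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_for() {
    local timeout="$1" interval="$2"
    shift 2
    local waited=0
    until "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$(( waited + interval ))
    done
    return 0
}
```

For example, `wait_for 180 5 ssh -p 10020 -o ConnectTimeout=5 linux@localhost true` would poll the guest's forwarded SSH port for up to 180 seconds; the port and user name here are assumptions for the example.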

On the bright side, both Ubuntu18/ppc64le and Ubuntu18/aarch64 seem to be fine after the image resize in QPC#264: ppc64le actually passed, and aarch64 failed the test, though it was able to fully complete a build, which is nice.

Ubuntu18/s390x appears to be failing on apt-get upgrade :

Processing triggers for initramfs-tools (0.130ubuntu3.12) ...
update-initramfs: Generating /boot/initrd.img-4.15.0-74-generic
Using config file '/etc/zipl.conf'
Run /lib/s390-tools/zipl_helper.device-mapper /boot
Error: Unsupported setup: Directory '/boot' is located on a multi-target device-mapper device
Error: Script could not determine target parameters
run-parts: /etc/initramfs/post-update.d//zz-zipl exited with return code 1
dpkg: error processing package initramfs-tools (--configure):
 installed initramfs-tools package post-installation script subprocess returned error exit status 1
Errors were encountered while processing:
 linux-firmware
 initramfs-tools
E: Sub-process /usr/bin/dpkg returned an error code (1)

Presumably this is an issue with the root file system now being spread across several partitions. Interesting that this wasn't an issue with the other Ubuntu machines. I could put the linux-firmware and initramfs-tools packages on hold (apt-mark hold) to 'fix' the issue, but that feels wrong ...

Haroon-Khel commented 3 years ago

Latest failures/instabilities

ppc64le ubuntu18 - fails in the test stage. Looks like the test script needs to be updated to support the new name of the tests repo

00:17:27 TESTDIR: /home/linux/testLocation/openjdk-tests is invalid. Please use --testdir|-t to set valid TESTDIR under aqa-tests. Default value current dir (pwd) is used if not provided.
00:17:32 /home/linux/openjdk-infrastructure/ansible/pbTestScripts/testJDK.sh: line 15: cd: /home/linux/testLocation/openjdk-tests/TKG: No such file or directory
00:17:32 + grep -q 'FAILED: 0' /home/vagrant1/workspace/QEMUPlaybookCheck/ARCHITECTURE/ppc64le/OS/ubuntu18/label/vagrant/ansible/pbTestScripts/qemu_pbCheck/logFiles/UBUNTU18.PPC64LE.test_log
00:17:32 + echo TEST FAILED

s390x ubuntu18 - fails when ansible tries running apt-get upgrade, https://ci.adoptopenjdk.net/job/QEMUPlaybookCheck/272/ARCHITECTURE=s390x,OS=ubuntu18,label=vagrant/consoleFull

12:07:41 TASK [Common : Run apt-get upgrade] ********************************************
12:43:05 fatal: [localhost]: FAILED! => {"changed": false, "msg": "'/usr/bin/apt-get upgrade --with-new-pkgs ' failed: E: Sub-process /usr/bin/dpkg returned an error code (1)\n", "rc": 100, "stdout": "Reading package lists...\nBuilding dependency tree...\nReading state information...\nCalculating upgrade...\nThe following NEW packages will be installed:\n  distro-info gcc-11-base libgcc-s1 libnetplan0 linux-headers-4.15.0-144\n  linux-headers-4.15.0-144-generic linux-image-4.15.0-144-generic\n  linux-modules-4.15.0-144-generic linux-modules-extra-4.15.0-144-generic\nThe following packages will be upgraded:\n  accountsservice apt apt-utils base-files bind9 bind9-doc bind9-host\n  bind9utils bsdutils busybox-initramfs busybox-static ca-certificates dbus\n  distro-info-data dmeventd dmsetup dnsutils e2fsprogs fdisk file\n  friendly-recovery gcc-8-base initramfs-tools initramfs-tools-...

aarch64 ubuntu18 - disconnected during build stage, playbook passes without error

01:01:35 Connection to localhost closed by remote host.

aarch64 debian10 - Connection reset by peer

12:01:00 TASK [Gathering Facts] *********************************************************
12:01:00 fatal: [localhost]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: kex_exchange_identification: read: Connection reset by peer\r\nConnection reset by 127.0.0.1 port 10020", "unreachable": true}

arm32 debian8 - Fails to download GCC 7.5 binary. URL is invalid

14:15:17 TASK [gcc_7 : Download AdoptOpenJDK gcc-7.5.0 binary] **************************
14:15:35 fatal: [localhost]: FAILED! => {"changed": false, "dest": "/tmp/ansible-adoptopenjdk-gcc-7.tar.xz", "elapsed": 6, "msg": "Request failed", "response": "HTTP Error 404: Not Found", "status_code": 404, "url": "https://ci.adoptopenjdk.net/userContent/gcc/gcc750+ccache.armv7l.tar.xz"}
14:15:35 

riscv seems to be successful in its playbook run

sxa commented 3 years ago

> Looks like the test script needs to be updated to support the new name of the tests repo

Hmmm all requests to the old repo should redirect so there may be another underlying issue there ...

Haroon-Khel commented 3 years ago

> Hmmm all requests to the old repo should redirect so there may be another underlying issue there ...

I hit this error in my own time when I was running some tests using the tests repo. The error would pop up when using the ./get.sh script. I noticed that I only began to hit it when the repo name changed from openjdk-tests to aqa-tests; I was using a local copy of the tests repo that I had cloned before the name change. I found that renaming the folder from openjdk-tests to aqa-tests solved it. I've put in the PR https://github.com/adoptium/infrastructure/pull/2230 and I'll be testing it to see if it solves the problem.

sxa commented 3 years ago

> arm32 debian8 - Fails to download GCC 7.5 binary. URL is invalid

That looks from the log as though the job was tested using a fork of the repository that doesn't have this change in it: https://github.com/adoptium/infrastructure/pull/2201/files

Haroon-Khel commented 3 years ago

> That looks from the log as though the job was tested using a fork of the repository that doesn't have this change in it: https://github.com/adoptium/infrastructure/pull/2201/files

Possibly. I was testing this PR, https://github.com/adoptium/infrastructure/pull/2203, at the time (in hindsight I should've just tested the playbook run, since the changes only affect mac). I'll run a new job on master.

sxa commented 3 years ago

> I received this error in my own time when I was running some tests using the tests repo. This error would pop up when using the ./get.sh script.

Ah right ... So get.sh has been modified to explicitly check that it has been extracted to a directory with the new name, which isn't the case when going via the redirect, so https://github.com/adoptium/aqa-tests/pull/2612/files is what broke it. In that case it should be an easy fix :-)
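For anyone with an old local clone, the workaround described above amounts to renaming the directory. A minimal sketch (the helper name `migrate_tests_dir` is made up, and it assumes the clone sits directly under the directory passed in):

```shell
#!/bin/bash
# If an old-style clone directory exists and the new name is free, rename it.
# Argument: parent directory that holds the clone.
migrate_tests_dir() {
    local parent="$1"
    if [ -d "$parent/openjdk-tests" ] && [ ! -e "$parent/aqa-tests" ]; then
        mv "$parent/openjdk-tests" "$parent/aqa-tests"
    fi
    # Report success only if the new-style directory is now present.
    [ -d "$parent/aqa-tests" ]
}
```

Calling it with /home/linux/testLocation would cover the path seen in the test log above; it leaves an existing aqa-tests directory untouched.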

Haroon-Khel commented 3 years ago

> Possibly. I was testing this pr, #2203 (I shouldve just tested the playbook run in hindsight since the changes only affect mac), at the time. Ill run a new job on master

Looks like it's able to download it fine, but it seems to have run out of space on the disk. Odd

14:27:07 TASK [gcc_7 : Extract AdoptOpenJDK gcc-7 binary to /usr/local/gcc] *************
14:31:01 fatal: [localhost]: FAILED! => {"changed": false, "dest": "/usr/local/", "extract_results": {"cmd": ["/bin/tar", "--extract", "-C", "/usr/local/", "-z", "-f", "/tmp/ansible-adoptopenjdk-gcc-7.tar.gz"], "err": "/bin/tar: gcc/libexec/gcc/armv7l-unknown-linux-gnueabihf/7.5.0/cc1obj: Wrote only 9728 of 10240 bytes\n/bin/tar: gcc/libexec/gcc/armv7l-unknown-linux-gnueabihf/7.5.0/lto-wrapper: Cannot write: No space left on device\n/bin/tar: gcc/libexec/gcc/armv7l-unknown-linux-gnueabihf/7.5.0/lto1: Cannot write: No space left on device\n/bin/tar: gcc/libexec/gcc/armv7l-unknown-linux-gnueabihf/7.5.0/f951: Cannot write: No space left on device\n/bin/tar: gcc/libexec/gcc/armv7l-unknown-linux-gnueabihf/7.5.0/install-tools: Cannot mkdir: No space left on device\n/bin/tar: gcc/libexec/gcc/armv7l-unknown-linux-gnueabihf/7.5.0/install-tools: Cannot mkdir: No space left on device\n/bin/tar: gcc/libexec/gcc/armv7l-unknown-linux-gnueabihf/7.5.0/install-tools/fixincl: Cannot open: No such file or directory\n/bin/tar: gcc/libexec/gcc/armv7l-unknown-linux-gnueabihf/7.5.0/install-tools: Cannot mkdir: No space left on device\n/bin/tar: gcc/libexec/gcc/armv7l-unknown-linux-gnueabihf/7.5.0/install-tools/fixinc.sh: Cannot open: No such file or directory\n/bin/tar: gcc/libexec/gcc/armv7l-unknown-linux-gnueabihf/7.5.0/install-tools: Cannot mkdir: No space left on device\n/bin/tar: gcc/libexec/gcc/armv7l-unknown-linux-gnueabihf/7.5.0/install-tools/mkheaders: Cannot open: No such file or directory\n/bin/tar: gcc/libexec/gcc/armv7l-unknown-linux-gnueabihf/7.5.0/install-tools: Cannot mkdir: No space left on device\n/bin/tar: gcc/libexec/gcc/armv7l-unknown-linux-gnueabihf/7.5.0/install-tools/mkinstalldirs: Cannot open: No such file or directory\n/bin/tar: gcc/libexec/gcc/armv7l-unknown-linux-gnueabihf/7.5.0/liblto_plugin.la: Cannot write: No space left on device\n/bin/tar: Exiting with failure status due to previous errors\n", "out": "", "rc": 2}, "gid": 50, "group": "staff", "handler": "TgzArchive", "mode": "02775", "msg": "failed to unpack /tmp/ansible-adoptopenjdk-gcc-7.tar.gz to /usr/local/", "owner": "root", "size": 4096, "src": "/tmp/ansible-adoptopenjdk-gcc-7.tar.gz", "state": "directory", "uid": 0}

sxa commented 2 years ago

Running new job at https://ci.adoptopenjdk.net/view/Tooling/job/QEMUPlaybookCheck/294 to evaluate current status.

sxa commented 2 years ago

This might be tricky to fully work through in a short period, but I'm adding the good first issue label, since someone with an interest in cross-platform virtualisation should be able to work on this using their own machines.

Haroon-Khel commented 1 year ago

As of 19/12/22 the failing builds are

aarch64 deb10

23:03:53 TASK [Common : Allow https apt sources] ****************************************
23:04:36 [WARNING]: Updating cache and auto-installing missing dependency: python-apt
23:04:36 fatal: [localhost]: FAILED! => {"changed": false, "cmd": "apt-get update", "msg": "E: Repository 'http://deb.debian.org/debian buster InRelease' changed its 'Suite' value from 'stable' to 'oldstable'\nE: Repository 'http://deb.debian.org/debian buster-updates InRelease' changed its 'Suite' value from 'stable-updates' to 'oldstable-updates'", "rc": 100, "stderr": "E: Repository 'http://deb.debian.org/debian buster InRelease' changed its 'Suite' value from 'stable' to 'oldstable'\nE: Repository 'http://deb.debian.org/debian buster-updates InRelease' changed its 'Suite' value from 'stable-updates' to 'oldstable-updates'\n", "stderr_lines": ["E: Repository 'http://deb.debian.org/debian buster InRelease' changed its 'Suite' value from 'stable' to 'oldstable'", "E: Repository 'http://deb.debian.org/debian buster-updates InRelease' changed its 'Suite' value from 'stable-updates' to 'oldstable-updates'"], "stdout": "Get:1 http://deb.debian.org/debian buster InRelease [122 kB]\nGet:2 http://deb.debian.org/debian buster-updates InRelease [56.6 kB]\nGet:3 http://deb.debian.org/debian buster-backports InRelease [51.4 kB]\nGet:4 http://security.debian.org/ buster/updates InRelease [34.8 kB]\nGet:5 http://deb.debian.org/debian buster-backports/main Sources [456 kB]\nGet:6 http://deb.debian.org/debian buster-backports/main arm64 Packages [482 kB]\nGet:7 http://deb.debian.org/debian buster-backports/main Translation-en [411 kB]\nGet:8 http://security.debian.org/ buster/updates/main Sources [285 kB]\nGet:9 http://security.debian.org/ buster/updates/main arm64 Packages [403 kB]\nGet:10 http://security.debian.org/ buster/updates/main Translation-en [223 kB]\nReading package lists...\n", "stdout_lines": ["Get:1 http://deb.debian.org/debian buster InRelease [122 kB]", "Get:2 http://deb.debian.org/debian buster-updates InRelease [56.6 kB]", "Get:3 http://deb.debian.org/debian buster-backports InRelease [51.4 kB]", "Get:4 http://security.debian.org/ buster/updates InRelease 
[34.8 kB]", "Get:5 http://deb.debian.org/debian buster-backports/main Sources [456 kB]", "Get:6 http://deb.debian.org/debian buster-backports/main arm64 Packages [482 kB]", "Get:7 http://deb.debian.org/debian buster-backports/main Translation-en [411 kB]", "Get:8 http://security.debian.org/ buster/updates/main Sources [285 kB]", "Get:9 http://security.debian.org/ buster/updates/main arm64 Packages [403 kB]", "Get:10 http://security.debian.org/ buster/updates/main Translation-en [223 kB]", "Reading package lists..."]}

s390x ubuntu18

23:06:25 TASK [Common : Run apt-get upgrade] ********************************************
...
 "Error: Unsupported setup: Directory '/boot' is located on a multi-target device-mapper device",

riscv deb11

23:08:13 TASK [Common : Allow https apt sources] ****************************************
23:08:34 fatal: [localhost]: FAILED! => {"cache_update_time": 1598344993, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"      install 'apt-transport-https'' failed: E: Failed to fetch http://deb.debian.org/debian-ports/pool/main/a/apt/apt-transport-https_2.1.10_all.deb  404  Not Found [IP: 199.232.10.132 80]\nE: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?\n

arm32 deb8

Failed to install the following packages for what looked to be a dependency reason:

failed: [localhost] (item=flex)
failed: [localhost] (item=g++)
failed: [localhost] (item=gcc)
failed: [localhost] (item=gettext)
failed: [localhost] (item=libexpat1-dev)
failed: [localhost] (item=libcups2-dev)
failed: [localhost] (item=libfreetype6-dev)
failed: [localhost] (item=libfontconfig1-dev)
failed: [localhost] (item=libgmp3-dev)
failed: [localhost] (item=libmpfr-dev)
failed: [localhost] (item=systemtap-sdt-dev)

Haroon-Khel commented 1 year ago

Using QEMU playbook check in github workflows. It can do all but riscv https://github.com/adoptium/infrastructure/pull/2861

sxa commented 1 year ago

> As of 19/12/22 the failing builds are
>
> aarch64 deb10
>
> 23:03:53 TASK [Common : Allow https apt sources] ****************************************
> 23:04:36 [WARNING]: Updating cache and auto-installing missing dependency: python-apt
> 23:04:36 fatal: [localhost]: FAILED! => {"changed": false, "cmd": "apt-get update", "msg": "E: Repository 'http://deb.debian.org/debian buster InRelease' changed its 'Suite' value from 'stable' to 'oldstable'

That one will be from Debian 10 no longer being the stable release, so we should have something in the playbooks to update the apt repo reference. See https://wiki.debian.org/DebianOldStable for background.
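One possible way to get past the Suite change is apt-get's `--allow-releaseinfo-change` option, which tells `apt-get update` to accept the changed release information. The sketch below only builds the command string; the helper name is made up, and whether the playbooks should do this (rather than updating the sources) is an open question.

```shell
#!/bin/bash
# Build the apt-get update command, optionally accepting the Suite change.
# Argument: "yes" to accept changed release info, anything else for the default.
apt_update_cmd() {
    local accept_change="${1:-no}"
    if [ "$accept_change" = "yes" ]; then
        echo "apt-get update --allow-releaseinfo-change"
    else
        echo "apt-get update"
    fi
}

apt_update_cmd yes
```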

> s390x ubuntu18
>
> 23:06:25 TASK [Common : Run apt-get upgrade] ********************************************
> ...
> "Error: Unsupported setup: Directory '/boot' is located on a multi-target device-mapper device",

@Willsparker Have you hit this one before?

> riscv deb11
>
> 23:08:13 TASK [Common : Allow https apt sources] ****************************************
> 23:08:34 fatal: [localhost]: FAILED! => {"cache_update_time": 1598344993, "cache_updated": false, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"      install 'apt-transport-https'' failed: E: Failed to fetch http://deb.debian.org/debian-ports/pool/main/a/apt/apt-transport-https_2.1.10_all.deb  404  Not Found [IP: 199.232.10.132 80]

That's slightly odd. It suggests that apt-get update has not been able to complete successfully, since the directory mentioned there does have version 2.5.5 of the package in place.

> arm32 deb8 Failed to install the following package for what looked to be a dependency reason:
>
> failed: [localhost] (item=flex)
> failed: [localhost] (item=g++)

May just be because Debian 8 is ancient (although it might be interesting to see if we can point directly at the correct repo from archive.debian.org). However, we should probably check what the latest Raspbian image is and go with that (and possibly add a recent Ubuntu, since that is becoming more common on the platform). EDIT: The current image is based on Debian 11 (Bullseye) with kernel 5.15; the "legacy" image is Debian 10 (Buster) with kernel 5.10.
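If pointing at archive.debian.org is worth trying, it would look something like the sketch below. The helper name and the default output file are made up, it writes to a scratch file rather than /etc/apt/sources.list, and it has not been tested against the arm32 image.

```shell
#!/bin/bash
# Write an apt sources file that points a Debian 8 (jessie) system at the archive.
# Argument: output path (defaults to a scratch file in the current directory).
write_archive_sources() {
    local out="${1:-./jessie-archive.list}"
    cat > "$out" <<'EOF'
deb http://archive.debian.org/debian jessie main
EOF
}
```

Archived releases typically have expired Release files, so something like `apt-get -o Acquire::Check-Valid-Until=false update` may also be needed; treat that as an assumption to verify rather than a confirmed fix.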