Closed (sxa closed this issue 2 years ago)
I'm going to use this as a conclusive verification of a number of other infrastructure PRs that we have in flight just now, so I won't run the playbooks until after they are merged:
For future reference: before syncing inventories in awx you have to update the project source first, so that awx has the latest inventory file. I had assumed that syncing the inventory automatically pulled the latest inventory file.
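A minimal sketch of that ordering, for anyone scripting it rather than clicking through the UI. The two endpoint paths are my understanding of the AWX v2 REST API and should be checked against the API browser on the server; the IDs are hypothetical, and the function only builds the list of calls rather than sending anything.

```python
def awx_sync_plan(project_id: int, inventory_source_id: int) -> list[tuple[str, str]]:
    """Return the AWX API calls in the order they need to happen.

    Assumption: these are the v2 endpoints for launching a project update
    and an inventory source update. The IDs are placeholders.
    """
    return [
        # 1. Refresh the project so AWX checks out the latest inventory file.
        ("POST", f"/api/v2/projects/{project_id}/update/"),
        # 2. Only then sync the inventory that is sourced from that project.
        ("POST", f"/api/v2/inventory_sources/{inventory_source_id}/update/"),
    ]


if __name__ == "__main__":
    for method, path in awx_sync_plan(project_id=1, inventory_source_id=2):
        print(method, path)
```

In practice the second call should wait until the project update job has finished, otherwise the inventory sync may still read the old file.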
Running https://awx2.adoptopenjdk.net/#/jobs/playbook/137?job_search=page_size:20;order_by:-finished;not__launch_type:sync on test-marist-rhel8-s390x-2 as a prelim playbook run
Failed at the installation of systemtap-sdt-devel
I've created a new job in awx which I can use for debugging/testing. It deploys my own branch, https://github.com/Haroon-Khel/openjdk-infrastructure/tree/awx.debug, in which the only change so far is that systemtap-sdt-devel is commented out.
test-marist-rhel8-s390x-2 is actually a SLES15 machine
test-marist-rhel8-s390x-2:~ # cat /etc/os-release
NAME="SLES"
VERSION="15-SP2"
VERSION_ID="15.2"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP2"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp2"
And test-marist-sles15-s390x-2 is RHEL 8
[root@testrhel8 ~]# cat /etc/os-release
NAME="Red Hat Enterprise Linux"
VERSION="8.6 (Ootpa)"
ID="rhel"
ID_LIKE="fedora"
VERSION_ID="8.6"
PLATFORM_ID="platform:el8"
PRETTY_NAME="Red Hat Enterprise Linux 8.6 (Ootpa)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:8::baseos"
HOME_URL="https://www.redhat.com/"
DOCUMENTATION_URL="https://access.redhat.com/documentation/red_hat_enterprise_linux/8/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
REDHAT_BUGZILLA_PRODUCT_VERSION=8.6
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="8.6"
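A swapped hostname like this can be caught automatically by comparing the distribution encoded in the machine name with /etc/os-release. The following is a rough sketch only: the hostname pattern (`-rhel8-`, `-sles15-`, `-ubuntu2204-`) is inferred from the names in this issue, and the helper names are made up for illustration.

```python
import re


def parse_os_release(text: str) -> dict[str, str]:
    """Parse KEY=value lines from /etc/os-release, stripping quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def expected_os_from_hostname(hostname: str):
    """Extract (distro, version digits) from e.g. test-marist-rhel8-s390x-2."""
    match = re.search(r"-(rhel|sles|ubuntu)(\d+)-", hostname)
    return (match.group(1), match.group(2)) if match else None


def hostname_matches_os(hostname: str, os_release_text: str) -> bool:
    """True if the ID and VERSION_ID agree with what the hostname claims."""
    expected = expected_os_from_hostname(hostname)
    if expected is None:
        return False
    distro, version = expected
    info = parse_os_release(os_release_text)
    # "15.2" -> "152", "22.04" -> "2204", so the hostname digits are a prefix.
    actual_version = info.get("VERSION_ID", "").replace(".", "")
    return info.get("ID") == distro and actual_version.startswith(version)
```

Fed the two os-release outputs above, this reports a mismatch for both machines as currently named and a match once the names are swapped.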
Failed at downloading Ant
TASK [ant : Download Apache Ant binaries] **************************************
fatal: [test-marist-rhel8-s390x-2]: FAILED! => {"changed": false, "dest": "/tmp/", "elapsed": 0, "gid": 0, "group": "root", "mode": "01777", "msg": "Request failed: <urlopen error unknown url type: https>", "owner": "root", "size": 255, "state": "directory", "uid": 0, "url": "https://archive.apache.org/dist/ant/binaries/apache-ant-1.10.5-bin.zip"}
Failed at the installation of systemtap-sdt-devel
Presumably that's only on a subset of the OSs?
Tried deploying to just the RHEL79 build machines - hit https://github.com/adoptium/infrastructure/issues/2700
Tried deploying to test-marist-ubuntu2204 system - failed because gcc7 PR has not yet been merged
Tried deploying to the RHEL79 build machines skipping the docker tag
Redeploy to RHEL79 after removing /etc/yum.repos.d/docker.repo as that was already in place and preventing yum update
PASSED
Deploying to all test-marist systems (with docker bypassed to be safe for now)
PLAY RECAP *********************************************************************
test-marist-rhel7-s390x-1 : ok=221 changed=100 unreachable=0 failed=0 skipped=307 rescued=0 ignored=1
test-marist-rhel7-s390x-2 : ok=218 changed=99 unreachable=0 failed=0 skipped=303 rescued=0 ignored=1
test-marist-rhel8-s390x-1 : ok=18 changed=5 unreachable=0 failed=1 skipped=34 rescued=0 ignored=0
test-marist-rhel8-s390x-2 : ok=19 changed=1 unreachable=0 failed=1 skipped=27 rescued=0 ignored=0
test-marist-sles12-s390x-1 : ok=12 changed=2 unreachable=0 failed=1 skipped=25 rescued=0 ignored=0
test-marist-sles12-s390x-2 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
test-marist-sles15-s390x-1 : ok=135 changed=18 unreachable=0 failed=0 skipped=377 rescued=0 ignored=0
test-marist-sles15-s390x-2 : ok=18 changed=7 unreachable=0 failed=1 skipped=34 rescued=0 ignored=0
test-marist-ubuntu1604-s390x-1 : ok=162 changed=28 unreachable=0 failed=0 skipped=349 rescued=0 ignored=0
test-marist-ubuntu1804-s390x-1 : ok=12 changed=1 unreachable=0 failed=1 skipped=24 rescued=0 ignored=0
test-marist-ubuntu1804-s390x-2 : ok=12 changed=1 unreachable=0 failed=1 skipped=24 rescued=0 ignored=0
test-marist-ubuntu1804-s390x-3 : ok=111 changed=18 unreachable=0 failed=1 skipped=268 rescued=0 ignored=0
test-marist-ubuntu1804-s390x-4 : ok=194 changed=85 unreachable=0 failed=0 skipped=317 rescued=0 ignored=0
test-marist-ubuntu2004-s390x-1 : ok=186 changed=74 unreachable=0 failed=0 skipped=325 rescued=0 ignored=0
test-marist-ubuntu2204-s390x-1 : ok=22 changed=1 unreachable=0 failed=1 skipped=32 rescued=0 ignored=0
Failures on Ubuntu 22.04 (will be gcc-7 - PR ready), Ubuntu 18.04, the new SLES15, the old SLES12, and RHEL8. Those will need further investigation. I'm pausing for now so someone else can take over, as it's the build machines I really needed :-) But we haven't hit any problems due to the intrusion prevention on those systems, which is promising.
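With this many hosts it is easy to misread the recap, so here is a small sketch that pulls out the hosts with a non-zero `failed` or `unreachable` count. It assumes the recap has been saved as plain text in the format shown above; the function names are illustrative only.

```python
import re

# Matches lines such as:
#   test-marist-rhel8-s390x-1 : ok=18 changed=5 unreachable=0 failed=1 ...
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap(text: str) -> dict[str, dict[str, int]]:
    """Map each host in a PLAY RECAP block to its counters."""
    results = {}
    for line in text.splitlines():
        match = RECAP_LINE.match(line.strip())
        if not match:
            continue  # skips the "PLAY RECAP ****" banner and anything else
        pairs = (item.split("=") for item in match.group("counters").split())
        results[match.group("host")] = {key: int(value) for key, value in pairs}
    return results


def hosts_needing_attention(results: dict[str, dict[str, int]]) -> list[str]:
    """Hosts where at least one task failed or the host was unreachable."""
    return sorted(
        host
        for host, counters in results.items()
        if counters.get("failed", 0) or counters.get("unreachable", 0)
    )
```

Passing the recap text through `hosts_needing_attention(parse_recap(text))` gives the list of machines that need a follow-up run.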
Failed at the installation of systemtap-sdt-devel
This is specific to SLES15. It is installed on the -1 sles15 machine so it's not entirely clear why this message is appearing on the other machines, unless it was bypassed. libc.so.6 is on the machine:
test-marist-sles15-s390x-2:~ # ls -l /lib64/libc.so.6
lrwxrwxrwx 1 root root 12 Nov 5 2021 /lib64/libc.so.6 -> libc-2.26.so
test-marist-sles15-s390x-2:~ # zypper install systemtap-sdt-devel
Refreshing service 'SMT-http_lxslsmt'.
Loading repository data...
Reading installed packages...
Resolving package dependencies...
Problem: nothing provides 'libc.so.6(GLIBC_2.27)(64bit)' needed by the to be installed systemtap-4.6-151.d_t.3.s390x
Solution 1: do not install systemtap-sdt-devel-4.6-151.d_t.3.s390x
Solution 2: break systemtap-4.6-151.d_t.3.s390x by ignoring some of its dependencies
Choose from above solutions by number or cancel [1/2/c/d/?] (c): c
test-marist-sles15-s390x-2:~ #
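One plausible reading of the output above is that libc.so.6 is present but too old: the symlink points at libc-2.26.so, while the systemtap package asks for the GLIBC_2.27 symbol version. A tiny sketch of that comparison follows; the helper names are made up, and treating the symbol version as a minimum glibc release is my assumption rather than something confirmed in this thread.

```python
import re


def version_tuple(text: str) -> tuple[int, int]:
    """Pull the first MAJOR.MINOR pair out of a string such as 'libc-2.26.so'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return int(match.group(1)), int(match.group(2))


def satisfies(installed: str, required: str) -> bool:
    """True if the installed glibc is at least the required symbol version."""
    return version_tuple(installed) >= version_tuple(required)


if __name__ == "__main__":
    # Values taken from the ls and zypper output above.
    print(satisfies("libc-2.26.so", "GLIBC_2.27"))
```

If that reading is right, the package in the repository was built against a newer glibc than the one installed on this machine.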
The test-marist-ubuntu1804-s390x-1 and -2 systems had these entries in /etc/hosts:
91.189.95.85 ppa.launchpad.net
91.189.88.142 ports.ubuntu.com
This was preventing them from updating themselves - presumably implemented to bypass a temporary problem at some point - the date stamp on the file was:
-rw-r--r-- 1 root root 487 Apr 22 2021 /etc/hosts
I've commented those lines out of both machines now which should avoid this problem:
root@test-marist-ubuntu1804-s390x-2:~# apt-get update
Err:1 http://ports.ubuntu.com/ubuntu-ports bionic InRelease
Could not connect to ports.ubuntu.com:80 (91.189.88.142), connection timed out
Err:2 http://ports.ubuntu.com/ubuntu-ports bionic-updates InRelease
Unable to connect to ports.ubuntu.com:http:
Err:3 http://ports.ubuntu.com/ubuntu-ports bionic-backports InRelease
Unable to connect to ports.ubuntu.com:http:
Err:4 http://ports.ubuntu.com/ubuntu-ports bionic-security InRelease
Unable to connect to ports.ubuntu.com:http:
Reading package lists... Done
W: Failed to fetch http://ports.ubuntu.com/ubuntu-ports/dists/bionic/InRelease Could not connect to ports.ubuntu.com:80 (91.189.88.142), connection timed out
W: Failed to fetch http://ports.ubuntu.com/ubuntu-ports/dists/bionic-updates/InRelease Unable to connect to ports.ubuntu.com:http:
W: Failed to fetch http://ports.ubuntu.com/ubuntu-ports/dists/bionic-backports/InRelease Unable to connect to ports.ubuntu.com:http:
W: Failed to fetch http://ports.ubuntu.com/ubuntu-ports/dists/bionic-security/InRelease Unable to connect to ports.ubuntu.com:http:
W: Some index files failed to download. They have been ignored, or old ones used instead.
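Since stale pins like these are easy to forget about, here is a rough sketch of a check that lists any active (uncommented) /etc/hosts entries for a given set of hostnames. The function name and the choice of hostnames to watch are illustrative, not part of the playbooks.

```python
def pinned_entries(hosts_text: str, hostnames: set[str]) -> list[tuple[str, str]]:
    """Return (address, name) pairs for active hosts entries matching hostnames."""
    pinned = []
    for line in hosts_text.splitlines():
        # Drop comments, so lines that were commented out are ignored.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        address, names = fields[0], fields[1:]
        for name in names:
            if name in hostnames:
                pinned.append((address, name))
    return pinned


if __name__ == "__main__":
    with open("/etc/hosts") as handle:
        watched = {"ppa.launchpad.net", "ports.ubuntu.com"}
        for address, name in pinned_entries(handle.read(), watched):
            print(f"{name} is pinned to {address}")
```

An empty result means the package mirrors are being resolved through DNS as normal.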
sles12-2 was missing the AWX ssh key - now fixed so that should work now.
RHEL8 looks to be trying to install some of the 31-bit (s390) packages which we probably don't need.
@sxa want me to pick up the systemtap-sdt-devel on test-marist-sles15-s390x-2 ?
Sure - please co-ordinate with Haroon in slack.
That would be helpful @steelhead31 Thanks
Ubuntu 22.04 looking happier now that https://github.com/adoptium/infrastructure/pull/2691 is merged.
The sles15 playbooks run better using python 3 as the ansible_python_interpreter (which can be specified in the inventory). An issue with the ipv6 configuration on test-marist-sles15-s390x-2 has also been resolved by disabling ipv6 as shown below.
1. Edit the file sysctl.conf by executing the command sudo vi /etc/sysctl.conf
2. Add the below 2 lines to the file
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
3. Save the file and execute the command sudo sysctl -p. This reloads the settings and disables ipv6 addressing.
4. Execute the command ip a | grep inet - this should only show ipv4 addresses
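The check in step 4 can also be scripted. This is a minimal sketch that looks for `inet6` lines in the output of `ip a`; the function name is made up, and it assumes the usual `ip` output layout where the address is the second field on the line.

```python
import subprocess


def ipv6_addresses(ip_output: str) -> list[str]:
    """Return the addresses from any 'inet6' lines in `ip a` output."""
    addresses = []
    for line in ip_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "inet6":
            addresses.append(fields[1])
    return addresses


if __name__ == "__main__":
    result = subprocess.run(["ip", "a"], capture_output=True, text=True, check=True)
    remaining = ipv6_addresses(result.stdout)
    if remaining:
        print("ipv6 still active:", ", ".join(remaining))
    else:
        print("only ipv4 addresses present")
```

An empty list after running sysctl -p indicates that the settings from step 2 have taken effect.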
From Marist: "Let me know when fully migrated and I can remove the old servers as we are targeting end of September to power off the old storage servers."
@Haroon-Khel Looks like there may be some problems that need addressing: https://ci.adoptopenjdk.net/view/Test_openjdk/job/Test_openjdk11_hs_sanity.openjdk_s390x_linux/651
Certainly a subset of them are in the compression code (we've seen issues there elsewhere - at least on Ubuntu 20.04 - that run was on 22.04) and if all the failures are related to that it will be good to confirm which distributions and versions it happens on, as there will be implications elsewhere.
Nagios should be working on all of the new marist machines except for test-marist-rhel8-s390x-2, which fails with "No package nagios-plugins-all available". Should have a quick solution. @steelhead31 Can you check if the marist machines appear in that view you showed earlier?
Added the docker tag onto test-marist-ubuntu2204-s390x-1 as openjdk_build_docker_multiarch builds were getting stuck due to lack of suitable labels. The dockerhost-marist machine is currently unsuitable: despite being in jenkins, it appears that it cannot run docker as the jenkins user (See this log from when I tried to add the tag to that machine)
Request for Eclipse to set up two machines for Temurin Compliance:
https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/1917
NOTE: I've brought docker-marist-ubuntu1604-s390x-1 back online in jenkins for now since that one (why not others?) was causing 'temporarily offline in jenkins' messages to appear in the bot channel, but I've switched the docker label to dockerX
We'll need to understand as part of #1716 why the other marist machines which we have disabled (marked offline in jenkins) are not giving the same notifications e.g. https://ci.adoptopenjdk.net/computer/build%2Dmarist%2Drhel77%2Ds390x%2D1/ and https://ci.adoptopenjdk.net/computer/test%2Dmarist%2Dubuntu1804%2Ds390x%2D1/ (and all the other "old" ones)
Temurin Compliance systems still awaiting setup, but otherwise this is complete. Old machines will need to be deprovisioned, but that is due to be done later.
@Haroon-Khel @steelhead31 Can we remove the old machines from Nagios, Jenkins and the inventory files please, as they have now been deprovisioned? Full list as follows (some of these were temporary systems so if you can't find them, that's not a problem):
Will do. Has the ansible inventory been updated with the new IPs / hostnames? I'm starting work on fixing the discrepancies between nagios and ansible today.
Will do. Has the ansible inventory been updated with the new IPs / hostnames?
Yep the new ones have been live for a few weeks: https://github.com/adoptium/infrastructure/pull/2690/files
In theory removing the ones listed above should only leave the s390x ones added in that PR.
All have now been removed from nagios.
That'll clear up the slack channel a bit then ;-)
The old machines have all been relieved of their duties and returned to Marist.
There is still some more work required to fix some issues that have shown up during this release cycle under #2807 but those can be covered under that issue. The old TCK machines will be decommissioned this week too.
Removing the following machines from inventory.yml and jenkins as they've been decommissioned
* https://ci.adoptopenjdk.net/computer/test-marist-sles15-s390x-1/
* https://ci.adoptopenjdk.net/computer/build-marist-rhel77-s390x-1/
* https://ci.adoptopenjdk.net/computer/build-marist-rhel77-s390x-2/
* https://ci.adoptopenjdk.net/computer/test-marist-ubuntu1604-s390x-1/
* https://ci.adoptopenjdk.net/computer/test-marist-ubuntu1804-s390x-1/
* https://ci.adoptopenjdk.net/computer/test-marist-ubuntu1804-s390x-2/
* https://ci.adoptopenjdk.net/computer/test-marist-ubuntu1804-s390x-3/
* https://ci.adoptopenjdk.net/computer/test-marist-ubuntu1804-s390x-4/
* https://ci.adoptopenjdk.net/computer/docker-marist-ubuntu1604-s390x-1/
To avoid having to go through support for any requests on our Marist systems, they have been trialling a self-service interface for their machines and it is ready to be used as the primary method for provisioning our machines. #2267 has machines which have been provisioned through the new interface and we should start migrating our existing systems across to this too.
The first step will be to ensure we have capacity in the system (at the moment the account I'm using only has 4 machine slots available), then start duplicating the existing machines in it, followed by decommissioning the existing ones. We will likely look at having at least one dockerhost system in order to have a wider range of distributions tested for Linux/s390x (subject to availability...)
Systems ready for installation: