babsingh opened this issue 2 years ago
@AdamBrousseau @jdekonin Can you please provide feedback on this issue? Is it feasible? Any concerns about the approach?
fyi @0xdaryl @EricYangIBM
We currently have:

- 12 plinux (linux_ppc-64_le_gcc) Ubuntu 16 machines (10 UNB & 2 OSU)
- 13 xlinux (linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs) Ubuntu 16 machines (UNB)
- 1 xlinux (linux_riscv64_cross) Debian 10 machine (UNB)
- 2 zlinux (linux_390-64) Ubuntu 16 machines (Marist)
- 0 alinux machines (Equinix or OSU)
Machine | Arch | OS | Site | Docker | Podman | cgroup | Build Specs |
---|---|---|---|---|---|---|---|
ub1604-x86-unb-01 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
ub1604-x86-unb-02 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
ub1604-x86-unb-03 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
ub1604-x86-unb-04 | x64 | Ubuntu16 | UNB | 18.06.0-ce | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
ub1604-x86-unb-05 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
ub1604-x86-unb-06 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
ub1604-x86-unb-07 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
ub1604-x86-unb-08 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
ub1604-x86-unb-09 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
ub1604-x86-unb-10 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
ub1604-x86-unb-11 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
ub1604-x86-unb-12 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
ub1604-x86-unb-13 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
deb10-x64-1 | x64 | Debian10 | UNB | 18.09.1 | NI | | linux_riscv64_cross |
ub1606-p8-unb-01 | ppc64le | Ubuntu16 | UNB | 18.06.1-ce | NI | | linux_ppc-64_le_gcc |
ub1606-p8-unb-02 | ppc64le | Ubuntu16 | UNB | 18.06.3-ce | NI | | linux_ppc-64_le_gcc |
ub1606-p8-unb-03 | ppc64le | Ubuntu16 | UNB | 18.06.1-ce | NI | | linux_ppc-64_le_gcc |
ub1606-p8-unb-04 | ppc64le | Ubuntu16 | UNB | 18.06.1-ce | NI | | linux_ppc-64_le_gcc |
ub1606-p8-unb-05 | ppc64le | Ubuntu16 | UNB | 18.06.1-ce | NI | | linux_ppc-64_le_gcc |
ub1606-p8-unb-06 | ppc64le | Ubuntu16 | UNB | 18.06.1-ce | NI | | linux_ppc-64_le_gcc |
ub1606-p8-unb-07 | ppc64le | Ubuntu16 | UNB | 18.06.1-ce | NI | | linux_ppc-64_le_gcc |
ub1606-p8-unb-08 | ppc64le | Ubuntu16 | UNB | 18.06.1-ce | NI | | linux_ppc-64_le_gcc |
ub1606-p8-unb-09 | ppc64le | Ubuntu16 | UNB | 18.06.3-ce | NI | | linux_ppc-64_le_gcc |
ub1606-p8-unb-10 | ppc64le | Ubuntu16 | UNB | 18.06.1-ce | NI | | linux_ppc-64_le_gcc |
eclipseomr-1 | ppc64le | Ubuntu16 | OSU | 18.09.7 | NI | | linux_ppc-64_le_gcc |
eclipseomr-2 | ppc64le | Ubuntu16 | OSU | NI | NI | | linux_ppc-64_le_gcc |
ub16-s390x-01 | s390x | Ubuntu16 | Marist | NI | NI | | linux_390-64 |
ub16-s390x-02 | s390x | Ubuntu16 | Marist | NI | NI | | linux_390-64 |
NI: Not Installed
Machine | Configuration | Run build specs as-is | Run build specs within Docker |
---|---|---|---|
ub1604-x86-unb-01 to unb-07 | cgroup v1 | linux_aarch64, linux_arm | linux_ppc-64_le_gcc or linux_390-64 |
ub1604-x86-unb-08 to unb-13 | cgroup v2 | ~linux_x86-64_cmprssptrs (deprecated due to mixed refs)~ linux_x86 | linux_x86-64 |
Required changes:

- `ub1604-x86-unb-01` to `unb-07` should already have cgroup v1 installed, so no changes are required on these machines. But we should verify that these machines have cgroup v1 before making any changes.
- Update `ub1604-x86-unb-08` to `unb-13` to use cgroup v2 after verifying that these machines currently have cgroup v1.
- All of these machines should have Docker installed. It should be confirmed that Docker containers use cgroup v1 on `ub1604-x86-unb-01` to `unb-07` and cgroup v2 on `ub1604-x86-unb-08` to `unb-13`.
- Complete Task 6 in https://github.com/eclipse/omr/issues/1281#issuecomment-1072796875. Otherwise, PR builds may encounter failures due to missing cgroup v2 support in OMR.
- Run `linux_x86` and `linux_x86-64` within a Docker container by default. We will need an opt-out option for `linux_x86` and `linux_x86-64`, and an opt-in option for other eligible build specs.

I think you have the required changes mentioned, although the sizing on updating the scripting to use containers might not be so minor. Estimating that will be ~2-3 days of effort, depending on familiarity.
I do have a concern about the limited o/s coverage that the OMR Jenkins has as options, especially since Ubuntu 16 is out of support and makes up the bulk of the existing x/p/z coverage. I think it would be a good idea to decide what the OMR o/s coverage should be before proceeding too far down the path of system configuration.
If you had any choice of operating system coverage as the host, what would you prefer per architecture?
> although the sizing on updating the scripting to use containers might not be so minor
https://www.jenkins.io/doc/book/pipeline/docker/ provides an easy mechanism to use Docker images as the execution environment.
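For illustration, a minimal declarative-pipeline sketch of that mechanism; the image, label, and arguments below are placeholders, not the actual OMR configuration:

```groovy
// Minimal sketch of the Jenkins Docker Pipeline mechanism: Jenkins pulls the image and runs
// every step of the stage inside a container on the selected node.
pipeline {
    agent {
        docker {
            image 'ubuntu:20.04'            // placeholder; OMR would use a purpose-built build image
            label 'sw.os.linux'             // placeholder node label
            args  '-v /home/jenkins:/home/jenkins'  // optional extra arguments passed to docker run
        }
    }
    stages {
        stage('Build') {
            steps {
                sh 'gcc --version'          // executes inside the container, not on the host
            }
        }
    }
}
```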
> what the OMR o/s coverage should be
OS requirement: it should support both cgroup v1 and v2. I think the latest Ubuntu LTS (20.04.4) supports both cgroup implementations.
Meanwhile, we should also consider updating Docker to 20.10 since it has better support for cgroup v2 and newer Linux operating systems: https://docs.docker.com/engine/release-notes/.
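As a quick sanity check after upgrading a node (a sketch only; the node label and image are placeholders), something along these lines would confirm the engine version and which cgroup hierarchy containers actually see:

```groovy
// Sketch: report the Docker engine version and the cgroup hierarchy visible to containers.
node('sw.os.linux') {                        // placeholder node label
    // cgroup v2 support in Docker matured in the 20.10 release series.
    sh "docker version --format '{{.Server.Version}}'"
    // On a cgroup v2 host, cgroup.controllers is present at the hierarchy root inside containers too.
    sh 'docker run --rm ubuntu:20.04 test -f /sys/fs/cgroup/cgroup.controllers && echo cgroup-v2 || echo cgroup-v1'
}
```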
> If you had any choice of operating system coverage as the host, what would you prefer per architecture?
Cgroups are only available on the Linux operating system. There is no preference for architecture as long as the required coverage is achieved: cgroup v1, cgroup v2, cgroup v1 in a container, and cgroup v2 in a container. My current selection targets machines where we can run the maximum number of build specs. This reduces the amount of infra work while achieving all cgroup coverage and avoiding the addition of redundant PR builds.
Rather than dictating that certain specs be Docker or not, or cgroup v1 vs. v2, would we be better off having those types of things as either inputs to the build launch, or as multiple specs that have different setups depending on what the user wants? E.g., for linux_x86: xlinux, xlinux_docker, xlinux_c2.
Re https://github.com/eclipse/omr/issues/6468#issuecomment-1104555869:

> Rather than dictating that certain specs be Docker or not, or cgroup v1 vs. v2 ...
This corresponds to the opt-out and opt-in options mentioned in https://github.com/eclipse/omr/issues/6468#issuecomment-1103468655, where the user has finer control over how to launch the builds.
But we will need a default configuration for `jenkins build all`, which will ensure full cgroup coverage. In this scenario, we will have to specify how all build specs should be run in order to achieve full cgroup coverage.
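As a rough illustration of what such a default might look like in the pipeline (the parameter name, map name, and values below are hypothetical, not an agreed design):

```groovy
// Hypothetical sketch: a per-launch opt-in/opt-out parameter plus the per-spec defaults that a
// plain "jenkins build all" would fall back to in order to keep full cgroup coverage.
properties([parameters([
    booleanParam(name: 'RUN_IN_DOCKER', defaultValue: true,
                 description: 'Run this build spec inside a Docker container')
])])

// Illustrative defaults only: x86 specs containerized by default (opt-out), others opt-in.
def defaultContainerUsage = [
    'linux_x86'          : true,
    'linux_x86-64'       : true,
    'linux_aarch64'      : false,
    'linux_ppc-64_le_gcc': false,
    'linux_390-64'       : false
]
```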
@AdamBrousseau @jdekonin Cgroup v2 support is targeted for OpenJ9's 0.33 release. D-cut is end of May 22. Will we be able to complete these infra changes by then?
fyi @tajila @pshipton
I am not speaking for @AdamBrousseau as his time is his, but at this point in time, this is not on my radar due to other priorities. I suggest that you bring this up on the Eclipse OpenJ9 community call for additional awareness and prioritization, since this is where Adam and I take much of our workload and direction. https://www.eclipse.org/openj9/ (add to calendar at the bottom right of the page)
FYI @zl-wang and @joransiu to be aware of the cgroups v2 work, and weigh in wrt any testing or infra thoughts on p and z
As far as I know, RHEL 9 (GA this month or next) will switch to cgroup v2 as the default, at least on Power, while Ubuntu 21.10 already made that switch. Is it better that we don't do the artificial switch ourselves, but support it in its own due course? (So, maybe upgrade a few of the machines ... mostly Ubuntu anyway. Then we would have a mixed testing infrastructure.)
Before that, it looks like we need to enable build/test in containers on p/z first.
> Before that, it looks like we need to enable build/test in containers on p/z first.
I am looking into this. I have all the pieces to implement this with @AdamBrousseau's help. I will open a PR to update the PR build script soon.
> Is it better that we don't do the artificial switch ourselves, but support it in its own due course? (So, maybe upgrade a few of the machines ... mostly Ubuntu anyway. Then we would have a mixed testing infrastructure.)
That's the plan. It is described in https://github.com/eclipse/omr/issues/6468#issuecomment-1103468655. Its execution is documented in https://github.com/eclipse/omr/issues/6501#issuecomment-1140548746.
After the Eclipse Foundation installed the required Docker plugins (https://github.com/eclipse-cbi/jiro/issues/214), I am able to run OMR PR builds inside a Docker container; see https://github.com/eclipse/omr/pull/6525.
I noticed that pLinux/zLinux machines (e.g. eclipseomr-2, ub16-s390x-02) do not have Docker installed:
@jdekonin @AdamBrousseau Can we make sure that all Linux machines (p/z/x/a archs) have Docker installed?
Docker has been installed on the two pLinux machines in the OMR build farm: `eclipseomr-1` and `eclipseomr-2`. Both have Ubuntu 20.04 and are configured as cgroup v1.
These changes are needed to fully verify/test the cgroup API in OMR.
There are two cgroup implementations available on Linux: cgroup v1 and cgroup v2.
Only one cgroup implementation should be used/enabled on Linux at a time. Mixing the cgroup v1 and v2 controllers is not recommended.
We need to select Linux machines for enabling cgroup v2.
We also need to install container technologies on Linux machines because the behaviour of the cgroup API changes when running in a container.
High level plan
There are seven Linux-specific PR build jobs in OMR. The table below describes how to modify the setup of the Linux machines for a specific PR build. For example, all machines for the linux_x86 PR build should have cgroup v1 enabled and Docker installed, and the PR build should be run in a container environment.
The above table is just a recommendation. There is flexibility in distributing the above configurations across Linux PR builds as long as we get the needed cgroup v1/v2 and container coverage.
Other than the OMR cgroup API, all other OMR components should be unaffected by variation in cgroup v1/v2 and containers.
Usage of container technologies
Docker Personal is freely available for use in open-source communities.
Podman is also open source.
There should be no concerns about installing Docker and Podman on OMR machines.
Enabling cgroup v2
If `grubby` is not available, edit the `GRUB_CMDLINE_LINUX` line in `/etc/default/grub`. The host needs to be rebooted. Once cgroup v2 is enabled, all containers created on the host should utilize cgroup v2. The presence of `/sys/fs/cgroup/cgroup.controllers` is an easy way to verify if cgroup v2 is enabled.
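For reference (general systemd/GRUB knowledge, not quoted from this issue): the usual way to enable cgroup v2 on these distributions is to add systemd.unified_cgroup_hierarchy=1 to the kernel command line and reboot. A small pipeline-side check along these lines could then assert the expected configuration on a node before the cgroup tests run; the helper name and node label are hypothetical:

```groovy
// Hypothetical helper: detect which cgroup hierarchy the current node (or the container the
// build runs in) provides, using the cgroup.controllers marker file mentioned above.
def cgroupVersion() {
    def isV2 = sh(script: 'test -f /sys/fs/cgroup/cgroup.controllers', returnStatus: true) == 0
    return isV2 ? 'v2' : 'v1'
}

node('sw.os.linux') {                        // placeholder node label
    echo "This node provides cgroup ${cgroupVersion()}"
}
```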
Updates to the PR build script
Build script: https://github.com/eclipse/omr/blob/master/buildenv/jenkins/omrbuild.groovy
Tentative changes:
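The concrete list of tentative changes is not reproduced here. Purely as an illustration of the direction discussed above (optionally running a spec's build inside a container, with opt-in/opt-out control), a change to the scripted pipeline could look roughly like the following; the parameter names, label, and image are hypothetical:

```groovy
// Illustrative sketch only, not the actual omrbuild.groovy change: optionally run the existing
// compile/test steps inside a Docker container on the same node.
def runBuild = {
    sh 'make --version'                      // stand-in for the existing build and test steps
}

node(params.LABEL ?: 'sw.os.linux') {        // hypothetical label parameter
    checkout scm
    if (params.RUN_IN_DOCKER) {              // hypothetical opt-in/opt-out build parameter
        // DOCKER_IMAGE would name a build image carrying the toolchain for this spec.
        docker.image(params.DOCKER_IMAGE).inside {
            runBuild()
        }
    } else {
        runBuild()
    }
}
```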