eclipse-omr / omr

Eclipse OMR™ Cross platform components for building reliable, high performance language runtimes
http://www.eclipse.org/omr

Infra changes for cgroup support #6468

Open babsingh opened 2 years ago

babsingh commented 2 years ago

These changes are needed to fully verify/test the cgroup API in OMR.

There are two cgroup implementations available on Linux: cgroup v1 and cgroup v2.

Only one cgroup implementation should be enabled at a time on Linux. Mixing the cgroup v1 and v2 controllers is not recommended.

We need to select Linux machines for enabling cgroup v2.

We also need to install container technologies on Linux machines because the behaviour of the cgroup API changes when running in a container.
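
As a rough illustration of how a test could tell whether it is running inside a container, the sketch below checks for the marker files that Docker (`/.dockerenv`) and Podman (`/run/.containerenv`) normally create. This is only a heuristic and not part of the OMR cgroup API; the function name and the `root` parameter are hypothetical and exist only to make the sketch testable.

```python
from pathlib import Path

# Marker files commonly present inside containers (heuristic, not guaranteed).
MARKERS = {
    "docker": ".dockerenv",
    "podman": "run/.containerenv",
}

def container_runtime(root="/"):
    """Return 'docker', 'podman', or None based on marker files under root."""
    for name, relative_path in MARKERS.items():
        if (Path(root) / relative_path).exists():
            return name
    return None

if __name__ == "__main__":
    print(container_runtime() or "no container marker found")
```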

High level plan

There are seven Linux-specific PR build jobs in OMR. The table below describes how to set up the Linux machines for each PR build. For example, all machines for the linux_x86 PR build should have cgroup v1 enabled and Docker installed, and the PR build should run in a container.

| PR build | cgroup version | container technology | build/test in container |
|---|---|---|---|
| linux_x86 | v1 | docker | yes |
| linux_x86-64 | v2 | docker | yes |
| linux_arm | v1 | podman | yes |
| linux_aarch64 | v2 | podman | yes |
| linux_ppc-64_le_gcc | v1 | none | no |
| linux_390-64 | v2 | none | no |
| linux_riscv64_cross | v1 or v2 | none | no |

The above table is just a recommendation. The configurations can be distributed differently across the Linux PR builds, as long as we get the needed cgroup v1/v2 and container coverage.

Other than the OMR cgroup API, all other OMR elements should be unaffected by variation in cgroup v1/v2 and containers.

Usage of container technologies

Docker Personal is freely available for use in open-source communities.

Podman is also open-source.

There should be no concerns in installing Docker and Podman on OMR machines.

Enabling cgroup v2

 sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"

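If grubby is not available, add the argument to the GRUB_CMDLINE_LINUX line in /etc/default/grub and then regenerate the GRUB configuration (for example, with update-grub on Ubuntu):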

GRUB_CMDLINE_LINUX="systemd.unified_cgroup_hierarchy=1"
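
On a machine where GRUB_CMDLINE_LINUX already carries other arguments, the new argument should be appended rather than replacing the whole line. A minimal sketch of that edit, operating on the file contents as a string, might look like the following; the helper name `add_kernel_arg` is hypothetical, and writing the result back to /etc/default/grub is left out on purpose.

```python
import re

CGROUP_V2_ARG = "systemd.unified_cgroup_hierarchy=1"

def add_kernel_arg(grub_text, arg=CGROUP_V2_ARG):
    """Append arg to the GRUB_CMDLINE_LINUX="..." line unless it is already there."""
    def patch(match):
        current = match.group(2).split()
        if arg not in current:
            current.append(arg)
        joined = " ".join(current)
        return f'{match.group(1)}"{joined}"'

    # Matches GRUB_CMDLINE_LINUX= only, not GRUB_CMDLINE_LINUX_DEFAULT=.
    pattern = r'^(GRUB_CMDLINE_LINUX=)"([^"]*)"'
    return re.sub(pattern, patch, grub_text, flags=re.MULTILINE)

if __name__ == "__main__":
    sample = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="quiet"\n'
    print(add_kernel_arg(sample))
```

Applying the function twice gives the same result as applying it once, so it is safe to re-run on a machine that has already been configured.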

The host must then be rebooted. Once cgroup v2 is enabled, all containers created on the host should use cgroup v2. The presence of /sys/fs/cgroup/cgroup.controllers is an easy way to verify that cgroup v2 is enabled.
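
The verification step can be scripted. The sketch below checks for the cgroup.controllers file mentioned above; the function name and the `cgroup_root` parameter are hypothetical and only there so the check can be pointed at a test directory.

```python
from pathlib import Path

def cgroup_v2_enabled(cgroup_root="/sys/fs/cgroup"):
    """Return True if cgroup.controllers exists directly under cgroup_root."""
    return (Path(cgroup_root) / "cgroup.controllers").is_file()

if __name__ == "__main__":
    if cgroup_v2_enabled():
        print("cgroup v2 is enabled")
    else:
        print("cgroup v2 marker file not found")
```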

Updates to the PR build script

Build script: https://github.com/eclipse/omr/blob/master/buildenv/jenkins/omrbuild.groovy

Tentative changes:

babsingh commented 2 years ago

@AdamBrousseau @jdekonin Can you please provide feedback on this issue? Is it feasible? Any concerns about the approach?

fyi @0xdaryl @EricYangIBM

AdamBrousseau commented 2 years ago

We currently have:

- 12 plinux (linux_ppc-64_le_gcc) ubuntu 16 (10 UNB & 2 OSU)
- 13 xlinux (linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs) ubuntu 16 (UNB)
- 1 xlinux (linux_riscv64_cross) debian 10 (UNB)
- 2 zlinux (linux_390-64) ubuntu 16 (Marist)
- 0 alinux (Equinix or OSU)

AdamBrousseau commented 2 years ago

| Machine | Arch | OS | Site | Docker | Podman | cgroup | Build Specs |
|---|---|---|---|---|---|---|---|
| ub1604-x86-unb-01 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
| ub1604-x86-unb-02 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
| ub1604-x86-unb-03 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
| ub1604-x86-unb-04 | x64 | Ubuntu16 | UNB | 18.06.0-ce | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
| ub1604-x86-unb-05 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
| ub1604-x86-unb-06 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
| ub1604-x86-unb-07 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
| ub1604-x86-unb-08 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
| ub1604-x86-unb-09 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
| ub1604-x86-unb-10 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
| ub1604-x86-unb-11 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
| ub1604-x86-unb-12 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
| ub1604-x86-unb-13 | x64 | Ubuntu16 | UNB | 19.03.5 | NI | | linux_aarch64, linux_arm, linux_x86, linux_x86-64, linux_x86-64_cmprssptrs |
| deb10-x64-1 | x64 | Ubuntu16 | UNB | 18.09.1 | NI | | linux_riscv64_cross |
| ub1606-p8-unb-01 | ppc64le | Ubuntu16 | UNB | 18.06.1-ce | NI | | linux_ppc-64_le_gcc |
| ub1606-p8-unb-02 | ppc64le | Ubuntu16 | UNB | 18.06.3-ce | NI | | linux_ppc-64_le_gcc |
| ub1606-p8-unb-03 | ppc64le | Ubuntu16 | UNB | 18.06.1-ce | NI | | linux_ppc-64_le_gcc |
| ub1606-p8-unb-04 | ppc64le | Ubuntu16 | UNB | 18.06.1-ce | NI | | linux_ppc-64_le_gcc |
| ub1606-p8-unb-05 | ppc64le | Ubuntu16 | UNB | 18.06.1-ce | NI | | linux_ppc-64_le_gcc |
| ub1606-p8-unb-06 | ppc64le | Ubuntu16 | UNB | 18.06.1-ce | NI | | linux_ppc-64_le_gcc |
| ub1606-p8-unb-07 | ppc64le | Ubuntu16 | UNB | 18.06.1-ce | NI | | linux_ppc-64_le_gcc |
| ub1606-p8-unb-08 | ppc64le | Ubuntu16 | UNB | 18.06.1-ce | NI | | linux_ppc-64_le_gcc |
| ub1606-p8-unb-09 | ppc64le | Ubuntu16 | UNB | 18.06.3-ce | NI | | linux_ppc-64_le_gcc |
| ub1606-p8-unb-10 | ppc64le | Ubuntu16 | UNB | 18.06.1-ce | NI | | linux_ppc-64_le_gcc |
| eclipseomr-1 | ppc64le | Ubuntu16 | OSU | 18.09.7 | NI | | linux_ppc-64_le_gcc |
| eclipseomr-2 | ppc64le | Ubuntu16 | OSU | NI | NI | | linux_ppc-64_le_gcc |
| ub16-s390x-01 | s390x | Ubuntu16 | Marist | NI | NI | | linux_390-64 |
| ub16-s390x-02 | s390x | Ubuntu16 | Marist | NI | NI | | linux_390-64 |

NI: Not Installed

babsingh commented 2 years ago

Suggested Setup

| Machine | Configuration | Run build specs as-is | Run build specs within Docker |
|---|---|---|---|
| ub1604-x86-unb-01 to unb-07 | cgroup v1 | linux_aarch64, linux_arm | linux_ppc-64_le_gcc or linux_390-64 |
| ub1604-x86-unb-08 to unb-13 | cgroup v2 | ~linux_x86-64_cmprssptrs (deprecated due to mixed refs)~ linux_x86 | linux_x86-64 |

jdekonin commented 2 years ago

I think you have the required changes mentioned, although the sizing on updating the scripting to use containers might not be so minor. I estimate that will be ~2-3 days of effort, depending on familiarity.

I do have a concern about the limited o/s coverage that the OMR Jenkins has as options, especially since Ubuntu 16 is out of support and that is the bulk of the existing x/p/z coverage. I think it might be a good idea to decide what the OMR o/s coverage should be before proceeding too far down the path of system configuration.

If you had any choice of operating system coverage as the host, what would you prefer per architecture?

babsingh commented 2 years ago

> although the sizing on updating the scripting to use containers might not be so minor

https://www.jenkins.io/doc/book/pipeline/docker/ provides an easy mechanism to use Docker images as the execution environment.

> what the OMR o/s coverage should be

OS requirement: it should support both cgroup v1 and v2. I think the latest Ubuntu LTS (20.04.4) supports both cgroup implementations.

Meanwhile, we should also consider updating Docker to 20.10 since it has better support for cgroup v2 and newer Linux operating systems: https://docs.docker.com/engine/release-notes/.
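
To see which machines would need a Docker update, the version strings from the machine table earlier in this thread can be compared against 20.10. The sketch below uses a hand-copied subset of that table; the helper names are hypothetical, and "NI" (not installed) is treated as needing an install.

```python
import re

MIN_DOCKER = (20, 10)  # target version suggested in this comment

def parse_version(text):
    """Return (major, minor) for strings like '18.06.1-ce', or None for 'NI'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

def needs_upgrade(text, minimum=MIN_DOCKER):
    """True if Docker is missing or older than the minimum version."""
    version = parse_version(text)
    return version is None or version < minimum

# Subset of the machine table above (machine name -> Docker column).
inventory = {
    "ub1604-x86-unb-01": "19.03.5",
    "ub1604-x86-unb-04": "18.06.0-ce",
    "eclipseomr-1": "18.09.7",
    "eclipseomr-2": "NI",
}

if __name__ == "__main__":
    for machine, docker in inventory.items():
        status = "needs upgrade" if needs_upgrade(docker) else "ok"
        print(f"{machine}: {docker} ({status})")
```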

> If you had any choice of operating system coverage as the host, what would you prefer per architecture?

Cgroup is only available on the Linux operating system. There is no preference for architecture as long as the required coverage is achieved: cgroup v1, cgroup v2, cgroup v1 in container and cgroup v2 in container. My current selection targets machines where we can run the maximum number of build specs. This reduces the amount of infra work while achieving all cgroup coverage and avoiding the addition of redundant PR builds.
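
The four coverage targets can be checked mechanically against any proposed assignment. The sketch below encodes the high-level plan table from the first comment as (cgroup version, runs in container) pairs and reports which combinations are missing; linux_riscv64_cross is left out because its cgroup version is listed as "v1 or v2". The names `REQUIRED`, `PLAN` and `missing_coverage` are hypothetical.

```python
# Required combinations: (cgroup version, build/test runs in a container).
REQUIRED = {("v1", False), ("v2", False), ("v1", True), ("v2", True)}

# Copied from the high-level plan table; one entry per PR build.
PLAN = {
    "linux_x86": ("v1", True),
    "linux_x86-64": ("v2", True),
    "linux_arm": ("v1", True),
    "linux_aarch64": ("v2", True),
    "linux_ppc-64_le_gcc": ("v1", False),
    "linux_390-64": ("v2", False),
}

def missing_coverage(plan):
    """Return the required combinations that no PR build in plan provides."""
    return REQUIRED - set(plan.values())

if __name__ == "__main__":
    gaps = missing_coverage(PLAN)
    print("full cgroup coverage" if not gaps else f"missing: {sorted(gaps)}")
```

With this encoding, dropping a build from the plan immediately shows whether one of the four combinations is lost, which is the property the default configuration has to preserve.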

AdamBrousseau commented 2 years ago

Rather than dictating that certain specs be docker or not, or c1 vs c2, would we be better off having those types of things either as inputs to the build launch or as multiple specs with different setups, depending on what the user wants? E.g. for linux_x86: xlinux, xlinux_docker, xlinux_c2.
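
One way to picture the multiple-specs idea is a small parser that turns a spec name into a build configuration. This is purely illustrative: the suffix meanings (`_docker` for running in a container, `_c2` for cgroup v2) and the v1/no-container default are assumptions, not an agreed naming scheme, and `parse_spec` is a hypothetical helper.

```python
def parse_spec(name):
    """Split a spec name such as 'xlinux_docker' into a configuration dict.

    Assumed convention: the first token is the platform, '_docker' requests a
    container build, and '_c2' requests cgroup v2 (default is cgroup v1).
    """
    platform, *flags = name.split("_")
    return {
        "platform": platform,
        "container": "docker" in flags,
        "cgroup": "v2" if "c2" in flags else "v1",
    }

if __name__ == "__main__":
    for spec in ("xlinux", "xlinux_docker", "xlinux_c2"):
        print(spec, parse_spec(spec))
```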

babsingh commented 2 years ago

Re https://github.com/eclipse/omr/issues/6468#issuecomment-1104555869:

> Rather than dictating certain specs to be docker or not or c1 vs c2 ...

This will correlate to the opt-out and opt-in options mentioned in https://github.com/eclipse/omr/issues/6468#issuecomment-1103468655 where the user has finer control on how to launch the builds.

But we will need a default configuration for jenkins build all, which will ensure full cgroup coverage. In this scenario, we will have to specify how all build specs should be run in order to achieve full cgroup coverage.

babsingh commented 2 years ago

@AdamBrousseau @jdekonin Cgroup v2 support is targeted for OpenJ9's 0.33 release. D-cut is end of May 22. Will we be able to complete these infra changes by then?

fyi @tajila @pshipton

jdekonin commented 2 years ago

I am not speaking for @AdamBrousseau as his time is his, but at this point in time, this is not on my radar due to other priorities. I suggest that you bring this up on the Eclipse OpenJ9 community call for additional awareness and prioritization, since this is where Adam and I take much of our workload and direction. https://www.eclipse.org/openj9/ (add to calendar at the bottom right of the page)

vijaysun-omr commented 2 years ago

FYI @zl-wang and @joransiu to be aware of the cgroups v2 work, and weigh in wrt any testing or infra thoughts on p and z

zl-wang commented 2 years ago

As far as I know, RHEL 9 (GA this month or next) will switch to v2 as the default, at least on Power, while Ubuntu 21.10 has already made that switch. Would it be better not to do the artificial switch ourselves, but to support v2 in due course? (So, maybe upgrade a few of the machines, mostly Ubuntu anyway; then we have a mixed testing infrastructure.)

Before that, it looks like we need to enable build/test in containers on p/z first.

babsingh commented 2 years ago

> Before that, it looks like we need to enable build/test in containers on p/z first.

I am looking into this. I have all the pieces to implement this with @AdamBrousseau's help. I will open a PR to update the PR build script soon.

> Would it be better not to do the artificial switch ourselves, but to support v2 in due course? (So, maybe upgrade a few of the machines, mostly Ubuntu anyway; then we have a mixed testing infrastructure.)

That's the plan. It is described in https://github.com/eclipse/omr/issues/6468#issuecomment-1103468655. Its execution is documented in https://github.com/eclipse/omr/issues/6501#issuecomment-1140548746.

babsingh commented 2 years ago

After the Eclipse Foundation installed the required Docker plugins (https://github.com/eclipse-cbi/jiro/issues/214), I am able to run OMR PR builds inside a Docker container; see https://github.com/eclipse/omr/pull/6525.

I noticed that some pLinux/zLinux machines (e.g. eclipseomr-2, ub16-s390x-02) do not have Docker installed.

@jdekonin @AdamBrousseau Can we make sure that all Linux machines (p/z/x/a archs) have Docker installed?

babsingh commented 2 years ago

Docker has been installed on the two pLinux machines in the OMR build farm: eclipseomr-1 and eclipseomr-2. Both have Ubuntu 20.04 and are configured as cgroup v1.