Ansible request for OpenJDK container detection tests (Linux x86_64)

jerboaa commented 1 year ago

It would be very useful to have specific test machines for testing the OpenJDK container detection code. See https://github.com/adoptium/aqa-tests/pull/4147 and https://github.com/adoptium/aqa-tests/issues/4143

Request for new playbook addition

Details:

Container tests in OpenJDK require the following setup (in addition to the usual setup for aqavit test machines on Linux):

* An installed container engine (docker or podman), with SELinux or similar turned off.
* cgroups v1 with swap accounting turned on (swapaccount=1 on the kernel command line).
* A cgroups v2 setup based on this ansible playbook, which sets up cgroups v2 suitable for testing. For example, RHEL 9 machines come with cg v2 by default.

Ideally, there would be test machines with both cg v1 and cg v2 so that testing can be run on both kinds of system. Suggested labels would be openjdk.dev.container.cg1 and openjdk.dev.container.cg2.
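As a rough illustration of the swap accounting requirement, the sketch below only inspects a kernel command line string for a swapaccount= argument. The function name swapaccount_setting is made up for this example, and a missing argument does not prove that swap accounting is off, since the kernel's built-in default may differ between distributions.

```shell
#!/bin/sh
# Hypothetical helper (not part of any playbook): report the swapaccount
# setting found in a kernel command line string passed as $1.
# Prints "1", "0", or "unset" when no swapaccount= argument is present.
swapaccount_setting() {
    # Pad with spaces so only whole arguments match.
    case " $1 " in
        *" swapaccount=1 "*) echo "1" ;;
        *" swapaccount=0 "*) echo "0" ;;
        *) echo "unset" ;;
    esac
}

# Usage on a live machine (assumes /proc/cmdline is readable):
#   swapaccount_setting "$(cat /proc/cmdline)"
```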

sendaoYan commented 1 year ago

https://github.com/adoptium/aqa-tests/issues/3629#issuecomment-1313696457

jerboaa commented 1 year ago

@sxa Do you have any guesstimate whether or not this could be done and, if so, when? Happy to help get this implemented.

sxa commented 1 year ago

Not as yet - what is the priority on it?

Questions that would need answered:

* How to decide if a given machine should be CG1 or CG2 when setting it up - in general we set up all machines of a given type OS the same way at present and that would have to change
* How are the test suites going to be modified to accommodate this? Would it require a new test type and associated jobs if the jobs had to be run across both CG types?

If you don't REQUIRE a specific CG setup to test this initially then they could be run on the existing ci.role.test&&sw.tool.docker as a first pass (same as the external tests)

jerboaa commented 1 year ago

> Not as yet - what is the priority on it?

Not sure. Long term it would be good to have this test infra, though.

> Questions that would need answered:
>
> * How to decide if a given machine should be CG1 or CG2 when setting it up - in general we set up all machines of a given type OS the same way at present and that would have to change

Yes, unfortunately that would have to change for good coverage. How to decide? First, the system needs to support it. Second, there needs to be at least one machine configured as cg v1 and one as cg v2. As for support: Fedora 36+ has cgroups v2 support. I believe the latest Ubuntu comes with cgroups v2 by default as well. RHEL 8 supports it via the systemd.unified_cgroup_hierarchy=1 flag. The default for RHEL 9 is cgroups v2. However, the point about container testing on cgroups v2 is that it needs some explicit configuration to work well. See the ansible script I wrote a while back.

> * How are the test suites going to be modified to accommodate this? Would it require a new test type and associated jobs if the jobs had to be run across both CG types?

I haven't crossed that bridge yet. https://github.com/adoptium/aqa-tests/issues/4143 added a container tests group, which could have different variations based on the systems they run on.

> If you don't REQUIRE a specific CG setup to test this initially then they could be run on the existing ci.role.test&&sw.tool.docker as a first pass (same as the external tests)

Thanks. I believe that's been done already, but there is no coverage for cgroups v2 and/or swapaccount=0. At least not knowingly so.

sxa commented 1 year ago

I wonder if we could just let each OS use its default (configured as desired) and label appropriately? Looks like even back to Ubuntu 18.04 it was using cgroup2 based on the second line here (I believe this is a valid check!):

$ mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)

So if I'm understanding where we are, we can run the test on some of the systems, but at the moment many machines are not configured appropriately. At the moment I think most of our systems capable of running docker images are Ubuntu, so we'd need to validate that your playbook will work on Ubuntu. Then (assuming there are no side effects or security concerns with those changes) we can look at deploying that without our standard playbooks on the machines that have CG2 by default. Just to be clear, the tests that exercise this functionality are all currently in the dev.openjdk suite, right?

jerboaa commented 1 year ago

> I wonder if we could just let each OS use its default (configured as desired) and label appropriately? Looks like even back to Ubuntu 18.04 it was using cgroup2 based on the second line here (I believe this is a valid check!):
>
> $ mount | grep cgroup
> tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
> cgroup on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime)
> cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)

I'm not sure this is a correct test. It looks like a hybrid system to me which usually presents itself as cgv1. What does this return?

stat -c "%T" -f /sys/fs/cgroup

On cgv2 it returns cgroup2fs; on cgv1 it (usually) returns tmpfs.
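The check above could be wrapped in a small helper along these lines. This is only a sketch: the function name cgroup_version_from_fstype is invented for illustration, and it takes the filesystem type as an argument so that the mapping can be exercised without a live system. A hybrid layout reports tmpfs and is therefore classified as v1 here, in line with the "usually" caveat.

```shell
#!/bin/sh
# Hypothetical helper: map the filesystem type reported for /sys/fs/cgroup
# (passed as $1) to a cgroups version.
cgroup_version_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;       # unified hierarchy
        tmpfs)     echo "v1" ;;       # legacy or hybrid layout (usually)
        *)         echo "unknown" ;;  # anything else is left undecided
    esac
}

# Usage on a live machine:
#   cgroup_version_from_fstype "$(stat -c '%T' -f /sys/fs/cgroup)"
```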

> So if I'm understanding where we are, we can run the test on some of the systems, but at the moment many machines are not configured appropriately. At the moment I think most of our systems capable of running docker images are Ubuntu so we'd need to validate that your playbook will work on Ubuntu then (assuming there are no side effects or security concerns with those changes) we can look at deploying that without our standard playbooks on the machines that have CG2 by default.

Sounds OK for an initial step to get proper cg v2 configured for the tests to work. If we could label systems which have swapaccount=0 or the like as well it would be helpful too.
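To make the labelling idea concrete, here is a minimal sketch that turns an already-determined cgroups version and swapaccount setting into label names. The cg1/cg2 labels are the ones suggested earlier in this issue; the function name container_labels and the openjdk.dev.container.noswapaccount label are purely hypothetical placeholders, not agreed names.

```shell
#!/bin/sh
# Hypothetical helper: print one suggested label per line.
#   $1: cgroups version, "v1" or "v2"
#   $2: swapaccount setting, "1", "0", or "unset"
container_labels() {
    case "$1" in
        v1) echo "openjdk.dev.container.cg1" ;;
        v2) echo "openjdk.dev.container.cg2" ;;
    esac
    # Placeholder label name for machines with swap accounting disabled.
    if [ "$2" = "0" ]; then
        echo "openjdk.dev.container.noswapaccount"
    fi
}

# Example: container_labels v1 0
```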

> Just to be clear, the tests that exercise this functionality are all currently in the dev.openjdk suite right?

Yes.

adoptium / infrastructure

Ansible request for OpenJDK container detection tests (Linux x86_64) #2817