adoptium / infrastructure

This repo contains all information about machine maintenance.
Apache License 2.0

Evaluate job restrictions plugin to improve build/test/(trestle?) machine isolation #3379

Open sxa opened 8 months ago

sxa commented 8 months ago

Part of SSDF phase 3 PO 5.2

We installed the job restrictions plugin last year. This issue will cover testing it out and seeing if we can prevent jobs run by the test-triage team from being scheduled on production build machines. This should reduce the risk of security issues on those machines. While we have improved isolation on Linux through the use of containerised systems, there is still a risk elsewhere, or if test jobs are scheduled on systems used for hosting build jobs.

sxa commented 4 months ago

Initial tests have not been fruitful - may be an issue with the expressions but not clear what the issues are.

sxa commented 2 weeks ago

Tried this again. I have tested this on one of the machines and it successfully restricts execution of anything other than build jobs on that machine. I will look at adding these criteria to the AIX, ppc64le and Windows x64 dockerhost_* and build_* machines for further testing. (Doing it on Windows/x64 will also enhance our testing on ephemeral machines, since there will be fewer real ones available!) This should give us good coverage for spotting any problems, although there may be cases where a job can run on "any" machine via a label expression such as build&&linux, which we won't pick up with this testing. I've allowed centos7_docker_image_updater on the ppc64le ones for now, but there may well be others, which these tests should show up.
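For illustration, the effect of a name-based restriction like this can be sketched in Python. This is not the plugin's configuration or the expression actually applied to the machines (that is not reproduced in this issue): the pattern, the names `ALLOWED_JOB_PATTERN` and `job_allowed_on_node`, and the sample test job name are all assumptions made for the example.

```python
import re

# Hypothetical allow-list: the real expression used on the machines is not
# shown in this issue, so this pattern is an assumption for illustration.
# It admits anything under a "build-scripts" folder plus one named exception.
ALLOWED_JOB_PATTERN = re.compile(
    r"build-scripts/.*|centos7_docker_image_updater"
)


def job_allowed_on_node(job_full_name: str) -> bool:
    """Return True if the whole job name matches the allow-list pattern."""
    return ALLOWED_JOB_PATTERN.fullmatch(job_full_name) is not None


if __name__ == "__main__":
    for name in (
        "build-scripts/openjdk11-pipeline",
        "centos7_docker_image_updater",
        "some-test-triage-job",  # hypothetical job name
    ):
        print(name, job_allowed_on_node(name))
```

Using a full match rather than a substring search means a job is only admitted if its entire name fits the pattern, which is the stricter behaviour you would want when the goal is isolation.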


This has been applied to the following

I've kicked off https://ci.adoptium.net/job/build-scripts/job/openjdk11-pipeline/2723 to see how it goes and where the jobs end up. Individual build pipelines are as follows:

If a build fails to get scheduled you'll see something like this:

15:44:59  LABEL: docker-osuosl-ubuntu2004-ppc64le-1
[Pipeline] stage
[Pipeline] { (Queue)
[Pipeline] nodesByLabel
15:44:59  Found a total of 1 nodes with the 'docker-osuosl-ubuntu2004-ppc64le-1' label
[Pipeline] echo
15:44:59  dynamicAgents: [fyre]
[Pipeline] node
15:45:14  Still waiting to schedule task
[...]
15:45:14  ‘build-marist-rhel8-s390x-1’ doesn’t have label ‘docker-osuosl-ubuntu2004-ppc64le-1’
[...]

And then there is no further progress.
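If it helps when checking many pipeline runs, a saved console log can be scanned for this pattern. The following is a minimal Python sketch that assumes the messages look exactly like the excerpt above (including the typographic quotes); the function name `find_scheduling_stalls` is made up for this example, and it only recognises these two messages, so other reasons for a stalled queue would not be detected.

```python
import re

# Message text taken from the log excerpt above; it is printed with
# typographic quotes, so the pattern matches those rather than ASCII quotes.
WAITING = "Still waiting to schedule task"
MISSING_LABEL = re.compile(
    r"‘(?P<node>[^’]+)’ doesn’t have label ‘(?P<label>[^’]+)’"
)


def find_scheduling_stalls(console_text: str) -> list[tuple[str, str]]:
    """Return (node, label) pairs reported after a 'Still waiting' message."""
    stalls = []
    waiting_seen = False
    for line in console_text.splitlines():
        if WAITING in line:
            waiting_seen = True
            continue
        match = MISSING_LABEL.search(line)
        if waiting_seen and match:
            stalls.append((match.group("node"), match.group("label")))
    return stalls


if __name__ == "__main__":
    sample = (
        "15:45:14  Still waiting to schedule task\n"
        "15:45:14  ‘build-marist-rhel8-s390x-1’ doesn’t have label "
        "‘docker-osuosl-ubuntu2004-ppc64le-1’\n"
    )
    print(find_scheduling_stalls(sample))
```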

sxa commented 2 weeks ago

Other jobs that may be affected: