
SSDF PO.3.2 Documentation on maintenance and security of toolchains #2553

Open sxa opened 2 years ago

sxa commented 2 years ago

Part of https://github.com/adoptium/adoptium/issues/122:

PO.3.2 Follow recommended security practices to deploy, operate, and maintain tools and toolchains

sxa commented 2 years ago

There are two aspects to the toolchains. The first is the server systems that we use to run and maintain the processes and automation. The second is the tools used during the individual build/test processes to produce the output on those server systems. In this comment I will start with the former.

Systems we use:

GitHub

We use the repositories under https://github.com/adoptium to host our source code (https://github.com/adoptium/jdkXX[u]) and most of our CI pipelines and scripting. Access to the repositories containing the code we use is controlled by the Eclipse Foundation: write access is restricted to committers, and changes require at least one approval from a committer before merging. GitHub is also used to store our binary artifacts (tar.gz and zip files, MSI and PKG installers) in the https://github.com/adoptium/temurinXX-binaries repositories.
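As an illustration of how those published artifacts can be checked by consumers, the release assets include SHA256 checksum files alongside the archives. A minimal sketch of verifying a downloaded tarball is below; the asset and tag names shown are examples only, not canonical values.

```bash
# Download a release archive and its published SHA256 checksum file.
# Asset and tag names below are examples - check the actual release page for real names.
ASSET="OpenJDK17U-jdk_x64_linux_hotspot_17.0.8_7.tar.gz"
BASE="https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.8%2B7"

curl -fsSLO "${BASE}/${ASSET}"
curl -fsSLO "${BASE}/${ASSET}.sha256.txt"

# sha256sum exits non-zero if the archive does not match the published checksum
sha256sum -c "${ASSET}.sha256.txt"
```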

Jenkins

We are currently running a server hosted at Hetzner running the latest LTS version of Jenkins. This will be checked on (at a minimum) a weekly basis to ensure there are no security issues being flagged in the Jenkins UI, or against Jenkins itself. At present the access control for Jenkins is performed using various groups in the AdoptOpenJDK GitHub organisation. Some of this was revamped as part of https://github.com/adoptium/infrastructure/issues/1084. NOTE: that information should be moved out of the issue, and a plan is needed to migrate most of those groups to Adoptium-controlled ones.
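A hedged sketch of the kind of version check that can accompany the weekly review, assuming the public instance is the one at https://ci.adoptium.net (not named above) and that the standard X-Jenkins response header is exposed:

```bash
# Report the Jenkins core version the server is currently running;
# Jenkins advertises it in the X-Jenkins response header.
curl -sI https://ci.adoptium.net/login | grep -i '^x-jenkins:'

# Latest LTS core version as published by the Jenkins project, for comparison.
curl -s https://updates.jenkins.io/stable/latestCore.txt; echo
```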

Administrative access to the jenkins server is available to a subset of the Adoptium PMC and a small number of other users.

Homebrew/DockerHub/JFrog

I'm putting these all together for now, but they can be split out later. In addition to publishing tar/zip files in GitHub, we also manage pushing the binaries to Homebrew (for macOS), to DockerHub (Ubuntu, CentOS7, Alpine and some Windows distributions) as "official" images, and we publish RPM and DEB installers for Linux to an Artifactory instance hosted by JFrog. In all cases we do not have control over the servers, but we can control the keys used to publish to Homebrew and JFrog. At present we also still ship to the old DockerHub repository for AdoptOpenJDK, which has automation keys as we push directly, but the newer Adoptium repository updates are done via pull requests to the official repositories, so no automation keys are used (see also the build image docker images later in this doc).

Ansible/AWX

Most of our build and test machines are set up using the ansible playbooks from the infrastructure repository. We also have an AWX server that is access controlled via members of the infrastructure team in the AdoptOpenJDK organisation (the same ACL used for jenkins job access). In general, if a user has been allowed administrative access to our build machines they will be granted access to AWX, since that is simply an easier way to deploy playbook changes. The intention is to run the playbooks on a regular basis through AWX in order to ensure the machines stay in sync and up to date.
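By way of illustration, a typical manual invocation when not going through AWX might look like the following. This is a sketch only: the inventory file name, host alias and tag selection are hypothetical, and the real options are documented in the infrastructure repository.

```bash
# Apply the main Unix playbook to a single machine from a local checkout of
# the infrastructure repo. Inventory name, host alias and tags are illustrative.
cd infrastructure/ansible
ansible-playbook -i inventory.yml \
  --limit build-machine-example \
  --skip-tags adoptopenjdk,jenkins_user \
  playbooks/AdoptOpenJDK_Unix_Playbook/main.yml
```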

Playbook changes are typically tested using the VPC and QPC jobs in jenkins to ensure they do not cause any problems when run from scratch on a 'clean' OS install - these run via jenkins on some machines specifically set up for this purpose.

While the AWX server itself does not have an adoptium specific playbook to set it up, there is a guide at https://github.com/adoptium/infrastructure/wiki/Ansible-AWX and the process does make use of the ansible playbooks supplied with AWX.

Static Docker containers

In addition to the machines that are configured using the ansible playbooks, on some of our larger machines we run multiple container images to better utilise the capacity and provide isolation when running multiple tests. These are set up using the DockerStatic playbook role and are created from dockerfiles with the minimum requirements for running tests. This also lets us test on a wider variety of Linux distributions than we would otherwise be able to with 'real' VMs. Currently this capability is limited to x64 and aarch64, but there is no reason other than capacity why it could not be rolled out more widely. The patching strategy for these is as implemented in https://github.com/adoptium/infrastructure/issues/2070
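Conceptually, each static container is a long-lived test target reachable over ssh on its own host port. A hedged sketch of the sort of thing the DockerStatic role ends up producing is below; the image name, dockerfile name and port number are made up for illustration.

```bash
# Build a minimal test image from one of the DockerStatic dockerfiles and run
# it detached, exposing sshd on a unique host port so jenkins can connect to
# it as a static test node. Names and port are illustrative only.
docker build -t static-ubuntu-test -f Dockerfile.ubuntu .
docker run -d --restart unless-stopped -p 32001:22 --name static-ubuntu-1 static-ubuntu-test
```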

Dockerhub (for build images)

For some platforms (Alpine, Linux/x64, Linux/aarch64) we use docker images created from the dockerfiles in https://github.com/adoptium/infrastructure/tree/master/ansible/docker, which are built and pushed up to DockerHub under the adoptopenjdk (NOTE: not currently adoptium!) project. These are used for building on those platforms and are created using the ansible playbooks. Other platforms use statically created machines. Particularly for Linux/x64 this provides us with additional security, since the OS we build on is currently CentOS6, which is out of formal support. Automation keys are used to push these images automatically to DockerHub when changes are made to the playbooks, using the processes linked to in the FAQ.
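For reference, a hedged sketch of how one of those build images could be pulled and used locally; the exact image name and tag under the adoptopenjdk DockerHub project should be treated as an assumption.

```bash
# Pull the x64 build image from the adoptopenjdk DockerHub project and start
# an interactive shell with the current directory mounted as a workspace.
# Treat the exact image name/tag as an assumption.
docker pull adoptopenjdk/centos6_build_image
docker run -it --rm -v "$PWD:/workspace" -w /workspace adoptopenjdk/centos6_build_image bash
```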

Bastillion

We use a Bastillion server for distributing ssh keys to our build and test systems. This server is generally not logged into by members of the infrastructure team (admin access to change the machine details is much more restricted); it contains each user's public keys and the appropriate groups that people are in to give them login access as the root or jenkins users on each of our build/test systems. This makes it easy to grant infrastructure team members access to most of our non-Windows machines. Some groups of machines, such as AIX, have separate ACLs with additional users as required.

The setup of the Bastillion server is described at https://github.com/adoptium/infrastructure/wiki/Bastillion

TRSS

The Test Results Summary Service (TRSS) is a database and web front-end maintained for the purpose of archiving historic test results. This service currently runs as root on the machine, but there is a plan to change that. The server is set up using the playbook at https://github.com/adoptium/infrastructure/blob/master/ansible/playbooks/AdoptOpenJDK_Unix_Playbook/trss.yml - for most purposes no access control is required to view the data on there, and the TRSS server gets its data from the jenkins jobs, which are not retained beyond a few days. Root access to this server is controlled by a custom authorized_keys file on the server.

Nagios

We have a Nagios server that is configured to monitor all of our build/test machines and also publishes alerts into the #infrastructure-bot channel on slack. At the moment this server is not well maintained; the intention is to migrate to a newer one, set it up again with a useful set of rules, and make the #infrastructure-bot channel useful for 'real' alerts that need to be dealt with. The overview of Nagios can be found at https://github.com/adoptium/infrastructure/wiki#nagios-monitoring

Summary

The above lists the servers and external services that are used for the purposes of maintaining the systems and pipelines used to produce the Eclipse Temurin binaries. As can be seen, some are fully under our control and some are not.

We take backups of most of these, which are stored on one of our servers on a regular basis, with older backups kept on a less frequent basis. This is done for Bastillion, Jenkins, Nagios and TRSS. Jenkins thin backups are also maintained on a remotely mounted drive on the jenkins server itself.

flowchart TD
    A[Playbooks] --> B[Ansible/AWX]
    B -->|playbook deploy| E[Build and test systems]
    C[Bastillion] -->|ssh keys| E
    D[Nagios] -->|monitoring| E
    Z[GitHub source repos jdkXXu] -->|source| E
    Y[DockerHub] -->|build containers| E
    E --> F[JENKINS]
    F --> G[TRSS]
    F --> Y
    F -->|publish| H[GitHub temurinXX-binaries]
    F -->|publish| I[JFrog apt/yum repos]
    F -->|publish| J[Homebrew]
    F -->|publish| K[DockerHub]
sxa commented 2 years ago

The toolchains used for the build and test process are mostly defined by the upstream tools and projects, so they are subject to change.

OpenJDK Builds

While we have support for building Temurin, OpenJ9, Bisheng, Dragonwell and Corretto in our build processes, we will focus on Temurin for the purposes of this document. OpenJ9 has a number of extra requirements which are captured in the ansible playbooks, but typically these are additions to those required for Temurin and do not override them. This also does not go into the details of the cross-compilation case which we use for some of the RISC-V builds, which we do not currently release.

Machine setup (ansible)

Jenkins pipelines

The jenkins pipelines in the ci-jenkins-pipelines repository are used to run the build and test processes. These select a machine to use and run the processes in the temurin-build and aqa-tests repositories in order to build and then test the product. For the current purposes of this document we will focus on the build side as that is the most relevant part from the perspective of securing the supply chain.

Build process

The build process is started from the make-adopt-build-farm.sh script and uses other scripts in the temurin-build repository. It clones the source code onto the machine and then builds it using the environment that exists on the build machine plus whatever settings are defined by the pipeline configurations - which is generally set up using ansible (or by running a docker container containing the build image which is pulled from dockerhub). The platform-specific-configurations scripts define, for each platform, some of the tool locations on our machines, such as the compiler versions (as installed by our ansible scripts) and some version or variant specific options.
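As a rough illustration of what the pipeline does on a build machine, the entry point can also be driven by hand. The repository layout and environment variable names below reflect how the script is commonly invoked, but should be treated as an approximation rather than the authoritative interface.

```bash
# Clone the build scripts and drive a single Temurin build directly.
# Variable names are indicative; the jenkins pipelines normally set these for us.
git clone https://github.com/adoptium/temurin-build.git
cd temurin-build/build-farm

export JAVA_TO_BUILD=jdk17u    # which OpenJDK source stream to build
export VARIANT=temurin         # build variant (temurin, openj9, ...)
export ARCHITECTURE=x64
export TARGET_OS=linux

./make-adopt-build-farm.sh
```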

The process, broadly speaking, is as follows:

Tools used

A more comprehensive list of tools which are installed from repositories can be seen in the Common playbook role.

sxa commented 2 years ago
Of the list at the end of the previous comment, these are the ones that can be obtained from other locations (if not pulled from the OS repos). With some of the tools there is, of course, a risk that if they are put at the start of the path - as they are on AIX and maybe others - anything installed by them could override system-provided tools of the same name (deliberate on AIX to pick up make and others, but that could - and likely should - be worked around).

| Product | OS | Src/Bin | Location | Version fixed? | Check? |
|---------|----|---------|----------|----------------|--------|
| Lots | Windows | Binary | cygwin.com | ❌ | [2] |
| Lots | MacOS | Binary | HomeBrew | ❌ | [5] |
| Lots | AIX | Binary | yum from IBM AIX Toolbox | ❌ | [6] |
| Lots | Solaris | Binary | CSWpkgutil from opencsw.org | ❌ | [5] (link) |
| XLC | AIX | Binary | IBM Download or OSUOSL provided | ✅ | [4] |
| Zulu boot JDK | Linux | Binary | Azul's CDN | ❌ | [2] |
| JDK7 boot JDK | AIX | ? | Private /Vendor_Files | ? | [4] Provenance? |
| JDK7 boot JDK | Windows | Binary | java.net | ✅ | [3] |
| git 2.15 | Linux | Source | Kernel.org | ✅ | [3] |
| GNU make 4.1 | Linux | Source | ftp.gnu.org | ❌ | [6] |
| curl 7.79.1 | UNIX | Source | github.com | ✅ | [3] |
| GCC | Linux | Source | One of the gnu.org mirrors, then cached by us | ✅ | [4] |
| autoconf | Linux | Source | ftp.gnu.org | ✅ | [3] |
| Docker | Linux | Binary | repo at docker.com, Fedora-EPEL, or unicamp.br (PPC) | Varies? | |
| Ant 1.10.5 | All | Binary | archive.apache.org | ✅ | [3] |
| Ant-Contrib | All | Binary | sourceforge.net | ✅ | [3] |
| Maven | All | Binary | downloads.apache.org | ✅ | [3] |
| Python 2.7 | CentOS6 | Binary | Self built and cached (Issue) - custom OpenSSL install too | ✅ | [4] |
| Freetype | Win+Mac? | Source | local cache (download during build) | ✅ | [3] |
| ALSA | Linux | Source | ftp.osuosl.org (download during build) | ✅ | [3] |

Key:
- [1] - Signed and being verified
- [2] - Signed and not currently verified but available
- [3] - Static checksum provided by site and encoded into playbooks
- [4] - Adopt cached version used
- [5] - No verification available and download can change
- [6] - No current checks
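To make category [3] concrete, here is a small sketch of what "static checksum encoded into playbooks" amounts to when expressed as shell. The URL is one of the Ant downloads named above, but the checksum value is a placeholder, not the real value recorded in the playbooks.

```bash
# Download a pinned tool version and refuse to use it unless it matches the
# checksum that was recorded when the playbook was written.
URL="https://archive.apache.org/dist/ant/binaries/apache-ant-1.10.5-bin.zip"
EXPECTED_SHA256="<checksum recorded in the playbook>"   # placeholder, not the real value

curl -fsSLO "$URL"
echo "${EXPECTED_SHA256}  $(basename "$URL")" | sha256sum -c - \
  || { echo "Checksum mismatch - refusing to install" >&2; exit 1; }
```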

Other tools such as Mercurial, OpenSSL, nasm, cmake, freemarker and the NVidia CUDA toolkit (the links here are to the corresponding UNIX playbook roles) are installed on the machines but are used for things other than Temurin builds.