Open sxa opened 2 years ago
There are two aspects to toolchains: first, the server systems we use to run and maintain the processes and automation; second, the tools used during the individual build/test processes to produce the output on those server systems. In this comment I will start with the former.
Systems we use:
We use the repositories under https://github.com/adoptium to host our source code (https://github.com/adoptium/jdkXX[u]) and most of our CI pipelines and scripting. Access to the repositories holding the code we use is controlled by the Eclipse Foundation: it is restricted to committers, and merging requires at least one approval from a committer. GitHub is also used to store our binary artifacts (tar.gz and zip files, MSI and PKG installers) in the https://github.com/adoptium/temurinXX-binaries repositories.
We are currently running a server hosted at Hetzner running the latest LTS version of Jenkins. This is checked (at a minimum) on a weekly basis to ensure there are no security issues being flagged in the Jenkins UI, or against Jenkins itself. At present, access control for Jenkins is performed using various groups in the AdoptOpenJDK GitHub organisation. Some of this was revamped as part of https://github.com/adoptium/infrastructure/issues/1084. NOTE: that information should be moved out of the issue, and a plan is needed to migrate most of those groups to Adoptium-controlled ones.
Administrative access to the Jenkins server is available to a subset of the Adoptium PMC and a small number of other users.
I'm putting these all together for now, but they can be split out later. In addition to publishing tar/zip files to GitHub, we also push the binaries to Homebrew (for macOS) and to DockerHub (Ubuntu, CentOS 7, Alpine, and some Windows distributions) as "official" images, and we publish RPM and DEB installers for Linux to an Artifactory instance hosted by JFrog. In all cases we do not have control over the servers, but we do control the keys used to publish to Homebrew and JFrog. At present we also still ship to the old AdoptOpenJDK DockerHub repository, which uses automation keys since we push directly; the newer Adoptium repository is updated via pull requests to the official repositories, so no automation keys are used there. (See also the docker build images later in this document.)
Most of our build and test machines are set up using the ansible playbooks from the infrastructure repository. We also have an AWX server whose access is controlled by membership of the infrastructure team in the AdoptOpenJDK organisation (the same ACL used for Jenkins job access). In general, if a user has been allowed administrative access to our build machines they will be granted access to AWX, since that is simply an easier way to deploy playbook changes. The intention is to run the playbooks on a regular basis through AWX in order to ensure the machines stay in sync and up to date.
Playbook changes are typically tested using the VPC and QPC jobs in Jenkins to ensure they do not cause any problems when run from scratch on a 'clean' OS install; this runs via Jenkins on some machines specifically set up for this purpose.
While the AWX server itself does not have an Adoptium-specific playbook to set it up, there is a guide at https://github.com/adoptium/infrastructure/wiki/Ansible-AWX, and the process makes use of the ansible playbooks supplied with AWX.
In addition to the machines that are configured using the ansible playbooks, some of our larger machines run multiple container images to better utilise their capacity and to provide isolation when running multiple tests. These are set up using the DockerStatic playbook role and are created from dockerfiles containing the minimum requirements for running tests. This also lets us test on a wider variety of Linux distributions than we would otherwise be able to with 'real' VMs. Currently this capability is limited to x64 and aarch64, but there is no reason other than capacity why it could not be rolled out more widely. The patching strategy for these containers is as implemented in https://github.com/adoptium/infrastructure/issues/2070
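As a minimal sketch of what such a test-only container definition looks like (the real dockerfiles live in the infrastructure repository; the base image, package list, and user name here are illustrative assumptions, not the actual contents):

```shell
# Illustrative only: write out a minimal test-container Dockerfile of the
# kind the DockerStatic role deploys. Package list and user are assumptions.
cat > Dockerfile.test <<'EOF'
FROM ubuntu:22.04
# Minimum requirements for *running* tests only - no build toolchain.
RUN apt-get update && apt-get install -y --no-install-recommends \
    openssh-server git curl ca-certificates && rm -rf /var/lib/apt/lists/*
# Tests log in over ssh as an unprivileged user.
RUN useradd -m jenkins
EXPOSE 22
EOF
grep -c '^RUN' Dockerfile.test   # -> 2 (two image layers in this sketch)
```

Keeping the image this small is what makes it cheap to run many such containers per host while still isolating test runs from each other.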
For some platforms (Alpine, Linux/x64, Linux/aarch64) we use docker images created from the dockerfiles in https://github.com/adoptium/infrastructure/tree/master/ansible/docker, which are built and pushed up to DockerHub under the adoptopenjdk (NOTE: not currently adoptium!) project. These are used for building on those platforms and are created using the ansible playbooks. Other platforms use statically created machines. Particularly for Linux/x64 this provides us with additional security, since the OS we build on is currently CentOS 6, which is out of formal support. Automation keys are used to push these images automatically to DockerHub when changes are made to the playbooks, using the processes linked to in the FAQ.
We use a Bastillion server for distributing ssh keys to our build and test systems. This server is generally not logged into by members of the infrastructure team (admin access to change the machine details is much more restricted); it contains each user's public keys and the groups people are in, which grant them login access as the root or jenkins users on each of our build/test systems. This makes it easy to grant infrastructure team members access to most of our non-Windows machines. Some groups of machines, such as AIX, have separate ACLs with additional users as required.
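Conceptually, the group-to-key mapping amounts to assembling a per-system authorized_keys file from the group members' public keys. The sketch below shows the idea under assumed file layouts and names; it is not Bastillion's actual implementation, and the key material is fake:

```shell
# Fake per-user public keys (key material is obviously not real).
mkdir -p keys
echo 'ssh-ed25519 AAAA...alice alice@example' > keys/alice.pub
echo 'ssh-ed25519 AAAA...bob bob@example'     > keys/bob.pub
# Hypothetical group membership for one set of machines.
LINUX_BUILD_GROUP="alice bob"
# Assemble the authorized_keys that would be pushed to those hosts as root.
: > authorized_keys.linux-build
for user in $LINUX_BUILD_GROUP; do
  cat "keys/${user}.pub" >> authorized_keys.linux-build
done
wc -l < authorized_keys.linux-build   # one line per group member
```

Revoking someone's access then only requires removing them from the group and re-pushing, rather than editing every machine by hand.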
The setup of the Bastillion server is described at https://github.com/adoptium/infrastructure/wiki/Bastillion
The Test Results Summary Service (TRSS) is a database and web front-end maintained for the purposes of archiving historic test results. The service currently runs as root on the machine, but there is a plan to change that. The server is set up using the playbook at https://github.com/adoptium/infrastructure/blob/master/ansible/playbooks/AdoptOpenJDK_Unix_Playbook/trss.yml. For most purposes no access control is required to view the data on there; TRSS gets its data from the Jenkins jobs, whose output is not retained in Jenkins beyond a few days. Root access to this server is controlled by a custom authorized_keys file on the server.
We have a Nagios server that is configured to monitor all of our build/test machines and also publishes alerts into the #infrastructure-bot channel on Slack. At the moment this server is not well maintained; the intention is to migrate it to a newer one, set it up again with a useful set of rules, and make the #infrastructure-bot channel useful for 'real' alerts that need to be dealt with. An overview of Nagios can be found at https://github.com/adoptium/infrastructure/wiki#nagios-monitoring
The above lists the servers and external services used to maintain the systems and pipelines that produce the Eclipse Temurin binaries. As can be seen, some are fully under our control and some are not.
We take regular backups of most of these, stored on one of our servers, with older backups kept on a less frequent basis. This is done for Bastillion, Jenkins, Nagios and TRSS. Jenkins thin backups are also maintained on a remotely mounted drive on the Jenkins server itself.
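The thinning of older backups described above could be sketched as follows. The 30-day cutoff, directory, and file names are purely illustrative assumptions for the sake of a runnable example, not the actual retention policy:

```shell
# Simulate a backup directory with a recent and an old backup (fake files).
mkdir -p backups
touch -d '2 days ago'  backups/jenkins-recent.tar.gz
touch -d '40 days ago' backups/jenkins-old.tar.gz
# Hypothetical policy: drop anything older than 30 days.
find backups -name '*.tar.gz' -mtime +30 -delete
ls backups    # only jenkins-recent.tar.gz survives
```

In practice a scheme like this would run from cron on the backup host, with separate, longer-lived copies kept for the less frequent tier.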
```mermaid
flowchart TD
    A[Playbooks] --> B[Ansible/AWX]
    B -->|playbook deploy| E[Build and test systems]
    C[Bastillion] -->|ssh keys| E
    D[Nagios] -->|monitoring| E
    Z[GitHub source repos jdkXXu] -->|source| E
    Y[DockerHub] -->|build containers| E
    E --> F[JENKINS]
    F --> G[TRSS]
    F --> Y
    F -->|publish| H[GitHub temurinXX-binaries]
    F -->|publish| I[JFrog apt/yum repos]
    F -->|publish| J[Homebrew]
    F --> K[DockerHub]
```
The toolchains used for the build and test process are mostly defined by the upstream tools themselves, and so are subject to change.
While we have support for building Temurin, OpenJ9, Bisheng, Dragonwell and Corretto in our build processes, we will focus on Temurin for the purposes of this document. OpenJ9 has a number of extra requirements, which are captured in the ansible playbooks, but typically these are additions to those required for Temurin and do not override them. This document also does not go into the details of the cross-compilation setup we use for some of the RISC-V builds, which we do not currently release.
The Jenkins pipelines in the ci-jenkins-pipelines repository are used to run the build and test processes. These select a machine to use and run the processes in the temurin-build and aqa-tests repositories in order to build and then test the product. For the purposes of this document we will focus on the build side, as that is the most relevant part from the perspective of securing the supply chain.
The build process is started from the make-adopt-build-farm.sh script and uses other scripts in the temurin-build repository. It clones the source code onto the machine and then builds it using the environment that exists on the build machine, plus whatever settings are defined by the pipeline configurations - the environment is generally set up using ansible (or by running a docker container created from the build image pulled from DockerHub). The platform-specific-configurations scripts define, for each platform, some of the tool locations on our machines, such as the compiler versions (as installed by our ansible scripts), as well as some version- or variant-specific options.
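To illustrate, a pipeline-driven invocation sets environment variables along these lines before calling the script. The variable names and values below are assumptions modelled on the temurin-build scripts, not a verified list:

```shell
# Hypothetical environment for a Linux/x64 Temurin jdk17u build.
export JAVA_TO_BUILD=jdk17u   # which jdkXX(u) source repository to clone
export VARIANT=temurin        # build variant (temurin, openj9, ...)
export TARGET_OS=linux
export ARCHITECTURE=x64
# The real pipeline would now run ./make-adopt-build-farm.sh; here we just
# show the derived build description.
echo "building ${JAVA_TO_BUILD}/${VARIANT} for ${TARGET_OS}/${ARCHITECTURE}"
```

Anything not pinned this way comes from whatever is already on the build machine, which is why the machine setup (ansible or the docker build image) matters for supply-chain integrity.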
The process, broadly speaking, is as follows:

- A machine is selected using the `build` and `tag` and/or `dockerBuild` labels as appropriate for the platform. In some cases these can be dynamically provisioned from the cloud providers.
- The `temurin-build` repository is cloned.
- `./make-adopt-build-farm.sh` is run, which will select and display the version of a boot JDK specified in the machine configuration in Jenkins as `JDKxx_BOOT_DIR`, or it can also autodetect the standard location on the build machines.
- It then invokes `makejdk-any-platform.sh` to extract and build the source:
  - The source is cloned from the `dev` branch of the repository.
  - If `CUSTOM_CACERTS` is true, we build a custom cacerts bundle from Mozilla's certificates.
  - `./configure` is run with an appropriate set of parameters, including the ones to identify the build as Eclipse Temurin from Adoptium in the `java -version` output and vendor properties.
  - `make` is used to run the build.

A more comprehensive list of tools which are installed from repositories can be seen in the Common playbook role.
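For illustration, the vendor-identification step uses standard OpenJDK configure options along these lines. The exact flag values are assumptions here (the real ones are set by the temurin-build scripts), and the version string is made up:

```shell
# Hypothetical subset of the ./configure arguments for a Temurin build.
CONFIGURE_ARGS=(
  --with-vendor-name="Eclipse Adoptium"              # java.vendor property
  --with-vendor-url="https://adoptium.net/"
  --with-vendor-version-string="Temurin-17.0.1+12"   # shown by java -version
  --with-cacerts-file="$PWD/security/cacerts"        # when CUSTOM_CACERTS is true
)
# A real build would run: bash ./configure "${CONFIGURE_ARGS[@]}"
echo "${CONFIGURE_ARGS[@]}"
```

These flags are what make a binary identify itself as Eclipse Temurin rather than a generic OpenJDK build.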
Of the list at the end of the previous comment, these are the ones that can be (if not pulled from the OS repos) obtained from other locations. With some of the tools there is, of course, a risk that if they are put at the start of the PATH - as they are on AIX and maybe others - anything installed by them could override system-provided tools of the same name (deliberate on AIX to pick up make and others, but that could - and likely should - be worked around).

| Product | OS | src/bin | Location | Version fixed? | Check? |
|---|---|---|---|---|---|
| Lots | Windows | Binary | cygwin.com | ❌ | ❌ [2] |
| Lots | MacOS | Binary | HomeBrew | ❌ | ❌ [5] |
| Lots | AIX | Binary | yum from IBM AIX Toolbox | ❌ | ❌ [6] |
| Lots | Solaris | Binary | CSWpkgutil from opencsw.org | ❌ | ❌ [5] (link) |
| XLC | AIX | Binary | IBM Download or OSUOSL provided | ✅ | ✅ [4] |
| Zulu boot JDK | Linux | Binary | Azul's CDN | ✅ | ❌ [2] |
| JDK7 boot JDK | AIX | ? | Private /Vendor_Files | ✅ | ? [4] Provenance? |
| JDK7 boot JDK | Windows | Binary | java.net | ✅ | ✅ [3] |
| git 2.15 | Linux | Source | kernel.org | ✅ | ✅ [3] |
| GNU make 4.1 | Linux | Source | ftp.gnu.org | ✅ | ❌ [6] |
| curl 7.79.1 | UNIX | Source | github.com | ✅ | ✅ [3] |
| GCC | Linux | Source | One of the gnu.org mirrors, then cached by us | ✅ | ✅ [4] |
| autoconf | Linux | Source | ftp.gnu.org | ✅ | ✅ [3] |
| Docker | Linux | Binary | repo at docker.com, Fedora-EPEL, or unicamp.br (PPC) | ❌ | Varies? |
| Ant 1.10.5 | All | Binary | archive.apache.org | ✅ | ✅ [3] |
| Ant-Contrib | All | Binary | sourceforge.net | ✅ | ✅ [3] |
| Maven | All | Binary | downloads.apache.org | ✅ | ✅ [3] |
| Python 2.7 | CentOS6 | Binary | Self-built and cached (Issue) - custom OpenSSL install too | ✅ | ✅ [4] |
| Freetype | Win+Mac? | Source | Local cache (download during build) | ✅ | ✅ [3] |
| ALSA | Linux | Source | ftp.osuosl.org (download during build) | ✅ | ✅ [3] |
- [1] - Signed and being verified
- [2] - Signed, not currently verified, but verification is available
- [3] - Static checksum provided by the site and encoded into the playbooks
- [4] - Adopt-cached version used
- [5] - No verification available and download can change
- [6] - No current checks
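The [3] pattern (a static checksum encoded into the playbooks) amounts to the check sketched below. The artifact and checksum here are faked locally so the example is runnable; in the real playbooks the expected value is a fixed literal pinned next to the download URL:

```shell
# Stand-in for a downloaded artifact (real playbooks fetch e.g. an Ant tarball).
printf 'pretend-artifact-contents\n' > artifact.tar.gz
# In the playbooks this would be a hard-coded literal; we compute it here
# only so the sketch is self-contained.
EXPECTED_SHA256=$(sha256sum artifact.tar.gz | awk '{print $1}')
# Verification step: fails the deploy if the download does not match the pin.
echo "${EXPECTED_SHA256}  artifact.tar.gz" | sha256sum -c -
```

Pinning a literal checksum means a tampered or silently changed upstream download fails the playbook run instead of landing on a build machine.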
Other tools, like Mercurial, OpenSSL, nasm, cmake, freemarker, and the NVidia CUDA toolkit (links here are to the corresponding UNIX playbook roles), are installed on the machines but used for things other than Temurin builds.
Part of https://github.com/adoptium/adoptium/issues/122:
PO.3.2 Follow recommended security practices to deploy, operate, and maintain tools and toolchains