basak / certbot-snap-build


Certbot snap multiarch build #27

Closed adferrand closed 3 years ago

adferrand commented 4 years ago

General approach

This PR proposes an approach to build the certbot snap for several architectures in an automated manner: it builds the snap for amd64, arm64 and armhf on Travis CI within 20 minutes.

The general approach is to use QEMU to cross-build for several foreign architectures from one given build machine (typically an amd64-based VM on CI systems), through a dockerized, QEMU-enabled snapcraft instance. The process I propose here also uses a local PyPI server to speed up the build, by serving architecture-specific wheels that are not available on the public PyPI (here cryptography and cffi).

Design considerations

I will now discuss why I chose this strategy among all others I could think of.

Let's clarify something first. Snapcraft offers a very flexible way to describe a multiarch build in snapcraft.yaml: based on the build-on and run-on keywords, one can define a complete matrix of architectures for build time and runtime. However, it took me some time to realize that what snapcraft proposes here is a framework for multiarch builds (what to build, when, how), not a cross-compilation solution. The ability to cross-compile is the responsibility of the plugin used: for instance, the Go compiler is designed from the ground up for cross-compilation, so it is quite natural that the go plugin hooks into the multiarch build matrix described by snapcraft to offer a seamless cross-compilation experience for Go-based snaps.
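For illustration, here is a minimal sketch of what such a build matrix looks like in snapcraft.yaml (the architecture pairs below are an example, not this PR's actual configuration):

```yaml
# Each entry declares where the snap is built (build-on)
# and where the resulting snap can run (run-on).
architectures:
  - build-on: amd64
    run-on: amd64
  - build-on: arm64
    run-on: arm64
  - build-on: armhf
    run-on: armhf
```

With the go plugin, a single amd64 builder could declare build-on: amd64 / run-on: arm64 and cross-compile; with the python plugin, build-on and run-on must effectively match.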

But this is not the case for the python plugin, which leaves only one possibility: snapcraft must run on the target architecture for which we want to build the snap. That gives us precisely two approaches: run the build natively on hardware of the target architecture, or emulate the target architecture on the build machine.

Here are the constraints/goals I took into account to design the solution:

1) build the snap for several target architectures (obviously ...)
2) build it by cross-compilation if necessary, so as not to rely on a native build (because most CIs run on amd64)
3) use a process suitable for both local builds and CI builds (because troubleshooting is always necessary, and troubleshooting is easier done locally)
4) use a process not tied to a technology available only in a specific CI (to be able to migrate to other solutions freely)
5) build in an acceptable amount of time (e.g. 5 min is OK, one hour is not)

Constraint 2) implies a QEMU approach. Travis CI proposes an all-inclusive solution for 2), but it is in alpha stage, and it does not match constraint 4).

Here are all the standard ways of building snaps I know of that can be run locally or on a CI (constraint 3)):

a) use snapcraft installed as a snap, and build with LXD
b) use snapcraft installed as a snap, and build with Multipass
c) use snapcraft installed as a snap, and build locally
d) use the dockerized snapcraft, and build locally inside the Docker instance

Among all of them, only a) and d) are compatible with a QEMU approach. Multipass on its own does not emulate; it is "just" a wrapper that leverages a virtualization technology available on the current host. Moreover, Multipass is not usable in most CIs I could test, because their VMs do not expose virtualization capabilities in their vCPUs.

I ruled out a) for the following reasons:

Finally, d) is the selected approach.

Implementation details

First, I took the official Dockerfile for snapcraft and applied the standard configuration for a cross-platform, QEMU-emulated Docker image. This gives us a "native" snapcraft image for the target architecture, able to run on any host architecture. This step is handled by the snapcraft/build.sh [TARGET_ARCH] script.
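As an illustration, a hypothetical simplified Dockerfile for the arm64 case could look like this (the base image and package source are assumptions; the actual file in this PR is the reference):

```dockerfile
# arm64 Ubuntu userland, runnable on an amd64 host thanks to QEMU.
FROM arm64v8/ubuntu:bionic

# Statically linked QEMU binary: the host kernel's binfmt_misc handler
# invokes it transparently for every arm64 binary run in the container.
COPY qemu-aarch64-static /usr/bin/qemu-aarch64-static

# Install snapcraft inside the emulated environment.
RUN apt-get update && apt-get install --yes snapcraft

CMD ["snapcraft"]
```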

Second, QEMU static binaries are enabled on the host machine that will build the snap; then the snapcraft Docker container is started and builds the snap according to the definition in snap/snapcraft.yaml. This step is handled by the ./build.sh [TARGET_ARCH] script.
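A minimal sketch of what this step amounts to (the certbot/snapcraft:${TARGET_ARCH} image tag is hypothetical; the real scripts in this PR are the reference):

```bash
#!/bin/sh
set -e
TARGET_ARCH="${1:-arm64}"

# Register QEMU static binaries with the host kernel's binfmt_misc, so
# foreign-arch binaries started by Docker are transparently emulated.
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes

# Run snapcraft in the emulated container against snap/snapcraft.yaml;
# the resulting .snap file lands in the mounted project directory.
docker run --rm -v "$(pwd):/project" -w /project \
  "certbot/snapcraft:${TARGET_ARCH}" snapcraft
```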

At the end you get the snap file locally, and Travis CI leverages the TARGET_ARCH environment variable to define a build matrix with parallel jobs for all target architectures.
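A hypothetical excerpt of what the .travis.yml matrix could look like (one parallel job per architecture):

```yaml
# Each env entry spawns one job in the Travis build matrix.
env:
  - TARGET_ARCH=amd64
  - TARGET_ARCH=arm64
  - TARGET_ARCH=armhf
script:
  - ./build.sh "${TARGET_ARCH}"
```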

Optional optimizations

QEMU is damn slow, because it essentially translates the foreign CPU instructions into the local native ones. You can expect an emulated build to be at least 4 times slower than a native one, and far slower if the process consists mostly of CPU-intensive tasks, like compiling.

Luckily, almost all of certbot's Python dependencies are either pure Python packages (so no C compilation is required) or have wheels available for each architecture. The notable exceptions are cryptography and cffi, for which no arm64 or armhf wheels are available in any repository (PyPI or third parties). In a QEMU-emulated environment on Travis CI VMs, cryptography takes 10 min to build and cffi 5 min. Before snapcraft 3.10, the wheels were built for each snap part requiring them, leading to an unsustainable 90-minute build. With snapcraft 3.10+ it is better, since dependencies are reused from previous parts, but it still represents a significant amount of wasted build time: these dependencies rarely change, and pre-built wheels could largely be reused across CI jobs.

So I use a non-invasive approach that makes these pre-built wheels available locally during the build. It is done by exposing a local PyPI server, and instructing the build process to use it as an extra index for pip.
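A minimal sketch of the idea, assuming the pre-built wheels are stored in a local ./wheels directory (paths and port are placeholders):

```bash
# Serve the pre-built architecture-specific wheels over HTTP.
pip install pypiserver
pypi-server -p 8080 ./wheels &

# During the snap build, pip can pick up the arm64/armhf wheels
# (cryptography, cffi) from the extra index, and falls back to the
# public PyPI for everything else.
export PIP_EXTRA_INDEX_URL="http://localhost:8080/simple/"
```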

This way we go from 35 min of build time to 20 min. Two interesting things about this design:

adferrand commented 4 years ago

NB1: I use the same approach as #26 to get snapcraft 3.10. To be removed once this version is available on the stable channel.

NB2: For performance reasons, integration tests are run only against the amd64 version of the snap.