ishitatsuyuki opened this issue 6 years ago
Thanks a lot for this suggestion! As mentioned on IRC, we include sources and pre-compiled artifacts in Docker images because it has the following advantages: new containers come with everything already cloned and pre-built (if you run ninja -C out/Default in a fresh Chrome container, it will even say "no work to do").
Having things cloned beforehand is good, but writing to them isn't. There's notable overhead when writing to a CoW overlay, compared to Docker volumes.
Interesting, thanks for this insight. Could you please elaborate on how significant this overhead is? Is there a particularly write-intensive workflow that you find too slow on Janitor today because of this?
clone the source and artifacts to a different directory, then copy them on startup with an entrypoint script.
This would instantly remove the benefits of Copy-on-Write, by having every container store 100% of its source files and pre-compiled binaries separately from every other container, right? (So if we have a 10GB checkout with 10GB pre-compiled binaries in a Docker image, 100 new containers would instantly fill 2TB of disk space)
do not do source-related work in Dockerfiles, instead have a separate step for it (docker build doesn't handle volumes yet, and we can't expect it in the near future either)
I have the intuition that we'll have to somehow remove the "workspace" from our current Docker containers. Because old containers keep large old images from being garbage collected, we want to delete containers as soon as possible, and if we're able to extract 100% of the "valuable state" from a given container (i.e. any user-made changes like configurations, commits, branches, uncommitted work/source file changes) then we can delete a container and restore it at will, freeing up a lot of resources on our infrastructure.
However, I have no idea what the best design for such a "removed workspace" or "extracted valuable state" would look like. E.g. a mounted volume? A database of user-made changes? A private VCS branch along with uncommitted changes that can be restored at will?
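To make the VCS-based option above a bit more concrete, here is a rough sketch using plain Git commands (the paths and backup location are made up, and Mercurial or monorepo projects would need their own equivalent):
# Capture the user's commits, branches and uncommitted edits before deleting the container (hypothetical paths):
git -C /home/user/project bundle create /backup/work.bundle --branches
git -C /home/user/project diff HEAD > /backup/uncommitted.patch   # untracked files would still need separate handling
# Later, restore them into a fresh container created from the same image:
git -C /home/user/project fetch /backup/work.bundle 'refs/heads/*:refs/remotes/backup/*'
git -C /home/user/project apply /backup/uncommitted.patch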
To me, the most important aspects of such a thing would be:
I dug a bit into userspace-level dedup with git clone --reference.
While this works very well for submodules (just reference a recursive non-bare clone and you can make a clone instantaneously), most projects currently in Janitor either don't use Git (Firefox) or use a home-grown monorepo tool, making integration hard.
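For illustration, the pattern looks roughly like this (the repository URL and paths are placeholders, not actual Janitor projects):
# Keep one shared clone around as the read-only reference (placeholder URL and paths):
git clone --recursive https://example.org/project.git /srv/reference/project
# New clones borrow objects from the reference instead of duplicating them:
git clone --reference /srv/reference/project https://example.org/project.git ~/workspace/project
du -sh /srv/reference/project/.git ~/workspace/project/.git   # the workspace .git stays tiny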
Another obstacle is compilation artifacts. ccache supports only one cache directory, which means we can't make some read-only fallback.
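For context, ccache picks its single cache directory via CCACHE_DIR (the path below is just an example), so there is no built-in way to chain a second, read-only cache behind it:
# ccache reads and writes exactly one cache directory, selected via CCACHE_DIR
export CCACHE_DIR=/volumes/ccache
ccache --show-stats   # statistics for that single directory only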
Here is a benchmark that @ishitatsuyuki made to prove his point:
docker run -it --rm -v /tmp:/mnt janx/thunderbird /bin/bash   # bind-mount the host's /tmp at /mnt to stand in for a Docker volume
cp -ar $PWD /mnt/thunderbird   # copy the pre-built tree onto that mount
# First, on the CoW overlay (the container's own filesystem):
find . > /dev/null
find . > /dev/null   # run find twice to warm the filesystem metadata caches
time ./mozilla/mach clobber   # delete the build artifacts
# Then, the same thing on the volume copy:
cd /mnt/thunderbird/
find . > /dev/null
find . > /dev/null
time ./mozilla/mach clobber
Results:
13:14:20 ishitatsuyuki> 4m59s on CoW
13:15:20 ishitatsuyuki> 24s on volume
13:15:33 ishitatsuyuki> So yes, roughly 10x overhead
I have personally settled on pre-warming volumes beforehand, so we don't need to wait for them to be copied when a container is created. We can have a volume pool of configurable size to achieve this.
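As a rough sketch of what that pre-warming could look like (the pool size, volume names and source path inside janx/thunderbird are assumptions):
# Pre-create and fill a few volumes ahead of time:
for i in 1 2 3; do
  docker volume create thunderbird-pool-$i
  docker run --rm -v thunderbird-pool-$i:/mnt janx/thunderbird cp -ar /home/user/thunderbird/. /mnt/
done
# A new workspace container then claims a pre-warmed volume instead of copying at creation time:
docker run -it -v thunderbird-pool-1:/home/user/thunderbird janx/thunderbird /bin/bash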
Thanks a lot for updating your plan, and for seeking valuable performance improvements! Here are a few personal thoughts on this update.
Having things cloned beforehand is good, but writing to them isn't. There's notable overhead when writing to a CoW overlay, compared to Docker volumes.
While I agree that there is notable overhead with our current CoW overlay, this hasn't been a frequent user complaint so far, and I don't see this as a problem in our current operations:
We will decouple the Git objects into a separate read-only volume to avoid duplication.
Please note that not all Janitor projects use Git. Many Mozilla projects use Mercurial, and some projects like Chromium use their custom source syncing tools (e.g. fetch).
Also, I only see limited value in decoupling Git objects alone. For example, the ./mozilla/mach clobber overhead you mention is due to build artifacts being coupled to Docker's CoW, not Git objects (the Thunderbird image doesn't even use Git), and the biggest disk space overhead in our current images is not Git objects (42.3MB layer in the Chromium image) but pre-compiled artifacts (26.6GB layer in the Chromium image), multiplied by how many old trees are kept alive by old user containers.
Does your solution de-couple build artifacts? And if so, does it reduce the disk space overhead caused by old pre-compiled source trees from old containers?
We will be using a native Copy-on-Write filesystem to achieve low write overhead. OverlayFS is extremely bad at deleting files
Please note that we use a variety of OSes in our community-backed Docker servers (e.g. Debian, Ubuntu, Amazon Linux, ...) which might not all have good native Copy-on-Write filesystems available, nor is it always possible/easy to create dedicated CoW partitions in these servers.
Additionally, here you analyze OverlayFS, but we also use a variety of Docker storage drivers in our servers (e.g. OverlayFS, AUFS, DeviceMapper, ...) which may not all share the same performance aspects.
We will decouple the runtime environment itself from the build artifacts. This way, we can apply system and toolchain updates without destroying the current working tree.
De-coupling the system from users' working trees is a nice idea, and that's what Cloud9 IDE was doing for their (very small) user workspaces.
However, I'm afraid that it increases operational complexity significantly, while we're just a small team of part-time volunteers. Additionally, while it would solve the problem of upgrading the users' system and toolchains without disrupting their current working trees, it doesn't solve the problem of old working trees taking up all our disk space (we'd be shifting the disk space problem from old Docker images to old volumes, without solving it).
Union filesystem based | CoW filesystem based
I find these approach names ambiguous and confusing. Could you please append a clear "(current approach)" or "(suggested new approach)" hint to the column names?
Refactor the dockerfiles repo so that build scripts are decoupled from Dockerfile
As mentioned on IRC, please keep this involved refactoring in a separate branch for now. I'd like to avoid increasing our Dockerfiles' complexity, especially for experimental changes.
Implement build-after-pull and stop relying on CI for build
To me, this is a step back. We used to build on-premises, taking the large performance hit, and validating images ourselves (e.g. build failures were detected after a pull, not before).
Moving to CI builds greatly simplified our life (we now continuously rebuild images in the background, without taking a performance hit, and we can validate new commits and pull requests before pulling them). Please keep project builds outside of our Docker servers if possible.
At this point, we can finally put all of the things into production
Before this step, we need to take a hard look at what performance wins this approach is yielding (weighted by the value it brings to users, e.g. better noVNC latency is a much bigger win than 10x faster clobber times), and what costs we are paying for them in terms of complexity (more software to maintain, more steps in our deployments and maintenance efforts, generally more moving parts that can lead to more & trickier bugs than simple monolithic Docker containers). If the benefits are not overwhelmingly superior to the costs, then I'd vote against this approach.
Thanks again for championing this very interesting experiment! I'm really looking forward to knowing more about gains vs costs here.
Please note that not all Janitor projects use Git. Many Mozilla projects use Mercurial, and some projects like Chromium use their custom source syncing tools (e.g. fetch).
I admit that Mercurial doesn't support this pattern. In Chromium this is likely viable though, by using some tricks for depot_tools.
and the biggest disk space overhead in our current images is not Git objects (42.3MB layer in the Chromium image) but pre-compiled artifacts (26.6GB layer in the Chromium image) multiplied by how many old trees are kept alive by old user containers.
The Git folder itself also weighs about 10GB; deduplicating it is a 20% improvement, which is not bad.
Does your solution de-couple build artifacts? And if so, does it reduce the disk space overhead caused by old pre-compiled source trees from old containers?
Build artifacts are not stored in the read-only volume. However, CoW allows dirty rebuilds, which is where we should see improvements.
To me, this is a step back. We used to build on-premises, taking the large performance hit, and validating images ourselves (e.g. build failures were detected after a pull, not before).
Maybe we can run a CI with some merge-gating bot like bors-ng. This comes with a maintenance cost, but it will vastly improve the CI feedback time.
Having things cloned beforehand is good, but writing to them isn't. There's notable overhead when writing to a CoW overlay, compared to Docker volumes.
With the following method, we focus on both space savings and I/O speedup:
We will decouple the Git objects into a separate read-only volume to avoid duplication. The reason for doing this is that if a repository is modified repeatedly, Git cannot pack the objects in a deterministic way that the filesystem can deduplicate. Instead, we will be using git clone --reference so that the .git directory only contains the objects that the user created. (I have verified that this will also deduplicate objects when we fetch upstream.) See the sketch after this list.
We will be using a native Copy-on-Write filesystem to achieve low write overhead. OverlayFS is extremely bad at deleting files, as it amplifies a write that would normally only touch the inode/dentry into writes to metadata blocks, which translates to roughly 10x slowness when doing make clobber.
We will decouple the runtime environment itself from the build artifacts. This way, we can apply system and toolchain updates without destroying the current working tree.
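As a rough sketch of the first point above (the volume, image and path names here are invented, and we assume the image ships git):
# Populate a shared volume once with a full reference clone (hypothetical names):
docker volume create git-objects
docker run --rm -v git-objects:/ro janx/chromium git clone https://chromium.googlesource.com/chromium/src /ro/src
# Each workspace container mounts that volume read-only and clones against it:
docker run -it -v git-objects:/ro:ro -v workspace1:/data janx/chromium /bin/bash
git clone --reference /ro/src https://chromium.googlesource.com/chromium/src /data/src
# /data/src/.git now only holds objects the user creates; everything else is read through /ro/src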
Focusing on the CoW part, we will be using a few approaches to allow people (including us) without a native CoW filesystem to run Janitor:
janitor-production
Build from a fresh layer
Tag janitor-production
Perform dirty build if enabled
Mount both ro and data volume
These features will be specific to CoW backends: dirty rebuild, and upgrades without deleting the working tree. We will add a flag in the Janitor application so that these features are not shown in the UI when not available.
Next, the migration plan:
Refactor the dockerfiles repo so that build scripts are decoupled from the Dockerfiles