envoyproxy / ci-infra

Add a local (~s3) Docker registry in hosted images (for build caches) #16

Open phlax opened 1 year ago

phlax commented 1 year ago

Currently we have a local bazel-remote server in the hosted images that proxies to an s3 bucket. This lets us store action caches and gives (non-RBE) CI a massive speedup.

We could do something similar with docker/buildx, which lets you cache_to/cache_from a registry for build caches. We could run that registry with a bucket storage backend.

This would speed up docker-centric CI jobs and cut their cost a lot, and would avoid the much more complicated and less reliable azp caching approaches.
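
A minimal sketch of what the buildx side could look like, assuming a registry reachable at `localhost:5000`. The image name `envoy-build`, the `buildcache` tag and the helper names are hypothetical and only there for illustration. It just assembles the `--cache-from`/`--cache-to` flags for a registry-type cache; it does not run anything.

```python
import shlex


def buildx_cache_args(image, registry="localhost:5000", cache_tag="buildcache"):
    """Return buildx flags that read and write a registry-backed build cache.

    The registry host and cache tag are assumptions for this sketch.
    """
    ref = f"{registry}/{image}:{cache_tag}"
    return [
        f"--cache-from=type=registry,ref={ref}",
        # mode=max also exports layers from intermediate build stages
        f"--cache-to=type=registry,ref={ref},mode=max",
    ]


def buildx_command(image, context="."):
    """Assemble a full `docker buildx build` command line for one image."""
    return ["docker", "buildx", "build", *buildx_cache_args(image), "-t", image, context]


# Print the command rather than running it, so the sketch has no side effects.
print(shlex.join(buildx_command("envoy-build")))
```

The cache lives under its own tag in the registry, separate from the image tag, so a cache-only registry never needs to serve the final images.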

phlax commented 1 year ago

i have started testing this locally - setting up the docker registry with an s3 bucket is not too hard and works nicely
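
For reference, a rough sketch of how the registry could be pointed at a bucket, assuming the stock `registry:2` image and its `REGISTRY_STORAGE_*` environment overrides. The bucket name, container name and endpoint in the example call are placeholders, not the values used in the testing above. Credentials are passed through from the caller's environment rather than written into the command.

```python
import shlex


def registry_run_command(bucket, region, endpoint=None, port=5000):
    """Build a `docker run` command for a registry that stores blobs in s3.

    `endpoint` is only needed for s3-compatible providers; all values
    here are illustrative assumptions.
    """
    env = {
        "REGISTRY_STORAGE": "s3",
        "REGISTRY_STORAGE_S3_BUCKET": bucket,
        "REGISTRY_STORAGE_S3_REGION": region,
    }
    if endpoint:
        env["REGISTRY_STORAGE_S3_REGIONENDPOINT"] = endpoint

    cmd = ["docker", "run", "-d", "--name", "registry", "-p", f"{port}:5000"]
    for key, value in env.items():
        cmd += ["-e", f"{key}={value}"]
    # `-e NAME` without a value forwards NAME from the calling environment,
    # which keeps the keys out of the command line.
    cmd += ["-e", "REGISTRY_STORAGE_S3_ACCESSKEY", "-e", "REGISTRY_STORAGE_S3_SECRETKEY"]
    cmd.append("registry:2")
    return cmd


# Hypothetical bucket and endpoint, printed rather than executed.
print(shlex.join(registry_run_command(
    "example-build-cache", "us-east-1", endpoint="https://s3.example.com")))
```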

next step is to test in Envoy CI

i'm mostly thinking about doing this in the context of build caches, but i'm also wondering whether this could be faster than using the azp cache for the build image. the azp cache is free to use, so i guess the question in that case is the s3 bucket cost

phlax commented 1 year ago

fwiw my initial testing shows dockerhub is >2x as fast as docker-registry + wasabi:s3

the caveat is that i set the s3 bucket to us-east, so it's probably a bit faster with a closer bucket - i'll test further

phlax commented 1 year ago

tested in envoy ci: the speed is comparable to avg dockerhub and slower than the cache

i need to test a bit more, and i'm not sure how this would measure up using s3 rather than wasabi, but i think this suggests it might be good for build caches but not for just caching registry images

phlax commented 1 year ago

https://dev.azure.com/cncf/envoy/_build/results?buildId=130837&view=logs&j=55bc2ab9-36c1-5ab8-58a6-8c258728eeff&t=79cc2dee-5ade-5c99-9b4b-6d3bb37ea34a&l=57

phlax commented 1 year ago

this turns out to be a load more complicated than you might expect

firstly, each image that is built has to have its cache_to/cache_from set separately, which adds a load of code/complexity
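
To give a feel for the per-image overhead, here is a sketch that generates a `docker buildx bake` JSON definition with a separate cache ref for every target. This is one possible way to organise it, not necessarily what was tried here. The image names, Dockerfile paths and registry host are all hypothetical.

```python
import json


def bake_definition(images, registry="localhost:5000"):
    """Build a bake definition where each target has its own cache ref.

    `images` maps a target name to its Dockerfile path. The registry host
    and the `buildcache` tag are assumptions for this sketch.
    """
    targets = {}
    for name, dockerfile in images.items():
        ref = f"{registry}/{name}:buildcache"
        targets[name] = {
            "context": ".",
            "dockerfile": dockerfile,
            "tags": [name],
            # every target needs both entries, pointing at its own ref
            "cache-from": [f"type=registry,ref={ref}"],
            "cache-to": [f"type=registry,ref={ref},mode=max"],
        }
    return {
        "group": {"default": {"targets": sorted(targets)}},
        "target": targets,
    }


# Hypothetical images; the output could be saved as docker-bake.json.
images = {
    "build-image": "ci/Dockerfile-build",
    "docs-image": "ci/Dockerfile-docs",
}
print(json.dumps(bake_definition(images), indent=2))
```

Even generated like this, each new image means another pair of cache settings to keep in sync, which is the complexity described above.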

the next issue is that it expects an https host with a valid cert, despite every example there is using http://localhost:5000. working around that is messy/hacky: https://github.com/docker/buildx/issues/94
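
One workaround that comes up for plain-http registries is to give the builder a BuildKit config that marks the registry as http, and to create the builder with host networking so `localhost` resolves to the host. The sketch below only renders that config and the `docker buildx create` command. It is an assumption about how this could be wired up, not a tested fix for this CI, and the builder name and file path are placeholders. Some newer buildx releases name the `--config` flag `--buildkitd-config`, so check `docker buildx create --help` first.

```python
import shlex


def buildkitd_config(registry_host="localhost:5000"):
    """Render a buildkitd.toml section allowing plain http for one registry."""
    return (
        f'[registry."{registry_host}"]\n'
        "  http = true\n"
        "  insecure = true\n"
    )


def create_builder_command(config_path="buildkitd.toml", name="ci-builder"):
    """Assemble the command that creates a builder using that config.

    `name` and `config_path` are hypothetical; network=host lets the
    containerised builder reach a registry bound to the host's localhost.
    """
    return [
        "docker", "buildx", "create", "--use",
        "--name", name,
        "--driver-opt", "network=host",
        "--config", config_path,
    ]


# Show both pieces without touching the filesystem or docker.
print(buildkitd_config())
print(shlex.join(create_builder_command()))
```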