afharo opened this issue 1 year ago
Pinging @elastic/kibana-operations (Team:Operations)
@afharo Thanks for opening the issue. I've just been digging into this with help from @mistic and @jbudz, and as far as I understand it, once the Kibana Docker image is built for a particular CI run, we never invalidate or re-build it.
The only thing we can see that could be optimised currently is adding a shared download cache between the CI workers for Docker images that need to be downloaded (mainly the Ubuntu base image).
I'm not ruling out that I've misunderstood the problem, or that you're seeing issues we're not aware of. If so, please clarify.
@watson did you mean to ping @jbudz ?
@jonathan-buttner oops, you're right. Thanks for clearing that up!
> is adding a shared download cache between the CI workers to cache Docker images
Yeah! That'd help. AFAIK, it doesn't only apply to the `FROM` step: the shared cache also applies to any subsequent steps (as long as their inputs don't change), like https://github.com/elastic/kibana/blob/f28b694e5aafb2322b984f79a0455e82b2b73ad6/src/dev/build/tasks/os_packages/docker_generator/templates/base/Dockerfile#L14-L19

The thing is: IIUC, sharing the cache also improves the `docker pull` experience: when 2 images share the same first N steps in the Dockerfile, all those layers are reused from a previously pulled image, and upgrading is faster because the client skips downloading and extracting those layers.
Refer to this guide on best practices for dockerizing a Node.js app: https://nodejs.org/en/docs/guides/nodejs-docker-webapp. Bear in mind they first copy the `package*.json` files and `npm install` them. Only then do they copy the app's source (typically the part of the app that changes most often). As a result, subsequent `docker pull`s only need to download the diff.
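The layering pattern from that guide has roughly this shape (a sketch with illustrative names, not Kibana's actual Dockerfile):

```dockerfile
FROM node:18
WORKDIR /usr/src/app

# 1. Copy only the dependency manifests and install. This layer is reused by
#    both `docker build` and `docker pull` until package*.json actually changes.
COPY package*.json ./
RUN npm install

# 2. Copy the app's source last: only this layer (and the ones after it) gets
#    invalidated and re-downloaded when the code changes.
COPY . .

CMD ["node", "server.js"]
```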
Of course, the largest layer in our Dockerfile is still related to handling the `.tar.gz` file, but we can optimize that later in the process. And we could even discuss moving steps around if we think they can be reused.
WDYT?
Thanks for the details ☺️
> Refer to this guide on the best practices of dockerizing a Node.js app: https://nodejs.org/en/docs/guides/nodejs-docker-webapp. Bear in mind they first copy the `package*.json` files and `npm install` them. Then, they copy the app's source (typically the part of the app that changes more often). This makes that subsequent `docker pull`s only need to download the diff.
In our case, the Kibana Node.js app isn't built as a step in the Dockerfile, but in a separate process before Docker is even invoked. It's built and stored in a tarball, which is then copied into the container during a step in the Dockerfile.
Since almost every CI run uses a slightly different version of the Kibana source code, this step will almost always invalidate everything below it. So maybe there's something here that can be improved by rearranging the lines. And let's definitely look into caching the Docker and OS package-manager pulls 👍
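For the package-manager part, one option (an assumption on my side, not something the Kibana Dockerfile does today) is BuildKit's cache mounts, which keep the package cache on the worker across builds instead of re-downloading on every run:

```dockerfile
# syntax=docker/dockerfile:1
FROM ubuntu:22.04

# Keep apt's download and state caches in BuildKit cache mounts so repeated
# builds on the same worker skip re-downloading packages.
# Requires BuildKit (DOCKER_BUILDKIT=1); package list is illustrative.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends curl
```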
@watson A few ideas I have for this, not entirely sure on the viability or best approach.
- Move the `COPY` line into an `ENTRYPOINT` script and extract the tarball into a mounted volume.
- Use a single OS for the builder step and cache the artifact download and extraction. This allows reuse for each OS instead of doing this work per OS.
- Extend the base OS images with the package manager setup we require and host them in our repo. Possibly updating them on a set interval?
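The first idea (deferring the tarball extraction to an `ENTRYPOINT` script) could look roughly like this. The paths, the `.extracted` marker file, and the helper name are assumptions for the sketch, not Kibana's actual image layout:

```shell
#!/bin/sh
# Hypothetical entrypoint: instead of a COPY+extract layer in the Dockerfile,
# unpack the Kibana tarball into a (possibly volume-mounted) directory the
# first time the container boots.

extract_on_first_boot() {
  tgz=$1
  home=$2
  # Skip the (slow) extraction if a previous boot already did it.
  if [ -f "$tgz" ] && [ ! -f "$home/.extracted" ]; then
    mkdir -p "$home"
    tar -xzf "$tgz" -C "$home" --strip-components=1
    touch "$home/.extracted"
  fi
}

extract_on_first_boot "${KIBANA_TGZ:-/opt/kibana.tar.gz}" "${KIBANA_HOME:-/usr/share/kibana}"

# Hand off to the real command (e.g. bin/kibana) passed as container args.
exec "$@"
```

The `.extracted` marker keeps container restarts fast: only the very first boot pays the extraction cost.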
> Defer extraction of the Kibana tarball until the image is run for the first time.
Have you measured how much extra boot-time this would add?
> Use a single OS for the builder step and caching the artifact download and extraction. This allows reuse for each OS instead of doing this work per OS.
Really good idea! 💯
> Extend the base OS images with the package manager setup we require and host in our repo. Possibly updating them on a set interval?
Have we measured how long it takes to build those layers? I'm worried about adding maintenance overhead, so it would be interesting to know how much time could actually be shaved off.
Describe the feature: AFAIK, our CI workers currently don't share any Docker layer cache. This results in every CI-built Docker image running all the steps in our Dockerfile.
Apart from making the build process longer, it produces images with entirely different layers, which means longer pull times and larger effective image sizes.
Describe a specific use case for the feature: Any Docker-based deployment takes longer to upgrade (I've seen logs from a k8s cluster claiming that downloading a Kibana image takes around 30s).
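For reference, one common way to share layers between stateless CI workers (a sketch of the general technique, not what Kibana CI does today; registry name and tag are placeholders) is to pull a previously published image and feed it to `docker build --cache-from`:

```shell
# Warm the local cache from the last published image; tolerate a missing tag.
docker pull registry.example.com/kibana:latest || true

# Reuse any matching layers from that image as build cache.
docker build \
  --cache-from registry.example.com/kibana:latest \
  -t registry.example.com/kibana:latest .

# Publish so the next worker can reuse these layers too.
docker push registry.example.com/kibana:latest
```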