chainguard-dev / apko

Build OCI images from APK packages directly without Dockerfile
https://apko.dev
Apache License 2.0

We should cache ExpandAPK, not just the apk #772

Closed jonjohnsonjr closed 1 year ago

jonjohnsonjr commented 1 year ago

When we fetch an APK, we fetch, expand, and install.

See gcc:

image

When we have a cache hit, we skip the fetch, but we still do the expand:

image

That 1.2s does not change across image builds and is fairly expensive, so I propose we cache the outputs of ExpandAPK (each section as one .tar.gz file) instead of caching the entire APK as a single file.

This would cut out approximately half the work apko has to do when we have cache hits.
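A minimal sketch of the proposal, not apko's actual implementation: expand only on a cache miss, and store each section as its own `.tar.gz` file. The section names (`ctl`, `dat`), the file naming scheme, and the helpers `sectionsCached`, `storeSections`, and `ensureExpanded` are all hypothetical, and `expand` is a stand-in for the real ExpandAPK step.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Hypothetical section names. This sketch requires only control and data,
// on the assumption that a signature section may be absent.
var requiredSections = []string{"ctl", "dat"}

// sectionPath returns where one expanded section would live in the cache.
func sectionPath(cacheDir, apkName, section string) string {
	return filepath.Join(cacheDir, apkName+"."+section+".tar.gz")
}

// sectionsCached reports whether every required section is already on disk.
func sectionsCached(cacheDir, apkName string) bool {
	for _, s := range requiredSections {
		if _, err := os.Stat(sectionPath(cacheDir, apkName, s)); err != nil {
			return false
		}
	}
	return true
}

// storeSections writes each section through a temp file plus rename, so a
// concurrent reader never observes a partially written section.
func storeSections(cacheDir, apkName string, sections map[string][]byte) error {
	if err := os.MkdirAll(cacheDir, 0o755); err != nil {
		return err
	}
	for name, data := range sections {
		tmp, err := os.CreateTemp(cacheDir, apkName+".*.tmp")
		if err != nil {
			return err
		}
		if _, err := tmp.Write(data); err != nil {
			tmp.Close()
			os.Remove(tmp.Name())
			return err
		}
		if err := tmp.Close(); err != nil {
			os.Remove(tmp.Name())
			return err
		}
		if err := os.Rename(tmp.Name(), sectionPath(cacheDir, apkName, name)); err != nil {
			os.Remove(tmp.Name())
			return err
		}
	}
	return nil
}

// ensureExpanded runs expand only when the sections are not cached yet.
func ensureExpanded(cacheDir, apkName string, expand func() (map[string][]byte, error)) error {
	if sectionsCached(cacheDir, apkName) {
		return nil
	}
	sections, err := expand()
	if err != nil {
		return err
	}
	return storeSections(cacheDir, apkName, sections)
}

// demo requests the same package twice against a temporary cache and
// returns how many times the (fake) expand step actually ran.
func demo() (int, error) {
	dir, err := os.MkdirTemp("", "expand-cache")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	calls := 0
	expand := func() (map[string][]byte, error) {
		calls++ // stand-in for the expensive ExpandAPK work
		return map[string][]byte{"ctl": []byte("control"), "dat": []byte("data")}, nil
	}
	for i := 0; i < 2; i++ {
		if err := ensureExpanded(dir, "gcc-13.2.0-r0", expand); err != nil {
			return calls, err
		}
	}
	return calls, nil
}

func main() {
	calls, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("expand calls:", calls) // prints "expand calls: 1"
}
```

The second request finds both section files and returns without expanding, which is the work a warm cache would skip.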

jonjohnsonjr commented 1 year ago

Working towards this, we see another large reduction in CPU usage:

apko publish --keyring-append  --repository-append  --arch amd64    10.81s user 3.05s system 181% cpu 7.639 total

Previous run was:

apko publish --keyring-append  --repository-append  --arch amd64    13.46s user 4.56s system 227% cpu 7.917 total

(Note that these are all with warm caches, just re-running the same thing.)

Overall latency is roughly the same because we already mitigated the ExpandAPK latency with https://github.com/chainguard-dev/go-apk/pull/75, but we are doing roughly 20% less CPU work than before, which should translate to more throughput.
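For reference, the CPU figure can be checked from the two timing lines above by comparing user plus system time. This is a small arithmetic sketch. The helper names are made up, and treating user+system as the measure of "work" is an assumption about what the comment means.

```go
package main

import "fmt"

// cpuSeconds is user plus system time, as reported in the timing lines.
func cpuSeconds(user, system float64) float64 { return user + system }

// reduction returns the fractional drop in CPU time from before to after.
func reduction(before, after float64) float64 { return (before - after) / before }

func main() {
	before := cpuSeconds(13.46, 4.56) // previous run
	after := cpuSeconds(10.81, 3.05)  // run with cached ExpandAPK output
	fmt.Printf("before=%.2fs after=%.2fs reduction=%.1f%%\n",
		before, after, 100*reduction(before, after))
	// prints "before=18.02s after=13.86s reduction=23.1%"
}
```

By this measure the drop is a little over 20%, while wall-clock time barely moves (7.917s vs 7.639s).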

What's nice is that this caching applies across all builds, so when we rebuild the world, we only pay the ExpandAPK price once per APK instead of once for every build that depends on that APK.

I intend to remove writing the whole APK to the cache (in the cache transport), but I'll keep it on the read path so that I don't bust everyone's cache when we merge this. Dropping it from the write path is important because otherwise we are keeping two copies of the data needlessly.
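One way the read path could stay backward compatible is sketched below, under assumptions: look for the expanded sections first and fall back to a whole `.apk` left by an older version. The file names, the `source` type, and the `lookup` helper are hypothetical, and checking only the data section is a simplification.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// source describes where a package can be served from.
type source string

const (
	fromExpanded source = "expanded"  // new-style cache entry
	fromWholeAPK source = "whole-apk" // entry written by an older version
	cacheMiss    source = "miss"
)

// exists reports whether a file is present at path.
func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// lookup prefers the expanded form and falls back to the whole APK, so
// caches populated before the change keep producing hits.
func lookup(cacheDir, apkName string) source {
	if exists(filepath.Join(cacheDir, apkName+".dat.tar.gz")) {
		return fromExpanded
	}
	if exists(filepath.Join(cacheDir, apkName+".apk")) {
		return fromWholeAPK
	}
	return cacheMiss
}

// demoLookup walks one package through empty, old-style, and new-style
// cache states and records what lookup reports at each step.
func demoLookup() ([]source, error) {
	dir, err := os.MkdirTemp("", "apk-cache")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	var got []source
	got = append(got, lookup(dir, "gcc")) // nothing cached yet

	if err := os.WriteFile(filepath.Join(dir, "gcc.apk"), []byte("apk"), 0o644); err != nil {
		return nil, err
	}
	got = append(got, lookup(dir, "gcc")) // only the old whole-APK entry

	if err := os.WriteFile(filepath.Join(dir, "gcc.dat.tar.gz"), []byte("dat"), 0o644); err != nil {
		return nil, err
	}
	got = append(got, lookup(dir, "gcc")) // expanded entry now wins

	return got, nil
}

func main() {
	got, err := demoLookup()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got) // prints "[miss whole-apk expanded]"
}
```

In this sketch nothing ever writes the `.apk` file. It is only read if a previous version left it behind, which avoids storing the same data twice.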

Cache hits (cachedPackage succeeding) are now ~instantaneous because they no longer run ExpandAPK:

image