buildpacks / lifecycle

Reference implementation of the Cloud Native Buildpacks lifecycle
https://buildpacks.io
Apache License 2.0
187 stars 106 forks source link

Security review: recover corrupt caches #1382

Open natalieparellano opened 6 months ago

natalieparellano commented 6 months ago

Summary

In the security review, this is LOW-2: Denial-of-Service (DoS) provoked by removing build cache tarballs or altering the OCI image manifest. The action plan asks us to ensure that

If a tarball is missing, a solution should be found by either rebuilding the corresponding tarball or wiping out the cache in order to continue the containerization process without errors, or, a second execution should be possible without errors

Further context from the initial report:

It appears that if a tarball referenced in the io.buildpacks.lifecycle.cache.metadata file is absent on the container filesystem (mounted host volume), the application containerization process quit without wiping out the cache content.


Proposal

Alternatively, we considered updating the cache metadata to exclude the layer/tarball that is missing. But, we are not sure if this scenario is common enough to warrant such a surgical approach.


Related

RFC #___


Context

joeybrown-sf commented 5 months ago

Here's a demo of the broken behavior.

demo-failure

joeybrown-sf commented 5 months ago

And here's a demo of the expected behavior.

expected-demo

joeybrown-sf commented 5 months ago

I've got things working on my branch corrupted-cache. But I still need to add at least one test for the Exporter change.

@natalieparellano I went a different direction that we originally discussed. Instead of wiping out the cache, I am instead just ignoring the missing files. What do you think? Happy to do whatever.

joeybrown-sf commented 5 months ago

cc @jabrown85

natalieparellano commented 4 months ago

Instead of wiping out the cache, I am instead just ignoring the missing files.

Sounds good to me :) should we land this one?

jabrown85 commented 4 months ago

I think @joeybrown-sf was hoping to fix up a test or something? What remains here @joeybrown-sf ?

natalieparellano commented 3 months ago

Circling back, see discussion on https://github.com/buildpacks/lifecycle-private/pull/16, our fix is working for volume caches, but for image caches we are NOT getting "layer not found" errors where we expect them (and hence are failing and bubbling up the error instead of skipping over the layer). This requires further investigation in imgutil. We added FIXMEs in the code so that it is apparent that the image cache flow requires further work. We could leave this issue open or create another issue that is specific to image caches.