Open cormacpayne opened 1 year ago
Thanks for filing this issue @cormacpayne - I think you're right that the basic tension is around things that are and aren't allowed to have root access. As I understand it, the point of the extender is to allow certain power user moves that require full privileges, whereas usually the buildpacks build process is unprivileged by design. So I think you're right when you diagnose the issue you're hitting here:
this is due to the extender not being called from within an individual container with root access
and actually I think not having root access would be an issue even if you made a custom build of pack etc. that was willing to call the extender for you.
Dockerfile Extensions are still considered experimental, so we're definitely still learning how folks need them to work and how to best help with those use-cases. For what you're trying to do, would it have worked without root access?
All that said, I'm not the best authority on these features :) I think the folks who know the most are on vacation this week so you may get a different set of answers in 7-10 days :)
@joe-kimmel-vmw Hey Joe, thanks for the detailed response! For our specific case, our build stack is a "locked-down" (i.e., network restricted) image that pulls down binaries via a built-in command-line interface. Each buildpack we bundled with our builder has a corresponding extension that calls this CLI to install the respective set of platform binaries. Unfortunately, this CLI can only be ran as root
, which is why we've gravitated towards this extension model to allow these additional installation steps to be ran successfully.
I'll keep an eye out for any additional comments from folks just so we can survey our options in this space -- our flow is doable without extensions, but things become much more convenient and easier for us with the feature, so we'd love to continue to leverage it! Thanks again!
@cormacpayne re Operation not permitted
- are you invoking the extender as root?
Regarding running the extender
from the creator
- this could probably be done in limited circumstances, but requires some care. Some thoughts are below -
The easiest use case to support would be image extensions that simply switch the runtime base image to a new base. This actually doesn't require running the extender at all (it is done in the detector) but would involve modifying some validations that error out if extensions are provided and the creator is invoked.
The next use case would be image extensions that extend the build-time base image. This is completely doable but we would likely need to retain root privileges through analyze-detect-restore-extend(build), which would involve removing validation in the detector that refuses to run as root, which could be somewhat dangerous. We could add some code to ensure that buildpack (./bin/detect) processes themselves are run as the CNB user even when the lifecycle is root. But it would still mean that some lifecycle activities would happen with elevated permissions compared to what exists today. Another downside would be that it would be impossible to isolate registry credentials from Dockerfile instructions, because those creds would exist in the same container where extension is running (as root). Today we protect creds by running the restorer (which downloads stuff) in a separate container from the extender. This could potentially be mitigated by the platform pre-downloading the needed stuff.
The hardest use case to support would be run image extension. Simply because, the lifecycle doesn't run in the context of the run image for all the other phases. At one point, we tossed out the "crazy idea" of saving off the build image filesystem after the build phase, downloading & extracting the run image filesystem, running run image extension, saving off the extended run image filesystem, extracting the build image filesystem again, and continuing on to the export phase. It still sounds crazy TBH but if there's a real need for it we could explore it.
@natalieparellano Thanks for the detailed response as always, Natalie! Please find responses below:
re
Operation not permitted
- are you invoking the extender as root?
For the information outlined in the original description of this issue, I was invoking the extender as a non-root user -- I did follow this up by creating a builder whose build stack was executed as root and attempted to run the same script (with each phase ran as their corresponding lifecycle command), and that failed during the detector phase when calling the Privileges()
function. This seemingly was the only phase (that we're calling, aka everything but builder) that had this type of check, so I'm not sure if any other phase would've run into issues when executing as root, but you have much more context about the codebase, so I'll defer to you here.
The easiest use case to support would be image extensions that simply switch the runtime base image to a new base.
I'm glad you brought this up because this was something I was curious about when digging into these scenarios a bit; specifically, if we didn't have a build extensions, would this flow work with just run extensions since it shouldn't be performing any steps that required elevated permissions (AFAIK). I didn't have a chance to try it out, but this is seemingly something that would've worked from calling the individual lifecycle commands, but not the creator (as you already mentioned with the validation required).
The next use case would be image extensions that extend the build-time base image.
This is a lot of good information to know about what's going on security-wise behind-the-scenes; if there was a way to run only the detector as the CNB user as you mentioned, I think the rest of the phases should work for us (as mentioned above). We're also exploring some options on our side that would allow us to spin up another container within the container that's hosting the builder image, and these sub-containers would be responsible for running each phase with different permission levels, similar to how you all are calling phases via the creator. I'll keep you posted on any updates from these findings since this may not require any changes to the creator but rather us just having to call each phase individually.
The hardest use case to support would be run image extension.
I don't think we need any additional changes from the run image extension -- correct me if I'm wrong, but I think we should be OK with just the basic base image swapping as denoted above from the second block. I'm not sure if the run image extension scenario you're outlining here is in reference to something more than the base image swapping.
This seemingly was the only phase (that we're calling, aka everything but builder) that had this type of check,
You are correct! The builder does have this check, but not the extender (the extender will drop privileges before executing anything related to buildpack ./bin/build execution).
these sub-containers would be responsible for running each phase with different permission levels
That seems to be the best option security-wise. Using the creator requires making some assumptions of trust, and using the extender even more so.
we should be OK with just the basic base image swapping as denoted above from the second block
That's great news! The "simply switching the runtime base image" case would be fairly easy to implement. If there's broad support for it, we could even target it for the next release (0.18.0).
A small follow-up to this - I recently encountered the dreaded "operation not permitted" error when trying to add extensions support to kpack, even though I was running the extender as root. It turns out I needed to add SETGID
and SETUID
capabilities to the build pod (see here) due to the USER root
instruction in my Dockerfile. Platforms with different Dockerfiles may have different requirements (and we're still digging to determine if these capabilities are truly necessary) but I thought I'd share this info in case it helps with your investigation.
@natalieparellano
We are looking into this and I have some questions, you might be able to answer...
1.)
The hardest use case to support would be run image extension. Simply because, the lifecycle doesn't run in the context of the run image for all the other phases.
We were surprised to read in the spec that the extent (run)
must be called with the run environment. Why exactly? All the work of the extension is already done and "only" the kaniko
call is missing, so the environment should not matter. Or what is the background for this requirement?
Doesn't the creator
have to call extend
from PLATFORM_API=0.13
forward (once extensions are not experimental anymore)? Otherwise the creator
would not be spec-compliant. Right?
"only" the kaniko call is missing, so the environment should not matter
We use a little-known kaniko option to avoid unpacking the initial filesystem (the FROM <image>
). This only works because we are running in the context of the image being extended, and multi-stage Dockerfiles are not permitted.
The reason is to avoid exposing registry credentials to Dockerfiles generated from image extensions (see for example https://github.com/GoogleContainerTools/kaniko/issues/2153).
Doesn't the creator have to call extend from PLATFORM_API=0.13 forward (once extensions are not experimental anymore)? Otherwise the creator would not be spec-compliant. Right?
I think the extender
is still considered optional. But, we should make it clear in the spec that the creator does not call the extender.
Edit: the spec currently says:
Running creator SHALL be equivalent to running detector, analyzer, restorer, builder and exporter in order with identical inputs where they are accepted, with the following exceptions.
Which does not mention the extender. Perhaps here is where we should explicitly point out that the extender is omitted.
@natalieparellanoit looks a bit like some platforms would need a re-iteration over what you stated in a comment above...
At one point, we tossed out the "crazy idea" of saving off the build image filesystem after the build phase, downloading & extracting the run image filesystem, running run image extension, saving off the extended run image filesystem, extracting the build image filesystem again, and continuing on to the export phase. It still sounds crazy TBH but if there's a real need for it we could explore it.
My weakest spot in understanding the full scope of constraints for run image extensions is probably which run image needs to be in the context of the run image extender? Or, more precisely, how dynamic this information is by nature. If it is potentially decided by one of the run image extensions - then platforms need to be able to dynamically schedule containers, right? For Kubernetes based platforms, we would be then talking about creating individual Pods for the different phases, correct?
For Kubernetes based platforms, we would be then talking about creating individual Pods for the different phases, correct?
That is probably going to need to happen. https://github.com/buildpacks-community/kpack/pull/1332 is our (somewhat outdated) attempt to get build image extension + run image switching to work in kpack.
Description
The
creator
doesn't appear to have support for calling theextender
; digging around a little, it appears this behavior is intentional and prevented in a couple of ways:creator
's directory store with an empty extender directorypack
if thecreator
should be used when running the lifecycle with extensionsIn our particular case, we're looking to call the
creator
from within a container based on our builder (which is bundled with extensions) to build and push an image to a registry.Proposed solution
Optimistically, the
creator
would include a flag (or flow) to check for extensions within the builder and would swap the call to thebuilder
for a call to theextender
. In the alternatives section below, I outlined our behavior for callingextender
directly but continuing to call thecreator
would be optimal for us.I'm not entirely sure if giving the
extender
from within thecreator
is feasible from a permissions point of view, but would love to hear additional thoughts on the subject and ways that this could be solved, if it was an issue that was looking to be solved in the future when extensions were introduced.Describe alternatives you've considered
Since the
extender
cannot be called from thecreator
(as outlined above), we've tried to call the individual phase commands of thecreator
command, which involves calling theextender
rather than thebuilder
to carry out our desired flow. When calling theextender
directly, it's able to get to the point where the build extension is used, but fails due to an "Operation not permitted" error when attempting to run the first line of our extension'sDockerfile
snippet, as seen below:My initial hunch is that this is due to the
extender
not being called from within an individual container with root access, as is done bypack
when it calls theextender
duringbuild
, but would love any additional insight. In our case, we're currently not able to spin up a container from within this container hosting the builder image where we're calling the individualcreator
commands, so this isn't seemingly a route that we'll be able to pursue further.Additional context
I'm happy to share any additional information or context that may be needed!