cloudfoundry / eirini

Pluggable container orchestration for Cloud Foundry, and a Kubernetes backend
Apache License 2.0
115 stars 30 forks source link

Inspect image user #121

Closed Birdrock closed 3 years ago

Birdrock commented 3 years ago

Preliminary PR as per discussion with Giuseppe Capizzi - https://cloudfoundry.slack.com/archives/C8RU3BZ26/p1605550105243100?thread_ts=1605540085.240900&cid=C8RU3BZ26

The functionality here is to extract the user from the image and pass it the CC on docker_staging. On the desire side, it is included with the payload and used for the PodSecurityContext as the RunAsUser.

Due to RunAsNonRoot, the RunAsUser requires a numeric UID. In this implementation, there is a naive check to see if the user is parseable, then assigns a default UID if is not. I'm unsure if this functionality belongs in StatefulSet generation or whether CC should do this conversion. One one hand, the numeric UID requirement is a result of Kubernetes, so it isn't really a CC concern. On the other hand, CC is the hub of operations, so it should maybe understand the requirements of desires to Eirini.

cf-gitbot commented 3 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/175827405

The labels on this github issue will be updated when the story is started.

danail-branekov commented 3 years ago

Hey @Birdrock,

Thanks for the PR! From my point of view, the suggested approach looks good.

As per your doubt - where the resolution of a non-numeric user should take place - I believe that Eirini is the right place to do that. In Diego CF, CC/Diego sends a request to Garden to run a process as user, where the user can be either user id (a number) or user name. If it is a user name, then Garden resolves the name to user ID and uses that user ID to execute the process. This flow is transparent to CC and Diego and I believe that this should remain like that in the K8S world as well.

As you pointed out, defaulting the user ID to 1000 if the value in the request is not a parseable number is not really robust. In the CF Diego world, Garden would spin up a dummy pea container for the requested docker image and look up the user name in /etc/passwd and /etc/group files on the container root file system. I wonder whether we can implement something similar in K8S - maybe spin a dummy pod and peek on its filesystem... what do you think @cloudfoundry-incubator/eirini?

The PR itself has some rough edges like unhandled errors, misleading test names, etc. but as it is a preliminary PR, commenting on those does not really make sense.

Birdrock commented 3 years ago

@danail-branekov Glad I'm on the right basic approach. Thank you for the comments.

Spinning up a dummy container to inspect /etc/passwd and /etc/group sounds interesting. It is a bit out of my depth currently, but I can give a try at it, if y'all can point me in the right direction. Alternatively, I can clean up the branch a bit, and you can implement the dummy container checking.

Agreed on the rough edges of the PR. I wanted to get something in front of you sooner than later. Again, thank you for commenting on this - I'm looking forward to collaborating further on implementing this.

danail-branekov commented 3 years ago

Hey @Birdrock,

Running a dummy container on K8S has certain things to consider, e.g.

We have created a spike story in our backlog to look for feasible implementation, hopefully it would give us some ideas how to proceed.

Birdrock commented 3 years ago

Hey @danail-branekov,

That all makes sense. I was thinking that an init container might be a place to look, but there is still some wiring that would need to be done as a callback to return the mapped user ID.

As far as needing to run as root - that was one of my concerns as well. My understanding was that Kubernetes specifically requires a numeric user ID because it doesn't want to expose itself to this scenario.

Thank you for the story link - I'm excited to see what y'all find. Running arbitrary Docker images in cf-for-k8s has been something of a sore spot recently due to RunAsUser being removed from the SecurityContext. As great as the cf push source to running app is, there seems to be a significant use case for pushing pre-built Docker images, especially in the Kubernetes ecosystem.

Birdrock commented 3 years ago

@gcapizzi @danail-branekov

I see that the spike has been completed. I'll look at the findings, but it appears that the cost/benefit ratio is high on inspecting the image file system. Where does that leave the inspection of the image in staging to be used int the SecurityContext?

I apologize for the radio silence on my side. Between holidays and moving apartments, I've been out of the loop for a bit.

gcapizzi commented 3 years ago

Hey @Birdrock! Here are my notes on the spike story we ran to verify the feasibility of this:

  1. Any implementation we could come up with would be more complex than anything we have in Eirini right now.
  2. Any users who own the image they are pushing can change the image to use a numeric user id.
  3. Users who do not own the image will still have issues with the large majority of Docker images that do not specify any USER and thus fallback to 0 (root), which will be denied by runAsNonRoot.
  4. Users who do not own the image can still create a new image based on the main one but overriding the USER to be numeric.
  5. runAsNonRoot is a workaround for the lack of user namespaces in Kubernetes. Diego/Garden have user namespaces, which means they can safely run images with user 0/root (as container root is mapped to an unprivileged user on the host) instead of having to reject them. runAsNonRoot already prevents us from reaching feature parity with Diego/Garden for most Docker images (which run as root/0). Supporting images with non-numeric users would only solve the problem for (we believe) a small number of use cases, at a very high cost.

I'd stress 5 in particular: we'll never be able to behave like Diego/Garden because they have user namespaces and we don't :(

Birdrock commented 3 years ago

@gcapizzi Thanks for the update.

I have a couple more thoughts on avenues to investigate.

  1. Provide runAsUser if the desire specifies a UID. There may be cases where we want to specify a specific user. Any configurable logic or defaulting could occur in the docker staging side to keep the StatefulSet logic clean.
  2. We don't necessarily need to spin up a container to map user directives to /etc/passwd & /etc/groups. It may be possible to pull the image tgz, explode it, the explode the layer tgz archives specified in the manifest. This may avoid security concerns. Additionally, if this logic is located in the docker staging side, it would be largely decoupled from the rest of the Eirini code.

I'll do some exploration and let you know if something looks promising.