worker: Kubernetes Runtime Container SecurityContext

cognifloyd commented 2 years ago

Description

Add SecurityContext to containers in the kubernetes runtime as noted in this TODO:

https://github.com/go-vela/worker/blob/19081a02a335086f774e8a73f595e9efb873659d/runtime/kubernetes/container.go#L173

    // TODO: add SecurityContext options (runAsUser, runAsNonRoot, sysctls)

There are several settings that can be configured in SecurityContext, some of which can only be set on the whole Pod, others can only be set per-container, and others can be set in either context (container-level overriding pod-level).

| Setting | Type | Pod | Container | Configure `Worker` w/ default | `Step` override (if worker allows) | `Pipeline` override (if worker allows) | Apply if volumes need it |
|---|---|---|---|---|---|---|---|
| allowPrivilegeEscalation | boolean | | :white_check_mark: | :heavy_minus_sign: | :heavy_minus_sign: implicitly toggled by `privileged` | :heavy_minus_sign: | |
| capabilities | object | | :white_check_mark: | :heavy_plus_sign: :heavy_check_mark: via CRD | :heavy_plus_sign: | :heavy_minus_sign: | |
| - add, drop | string arrays | | | | | | |
| fsGroup | integer | :white_check_mark: | | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_minus_sign: | :floppy_disk: |
| fsGroupChangePolicy | string | :white_check_mark: | | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_minus_sign: | :floppy_disk: |
| privileged | boolean | | :white_check_mark: | :heavy_plus_sign: :heavy_check_mark: allow list of images via opt/env | :heavy_plus_sign: config exists | :heavy_minus_sign: | |
| procMount | string | | :white_check_mark: | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_minus_sign: | |
| readOnlyRootFilesystem | boolean | | :white_check_mark: | :heavy_minus_sign: | :heavy_minus_sign: | :heavy_minus_sign: | :floppy_disk: |
| runAsGroup | integer | :white_check_mark: | :white_check_mark: | :heavy_minus_sign: | :heavy_plus_sign: | :grey_question: | |
| runAsNonRoot | boolean | :white_check_mark: | :white_check_mark: | :heavy_plus_sign: :heavy_check_mark: pod-level only via CRD | :heavy_plus_sign: | :grey_question: | |
| runAsUser | integer | :white_check_mark: | :white_check_mark: | :heavy_minus_sign: | :heavy_plus_sign: config exists (missing in k8s) | :heavy_minus_sign: | |
| seLinuxOptions | object | :white_check_mark: | :white_check_mark: | :grey_question: | :grey_question: | :grey_question: | |
| - level, role, type, user | strings | | | | | | |
| seccompProfile | object | :white_check_mark: | :white_check_mark: | :grey_question: | :grey_question: | :grey_question: | |
| - localhostProfile, type | strings | | | | | | |
| supplementalGroups | integer array | :white_check_mark: | | :heavy_minus_sign: | :heavy_minus_sign: | :grey_question: | :floppy_disk: |
| sysctls | object array | :white_check_mark: | | :heavy_plus_sign: :heavy_check_mark: via CRD | :heavy_minus_sign: `ulimits` exists (but k8s can't do per-step sysctls) | :heavy_plus_sign: | |
| - name, value | strings | | | | | | |
| windowsOptions | object | :white_check_mark: | :white_check_mark: | N/A | N/A | N/A | |
| - gmsa*, hostProcess, runAsUserName | mixed | | | | | | |
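
To make the pod-level vs. container-level split concrete, here is a minimal sketch in Python (chosen for brevity; the worker itself is written in Go). It only models the rule described above: some settings are pod-only, some are container-only, and for the rest a container-level value overrides the pod-level one. The helper name `effective_security_context` and the three set names are hypothetical and are not part of the worker or of Kubernetes.

```python
# Placement of each setting, copied from the Pod/Container columns above.
POD_ONLY = {"fsGroup", "fsGroupChangePolicy", "supplementalGroups", "sysctls"}
CONTAINER_ONLY = {
    "allowPrivilegeEscalation", "capabilities", "privileged",
    "procMount", "readOnlyRootFilesystem",
}
EITHER = {
    "runAsGroup", "runAsNonRoot", "runAsUser",
    "seLinuxOptions", "seccompProfile", "windowsOptions",
}


def effective_security_context(pod_ctx: dict, container_ctx: dict) -> dict:
    """Hypothetical helper: the settings one container ends up running with."""
    for name in pod_ctx:
        if name not in POD_ONLY | EITHER:
            raise ValueError(f"{name} cannot be set at the pod level")
    for name in container_ctx:
        if name not in CONTAINER_ONLY | EITHER:
            raise ValueError(f"{name} cannot be set at the container level")
    # Start from the pod-level values, then let container-level values win.
    effective = dict(pod_ctx)
    effective.update(container_ctx)
    return effective
```

For example, a pod-level `runAsUser` of 1000 combined with a container-level `runAsUser` of 1234 yields 1234 for that container, while a pod-level `fsGroup` still applies to it unchanged.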

Value

Allow using Vela in clusters where an admission controller blocks the creation of pods unless SecurityContext requirements are met. Make pipelines follow the principle of least privilege: as with "privileged", only increase access if requested (and permitted).

Definition of Done

Each k8s worker can add global SecurityContext to all pipeline containers.
Different workers can have different settings (configurable somehow by the vela admin).
Maybe surface minimal override configuration for this in the pipeline yaml or in the repo settings.

Effort (Optional)

Adding the SecurityContext should be straightforward, but how to configure it is not clear.

Impacted Personas (Optional)

Anyone who uses the kubernetes runtime and wants to apply SecurityContext settings.

cognifloyd commented 2 years ago

So, how do we add more complex config to the worker like this? Shove JSON into env vars?

JordanSussman commented 2 years ago

> So, how do we add more complex config to the worker like this? Shove JSON into env vars?

Do you envision that you would want to configure this at the global (server) level and/or at the individual pipeline (.vela.yml) level? I suppose we could add more configuration options underneath the worker key if you want it to be somewhat configurable at the pipeline level.

cognifloyd commented 2 years ago

> Do you envision that you would want to configure this at the global (server) level and/or at the individual pipeline (.vela.yml) level?

Hmm. I imagine a hybrid model.

Each worker has a default SecurityContext. Pipelines choose the worker (and consequently the base SecurityContext) using the worker block you linked to.
The worker selection could involve a new security (?) tag, or the admin could just reuse flavor.
Each step in the pipeline has the option to opt-in to additional security features if the worker is configured to allow that. Similar to privileged.

That is pretty straightforward (I hope), until you consider pod SecurityContext settings that can't be configured per step. If any of that needs to be configured per pipeline, then we would need some new top-level security tag.

I'll think a bit more and then I'll extend my chart to show where things could be configurable (pipeline, pipeline step, worker default, worker allows changing).

cognifloyd commented 2 years ago

Worker Runtime Config

OK. So the worker doesn't need as many SecurityContext defaults as I thought.

Currently, the worker has one option:

runtime.privileged-images configuration to enable pipeline steps/services to use the privileged tag

Here, I'm proposing we add some kind of config for per-worker defaults for the following settings (the type listed after each name could easily be defined in new ENV vars or CLI args):

:heavy_check_mark: capabilities.add string list
:heavy_check_mark: capabilities.drop string list
:heavy_check_mark: runAsNonRoot boolean
:heavy_check_mark: sysctls list of key:value pairs (?)

Edit: added check marks to show that these were made configurable. Instead of env/opt, they are configurable via a CRD.

I don't have a clear idea of when/how people would want to use seLinuxOptions or seccompProfile, so I'm ignoring those for now.

cognifloyd commented 2 years ago

Add tags to `Step`/`Service`

The Step or Service already define these SecurityContext-related tags:

privileged (which k8s also uses to manage allowPrivilegeEscalation, so we can ignore that)
user which would define runAsUser but has not been implemented in the kubernetes runtime yet. This has probably not been implemented because user is a string, but runAsUser is an integer, so we'd need a way to look up the user id to use based on the user tag.
ulimits which could be implemented using sysctls in SecurityContext, but that would apply to all containers/steps/services in the pipeline, not just the step the ulimits are defined on. So, the k8s runtime cannot support the ulimits tag.
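
The string-vs-integer mismatch for `user` can be narrowed down a little. The sketch below (Python for brevity; the worker is Go) assumes the tag follows Docker's `user[:group]` convention, which is an assumption on my part. Numeric parts map directly to `runAsUser` / `runAsGroup`; a name such as `node` cannot be resolved without reading the image's `/etc/passwd`, so it comes back as `None`. The function name `parse_user_tag` is hypothetical.

```python
from typing import Optional, Tuple


def parse_user_tag(user: str) -> Tuple[Optional[int], Optional[int]]:
    """Split 'uid' or 'uid:gid' into (runAsUser, runAsGroup).

    Any part that is missing or not a plain decimal number yields None,
    because resolving a name needs the image's own user database.
    """
    uid_part, _, gid_part = user.partition(":")

    def to_id(part: str) -> Optional[int]:
        # isascii() guards against non-ASCII digits that int() would reject.
        return int(part) if part.isascii() and part.isdigit() else None

    return to_id(uid_part), to_id(gid_part)
```

With this, `user: "1000:2000"` would give `runAsUser` 1000 and `runAsGroup` 2000, while `user: node` would leave both unset and need some other lookup.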

Here are the things that I'd like to see configurable (eventually) per pipeline step/service:

capabilities map would be merged with the worker's defaults iff the worker allows them to be modified.
runAsNonRoot boolean overrides worker's default iff the worker allows that.
runAsGroup integer the group id (not sure how much value this would have in pipelines)
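
One way the "merge iff the worker allows it" rule for capabilities could look is sketched below (Python for brevity; the worker is Go). Everything here is an assumption for illustration: the function name `merge_capabilities`, the `allow_override` flag, and the conflict rule that a step's request wins over a worker default.

```python
def merge_capabilities(worker_default: dict, step_request: dict,
                       allow_override: bool) -> dict:
    """Merge a step's {"add": [...], "drop": [...]} into the worker default."""
    default_add = set(worker_default.get("add", []))
    default_drop = set(worker_default.get("drop", []))
    step_add = set(step_request.get("add", []))
    step_drop = set(step_request.get("drop", []))

    if step_add & step_drop:
        raise ValueError("a step cannot both add and drop the same capability")
    if (step_add or step_drop) and not allow_override:
        raise PermissionError("this worker does not allow capability overrides")

    # The step's request wins where it conflicts with a worker default.
    add = (default_add - step_drop) | step_add
    drop = (default_drop - step_add) | step_drop
    return {"add": sorted(add), "drop": sorted(drop)}
```

A worker default of `drop: [ALL]` plus a step asking for `add: [NET_BIND_SERVICE]` would then produce a container that drops everything and adds back only that one capability.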

I don't have a clear idea of when/how people would want to use seLinuxOptions or seccompProfile, so I'm ignoring those for now.

cognifloyd commented 2 years ago

Add new pipeline-wide config block

With the docker runner, the lifecycle and settings of each step/service are basically isolated, so there is no requirement to configure something for all steps and services.

With the kubernetes runner, however, the pipeline maps to the pod, and some settings can only be set at the pod level. So we might need to expose some top-level pipeline config to configure those pod-level settings.

The biggest candidate for this is:

sysctls which is mentioned in the code comment TODO, but cannot be defined per container/step/service.

Once there is such a pipeline-wide config mechanism, these are also candidates for that:

runAsGroup
runAsNonRoot
supplementalGroups

Plus, such a pipeline-wide config block could be used to simplify repetitive definitions of other settings like user, pull, and capabilities across all steps/services.

I don't have a clear idea of when/how people would want to use seLinuxOptions or seccompProfile, so I'm ignoring those for now.

cognifloyd commented 2 years ago

SecurityContext and Volumes

Currently, only host volumes are supported. In the future, we might expose additional volume types. When that happens, several SecurityContext settings might need to be added implicitly to support that volumes feature. Any new config in the pipeline would probably be tied to those volumes.

fsGroup
fsGroupChangePolicy
readOnlyRootFilesystem
supplementalGroups

Also, the procMount setting seems very esoteric (not useful) to me, so I skipped that, but I suppose it could also be part of the volumes config if someone needed it.

cognifloyd commented 2 years ago

OK. I've added a chart in the issue description. Then I added a comment to summarize / describe each of the four rightmost columns.

I need capabilities. I want to configure most of my workers with capabilities.drop=ALL and then add the required capabilities to each step/service.

cognifloyd commented 2 years ago

> I suppose we could add more configuration options underneath the worker key if you want it to be somewhat configurable at the pipeline level.

@JordanSussman Oh. You weren't suggesting adding additional routing. You were suggesting that pipeline-level config could be defined under the worker key.

So, something like this (example sysctls from k8s docs)?

    worker:
      flavor: foobar
      platform: k8s
      runAsGroup: 1234
      runAsNonRoot: true
      supplementalGroups:
        - 5678
        - 9012
      sysctls:
        kernel.shm_rmid_forced: "0"
        net.core.somaxconn: "1024"
        kernel.msgmax: "65536"
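
A `worker` block like this would still need translating into the pod-level SecurityContext, where `sysctls` is a list of `name`/`value` objects rather than a map. A rough sketch of that translation follows (Python for brevity; the worker is Go). The function name `pod_security_context` and the choice to ignore the routing keys `flavor` and `platform` are assumptions for illustration.

```python
# Keys from the worker block that would be copied to the pod as-is.
POD_LEVEL_KEYS = ("runAsGroup", "runAsNonRoot", "supplementalGroups")


def pod_security_context(worker_block: dict) -> dict:
    """Hypothetical helper: build pod-level SecurityContext fields."""
    ctx = {key: worker_block[key] for key in POD_LEVEL_KEYS if key in worker_block}
    sysctls = worker_block.get("sysctls", {})
    if sysctls:
        # Kubernetes expects [{"name": ..., "value": ...}], with string values.
        ctx["sysctls"] = [
            {"name": name, "value": str(value)} for name, value in sysctls.items()
        ]
    return ctx
```

Routing keys such as `flavor` and `platform` are left out of the result, since they select a worker rather than configure the pod.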

kneal commented 2 years ago

I think I like keeping the controls on the admin side. I do think it could expand into maybe some new routing keys in the worker: block but I'm not sure exposing that at the user level would work super well with the current setup.

cognifloyd commented 2 years ago

Worker Runtime Config

Instead of relying on an ever-increasing list of env vars / options, I added a PipelinePodsTemplate CRD that allows admins to specify defaults to use in the Pods created by that worker.

Now that go-vela/worker#294 is merged, the admin can configure these bits:

container.SecurityContext.capabilities (both add and drop)
pod SecurityContext.RunAsNonRoot (note that this is a validation flag: it forces the pod to fail unless every container image is configured to run as a user other than root. Also note that only the pod-level flag is available. We can add the container-level flag once we have a way to configure/override it from the vela yaml pipeline)
pod SecurityContext.Sysctls (here be dragons!)
- sysctls requires more discussion and safety checking:
- it might need some new tag(s) in the pipeline config and
- this will need some kind of sanity-checking validation, perhaps only allowing a limited set of them. The kubernetes docs warn that only a subset of sysctls are namespaced well enough to set safely per pod; the rest should be handled at the node level.
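
The "only allow a limited set" idea could start as a plain allow-list check, sketched below (Python for brevity; the worker is Go). The three names in `SAFE_SYSCTLS` are ones the Kubernetes docs have listed as safe, but treat the exact set as an assumption and check the docs for the cluster version in use. The function name `rejected_sysctls` is hypothetical.

```python
# Assumed allow list; verify against the Kubernetes docs for your version.
SAFE_SYSCTLS = frozenset({
    "kernel.shm_rmid_forced",
    "net.ipv4.ip_local_port_range",
    "net.ipv4.tcp_syncookies",
})


def rejected_sysctls(requested: dict, allowed=SAFE_SYSCTLS) -> list:
    """Return the requested sysctl names that are not on the allow list."""
    return sorted(name for name in requested if name not in allowed)
```

An admin could pass a wider `allowed` set for workers whose nodes are configured to permit additional sysctls; an empty result means the request is acceptable.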

I've added check marks in the chart to mark which things are implemented.

kneal commented 2 years ago

> Instead of relying on an ever-increasing list of env vars

It's probably worth noting that one of the reasons we use urfave/cli so heavily for injecting config is that it offers a lot of options for setting the configuration. You don't necessarily have to use env. You could have something like an agent.yml or server.yml within the deployed service container to set the config.

It's not really documented anywhere on our side, but it's a feature of the library. It's how/why the Vela CLI has a config file option.

cognifloyd commented 2 years ago

> You could have something like an agent.yml or server.yml within the deployed service container to set the config.

I thought that required a separate file for each config option. Is there a way to put all of those config options in a single file with urfave/cli?

cognifloyd commented 2 years ago

Another thought: When we expand the pipeline YAML to make more of these options configurable, the admin will need a way to configure the worker to say which of those options are allowed on a per-worker basis. We can easily expand the CRD to cover "allowed pipeline overrides".

kneal commented 2 years ago

We might need to add a new flag for a single entrypoint but you can do a single file. Here's the CLI one: https://github.com/go-vela/cli/blob/master/cmd/vela-cli/main.go#L72-L80

All of the ENV configs in the CLI can be used within that file with the name parameter in the flag. So, we could likely just add a new flag like the first one I linked and create a much simpler admin experience.

cognifloyd commented 2 years ago

> We might need to add a new flag for a single entrypoint but you can do a single file. Here's the CLI one: https://github.com/go-vela/cli/blob/master/cmd/vela-cli/main.go#L72-L80
>
> All of the ENV configs in the CLI can be used within that file with the name parameter in the flag. So, we could likely just add a new flag like the first one I linked and create a much simpler admin experience.

Oh. Cool! We'll probably want to clean up the Name: of all the options before we expose that so that they're a bit more consistent (. vs _ vs - vs camelCase).

kneal commented 2 years ago

Yeah, that would be a big thing because the hyphen or underscore will keep the key on the same level in the YAML file. The dot syntax turns that section into an object with keys underneath. An example of that can also be seen in the CLI.
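
To illustrate the naming point, here is a small sketch (Python for brevity; this is not urfave/cli code) of how flag names would lay out in a config file under the behavior described above: a dot starts a nested object, while hyphens and underscores stay part of a single key. `runtime.privileged-images` is the existing option mentioned earlier in this thread; `log-level` and the values are made-up examples, and the function name `nest_by_dots` is hypothetical.

```python
def nest_by_dots(flat: dict) -> dict:
    """Turn {"a.b": 1, "c-d": 2} into {"a": {"b": 1}, "c-d": 2}.

    Assumes no name is both a value and a prefix of another name.
    """
    nested: dict = {}
    for name, value in flat.items():
        *parents, leaf = name.split(".")
        node = nested
        for part in parents:
            # Each dotted segment becomes one more level of nesting.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested
```

So a mix of `.`, `-`, and `_` across option names would produce a config file whose nesting depth varies from key to key, which is the inconsistency worth cleaning up first.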
