airflow-helm / charts

The User-Community Airflow Helm Chart is the standard way to deploy Apache Airflow on Kubernetes with Helm. Originally created in 2017, it has since helped thousands of companies create production-ready deployments of Airflow on Kubernetes.
https://github.com/airflow-helm/charts/tree/main/charts/airflow
Apache License 2.0

add `airflow.defaultContainerSecurityContext` value #578

Closed heba920908 closed 4 months ago

heba920908 commented 2 years ago

Checks

Motivation

After enabling the Gatekeeper admission controller with the pod security policy "Should not allow container privilege escalation" (deny), Gatekeeper will not allow the creation of new pods that do not set spec.containers[*].securityContext.allowPrivilegeEscalation=false.

Azure policy Gatekeeper Library

From the k8s security context docs, which motivate the policy:

_allowPrivilegeEscalation: Controls whether a process can gain more privileges than its parent process. This bool directly controls whether the no_new_privs flag gets set on the container process. allowPrivilegeEscalation is always true when the container is run as privileged or has CAP_SYS_ADMIN._
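
For reference, a minimal sketch of the container-level securityContext that satisfies such a constraint (the pod and image names below are placeholders, not taken from the chart):

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    - name: example
      image: busybox
      securityContext:
        allowPrivilegeEscalation: false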

The relevant part of the policy's current rego implementation:

## allowPrivilegeEscalation=true
## runAsUser != 0 + missing allowPrivilegeEscalation=false

input_allow_privilege_escalation(c) {
    not c.securityContext.allowPrivilegeEscalation == false
    not c.securityContext.runAsUser
    input.review.object.spec.securityContext.runAsUser != 0
}

With that in place, if we try to install the community Airflow Helm release as-is on a cluster with the Gatekeeper admission controller and PSP policies enabled, the following error appears (for all Airflow pods):

0s          Warning   FailedCreate                 replicaset/airflow2-scheduler-66ffcdc9df                 Error creating: admission webhook "validation.gatekeeper.sh" denied the request: [azurepolicy-psp-container-no-privilege-esc-e6a74aee95507167737f] Privilege escalation container is not allowed: airflow-scheduler...

The current implementation does not expose an optional containerSecurityContext spec; the container securityContext is hardcoded in the helper template pods.tpl.

Unfortunately, the containerSecurityContext cannot be modified in the community Helm chart. We even tried with Values.DefaultSecurityContext, but that does not give a good outcome either: if we add allowPrivilegeEscalation: false to the pod securityContext, we see the following error while deploying the Helm chart:

# * <nil>: Invalid value: "The edited file failed validation": ValidationError(Deployment.spec.template.spec.securityContext): unknown field "allowPrivilegeEscalation" in io.k8s.api.core.v1.PodSecurityContext
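
The reason is that allowPrivilegeEscalation only exists on the container-level SecurityContext, not on the pod-level PodSecurityContext, so it has to be set per container. A rough sketch of the field placement (values are only illustrative):

spec:
  securityContext:              # PodSecurityContext: fsGroup, runAsUser, runAsGroup, ...
    fsGroup: 0
  containers:
    - name: airflow-scheduler
      securityContext:          # container SecurityContext: allowPrivilegeEscalation belongs here
        allowPrivilegeEscalation: false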

It would be nice to have an optional value for containerSecurityContext.

Implementation

Enable a containerSecurityContext.allowPrivilegeEscalation option in the values.

Modify pods.tpl:

{{- define "airflow.image" }}
image: {{ .Values.airflow.image.repository }}:{{ .Values.airflow.image.tag }}
imagePullPolicy: {{ .Values.airflow.image.pullPolicy }}
securityContext:
  runAsUser: {{ .Values.airflow.image.uid }}
  runAsGroup: {{ .Values.airflow.image.gid }}
{{- range $key, $value := .Values.airflow.containerSecurityContext }}
  {{- if and (not (eq $key "runAsUser")) (not (eq $key "runAsGroup")) }}
  {{ $key }}: {{ $value }} 
  {{- end }}
{{- end }}
{{- end }}

Current Code: airflow-8.5.2 - pods.tpl

In values.yaml, we need to add the following to comply with the policy:

airflow:
  airflow:
    containerSecurityContext:
      allowPrivilegeEscalation: false
      foo: bar
      # runAsUser and runAsGroup are not set here, since they are taken from .Values.airflow.image.uid and .Values.airflow.image.gid

runAsUser and runAsGroup will be ignored, since they are configured from .Values.airflow.image.uid and .Values.airflow.image.gid.
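
For illustration, assuming the common image defaults of uid 50000 and gid 0 (assumed values, check your own chart values), the modified template above would render the container securityContext roughly as follows; runAsUser/runAsGroup keys from the values map are skipped by the if condition, while every other key (including the dummy foo: bar) is copied through:

securityContext:
  runAsUser: 50000
  runAsGroup: 0
  allowPrivilegeEscalation: false
  foo: bar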

Are you willing & able to help?

thesuperzapper commented 2 years ago

@heba920908 thanks for raising this issue!

I think we can resolve this by creating a value called airflow.defaultContainerSecurityContext, for example:

airflow:

  ## default securityContext for Containers in airflow Pods
  ## - `runAsUser` is ignored, please set with per-image `*.image.uid`
  ## - `runAsGroup` is ignored, please set with per-image `*.image.gid`
  ## - spec for SecurityContext:
  ##   https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#securitycontext-v1-core
  ##
  defaultContainerSecurityContext:
    allowPrivilegeEscalation: false

Because not all images run with the same UID/GID, we can simply ignore runAsUser and runAsGroup if they are passed, using the omit Helm template function, for example:

securityContext:
  runAsUser: {{ .Values.airflow.image.uid }}
  runAsGroup: {{ .Values.airflow.image.gid }}
  {{- omit .Values.airflow.defaultContainerSecurityContext "runAsUser" "runAsGroup" | toYaml | nindent 2 }}
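
To illustrate the omit behaviour: even if a user also puts runAsUser or runAsGroup inside defaultContainerSecurityContext, those keys are stripped before rendering, and the uid/gid still come from the per-image values (uid 50000 / gid 0 below are assumed defaults):

# hypothetical user values
airflow:
  defaultContainerSecurityContext:
    allowPrivilegeEscalation: false
    runAsUser: 1234    # dropped by omit

# rendered result
securityContext:
  runAsUser: 50000
  runAsGroup: 0
  allowPrivilegeEscalation: false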

We will need to apply airflow.defaultContainerSecurityContext to more than just the airflow image; here is the full list:

thesuperzapper commented 2 years ago

@heba920908 do you want to contribute this PR?

shalberd commented 1 year ago

@thesuperzapper @zartstrom This is Sven, working with Red Hat Open Data Hub and evaluating Airflow as an alternative to Tekton Pipelines on Openshift.

Yes, the airflow-scheduler pod, and especially the git-sync container, having a fixed gid of 65533 or so is extremely bad practice, and we simply were not able to override it. We were not able to get the latest Helm charts running on Openshift without this bad-practice workaround: oc adm policy add-scc-to-user privileged -z airflow-scheduler -n airflow

Is there a particular reason why this has never been fixed? We set the scheduler run-as-user (scheduler.securityContext.runAsUser) to a user id allowed in the Openshift namespace, which is not 65533, but it has no effect on the gid field. Why was the related PR closed in September 2022 and never merged?

thesuperzapper commented 1 year ago

@shalberd I am a little confused, can you clarify a few things:

  1. What do you mean you "were not able to override it" for the git-sync gid of 65533? The value dags.gitSync.image.gid should let you change it (see the values sketch after this list).
  2. Can you explain what you mean by this being "bad practice"?
  3. Also, did you see that we have the global default airflow.defaultSecurityContext which sets fsGroup = 0 for compatibility with the default airflow image entry point (but can be changed to anything, if needed)?
  4. I think I missed https://github.com/airflow-helm/charts/pull/624 which was closed automatically due to inactivity, I have reopened it, but if you want to make a separate PR that meets your needs feel free to.
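
A minimal values sketch of the override described in item 1 above (the gid 1000 is just a placeholder):

dags:
  gitSync:
    image:
      gid: 1000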

PS: I am only now getting back into the swing of this chart (I was busy moving countries over the last few months!). I appreciate any help, or even just explanations of the problems you face, especially on Openshift, as I don't have access to an Openshift cluster.

shalberd commented 1 year ago

@thesuperzapper ok, I tried setting dags.gitSync.image.gid=xxxx, but I am getting an error:

Error: values don't meet the specifications of the schema(s) in the following chart(s):
airflow:
- dags.gitSync: Additional property image is not allowed

Did that change so the gid is set directly, or via securityContext? The relevant values.yaml part:

    containerName: git-sync
    uid: 65533

    # When not set, the values defined in the global securityContext will be used
    securityContext: {}
    #  runAsUser: 65533

Chart: 1.9.0. I will post a solution here; it should be doable. I am working with Openshift 4.10.