flyteorg / flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://flyte.org
Apache License 2.0
5.83k stars 659 forks source link

More flexible configuration of SecurityContext for Pods/Containers started by flyte #5462

Open vlibov opened 5 months ago

vlibov commented 5 months ago

Motivation: Why do you think this is important?

To run workloads in a security-hardenend kubernetes cluster, the pods and containers must fullfill certain requirements, in particular regarding the SecurityContext fields (pod level, container level).

It is possible to set those in flyte e.g. by utilizing the PodTemplate native k8s resource. However it seems that these settings won't be applied to the initContainers (flyte-copilot-downloader). Furthermore, there are some settings which are applied to the main container (addéd capability SYS_PTRACE) without a possibility to disable it.

This prevents running flyte workflows in a hardened namespace, which is somewhat unfortunate.

It would be nice to be able to configure those, i.e. apply the PodTemplate also for the initContainers and make the SYS_PTRACE capability optional.

Goal: What should the final outcome look like, ideally?

Ultimately we need to

  1. Be able to disable the SYS_PTRACE capability, i.e. make it optional in the config
  2. Apply PodTemplates also to the initContainers.

Describe alternatives you've considered

One alternative is to run flyte-workloads in a separate namespace which has more elevated rights (i.e. less secure) and restrict the users from creating workloads in the namespace direclty. However this still poses a security risk, because the user still can create workloads indirectly (e.g. create a malicious python code or image and start it as a flyte worklfow).

Propose: Link/Inline OR Additional context

No response

Are you sure this issue hasn't been raised already?

Have you read the Code of Conduct?

welcome[bot] commented 5 months ago

Thank you for opening your first issue here! 🛠

runllm[bot] commented 5 months ago

To address the issue of configuring SecurityContext for Pods/Containers started by Flyte, you can leverage the PodTemplate feature and make specific adjustments to the Flyte configuration. Here are the steps:

  1. Disable SYS_PTRACE Capability: According to this discussion, the SYS_PTRACE capability is required for Flyte Sidecar but can be configured. You can modify the security context to drop this capability.

  2. Apply PodTemplates to InitContainers: As per this GitHub issue, you can configure PodTemplates to apply to all containers, including initContainers. This ensures that the security context settings are uniformly applied.

Here is an example of how you can configure the security context in your PodTemplate:

apiVersion: v1
kind: PodTemplate
metadata:
  name: secure-flyte-pod
template:
  spec:
    securityContext:
      runAsNonRoot: true
      fsGroup: 65534
      runAsUser: 1001
      fsGroupChangePolicy: "Always"
      seLinuxOptions:
        type: spc_t
    containers:
      - name: main-container
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
    initContainers:
      - name: flyte-copilot-downloader
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]

Would you like more details on how to implement these configurations?

Sources:


This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not.

Mention @runllm to ask follow-up questions. See runllm.com for more!

vlibov commented 5 months ago

Including a configuration for initContainer in the podtemplate as suggested above leads to a duplication error:

Workflow[kodi:development:workflows.kodi-training.kodi_training_wf] failed. RuntimeExecutionError: max number of system retry attempts [11/10] exhausted. Last known status message: failed at Node[n0]. RuntimeExecutionError: failed during plugin execution, caused by: failed to execute handle for plugin [container]: [Invalid] failed to create resource, caused by: Pod "aqmll4kts4gghgvgqljv-n0-0-n0-n1-0-n0-0" is invalid: spec.initContainers[1].name: Duplicate value: "flyte-copilot-downloader"

Thus, flyte does not check for already existing initContainer definitions in the template and tries to a new one, hence the name conflict.

Setting "default" as the name of initContainer (as for the main containers) will not work, as the names have to be unique in the template.

vlibov commented 5 months ago

https://github.com/vlibov/flyte/tree/podsecurity

davidmirror-ops commented 5 months ago

@vlibov do you plan to contribute the proposed changes?

vlibov commented 5 months ago

Hi @davidmirror-ops, yes!

vlibov commented 5 months ago

The SYS_PTRACE is added in flyteplugins\go\tasks\pluginmachinery\flytek8s\copilot.go Function AddCoPilotToContainer(..) c.SecurityContext.Capabilities.Add = append(c.SecurityContext.Capabilities.Add, pTraceCapability)

If we want to make this configurable - what is the best way of propagating the corresponding setting?

davidmirror-ops commented 5 months ago

@EngHabu do you think, instead of even making it configurable, we could drop the SYS_PTRACE capability from the copilot container?

vlibov commented 5 months ago

Any news on this @davidmirror-ops @EngHabu ? In the meantime I did the necessary changes here in my forked repo: https://github.com/vlibov/flyte/tree/podsecurity (removing SYS_PTRACE altogether - i.e. hardcoding the condition to false and including the SecuritySettings also for initContainers). What is the process from here (if the changes look right) - simply creating a pull request for the official repo?

davidmirror-ops commented 5 months ago

@vlibov feel free to submit the PR, we can then review and discuss. Thanks!!

vlibov commented 4 months ago

@davidmirror-ops Done, see https://github.com/flyteorg/flyte/pull/5556 Looking forward to any comments!