argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

Resource template doesn't work if readOnlyRootFileSystem is enforced #10787

Open sujaykulkarn opened 1 year ago

sujaykulkarn commented 1 year ago

Pre-requisites

What happened/what you expected to happen?

Hi,

Argo Workflows chart configuration: we are enforcing the readOnlyRootFileSystem flag in the security context for the controller, the main container, and the wait container (executor). Other sample workflows that use container and script templates all worked fine. Only the resource template fails, with the logs below.

time="2023-03-29T10:41:13.128Z" level=error msg="executor error: open /tmp/manifest.yaml: read-only file system"
 time="2023-03-29T10:41:13.128Z" level=fatal msg="open /tmp/manifest.yaml: read-only file system"

Workflow: https://github.com/argoproj/argo-workflows/blob/master/examples/k8s-owner-reference.yaml
Reference: https://blog.argoproj.io/practical-argo-workflows-hardening-dd8429acc1ce

Is there any workaround to make the resource template work when allowPrivilegeEscalation is set to false? Thanks.
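
For reference, this is roughly the container securityContext we enforce, following the hardening blog (a minimal sketch; the exact chart value keys may differ in other setups):

# Applied to the controller, main, and wait (executor) containers.
securityContext:
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  runAsNonRoot: true
  capabilities:
    drop:
      - ALL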

Version

latest

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: k8s-owner-reference-
  labels:
    workflows.argoproj.io/test: "true"
  annotations:
    workflows.argoproj.io/description: |
      This example creates a Kubernetes resource that will be deleted
      when the workflow is deleted via Kubernetes GC.

      A workflow is used for this example, but the same approach would apply
      to other resource types.

      https://kubernetes.io/docs/concepts/workloads/controllers/garbage-collection/
spec:
  entrypoint: main
  templates:
    - name: main
      resource:
        action: create
        setOwnerReference: true
        manifest: |
          apiVersion: argoproj.io/v1alpha1
          kind: Workflow
          metadata:
            generateName: owned-eg-
          spec:
            entrypoint: main
            templates:
              - name: main
                container:
                  image: argoproj/argosay:v2

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}
time="2023-03-29T10:41:13.128Z" level=error msg="executor error: open /tmp/manifest.yaml: read-only file system"
 time="2023-03-29T10:41:13.128Z" level=fatal msg="open /tmp/manifest.yaml: read-only file system"

Logs from in your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

time="2023-03-29T10:41:13.128Z" level=error msg="executor error: open /tmp/manifest.yaml: read-only file system"
 time="2023-03-29T10:41:13.128Z" level=fatal msg="open /tmp/manifest.yaml: read-only file system"
tico24 commented 1 year ago

The path in the logs is defined here: https://github.com/argoproj/argo-workflows/blob/master/workflow/common/common.go#L112

It looks like the manifest is written to a file before being kubectl apply'd to the cluster. I'm not much use at a code level, so I can't provide a fix, but hopefully this comment helps whoever picks it up.

poornachandratejasvi commented 1 year ago

Facing a similar problem. Are there any solutions that don't involve altering the readOnlyRootFileSystem flag?

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

agilgur5 commented 1 year ago

Ah I mentioned this scenario in #10233 too!

podSpecPatch may be a workaround for this -- would need to mount an emptyDir as a replacement for the root FS write.

ramanNarasimhan77 commented 6 months ago

Ran into this issue today and noticed that specifying an emptyDir volume in the Workflow's spec.volumes and referencing it in podSpecPatch does not work: the generated pod spec does not include the new volume, so the pod definition is broken because it tries to mount a non-existent volume.

To circumvent this, I added the volume directly in podSpecPatch and was able to get it to work.

podSpecPatch: '{"initContainers":[{"name":"init","volumeMounts":[{"name":"init-temp-dir","mountPath":"/tmp"}]}],"volumes":[{"name":"init-temp-dir","emptyDir":{"sizeLimit":"50Mi"}}]}'
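
For readability, here is the same patch expressed as YAML inside a complete workflow based on the reproduction above (a sketch: the volume name, sizeLimit, and mounting only on the init container follow the patch above, and depending on your executor version the main container may need the same mount):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: k8s-owner-reference-readonly-
spec:
  entrypoint: main
  templates:
    - name: main
      # Mounts an emptyDir over /tmp so the executor can write manifest.yaml
      # even though readOnlyRootFilesystem is enforced on the containers.
      podSpecPatch: |
        initContainers:
          - name: init
            volumeMounts:
              - name: init-temp-dir
                mountPath: /tmp
        volumes:
          - name: init-temp-dir
            emptyDir:
              sizeLimit: 50Mi
      resource:
        action: create
        setOwnerReference: true
        manifest: |
          apiVersion: argoproj.io/v1alpha1
          kind: Workflow
          metadata:
            generateName: owned-eg-
          spec:
            entrypoint: main
            templates:
              - name: main
                container:
                  image: argoproj/argosay:v2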
agilgur5 commented 5 months ago

I think the fix to this should be to just use an emptyDir volume for resource templates instead of requiring the ability to write to the root FS.

There actually already is an emptyDir volume created for Workflow Pods, but that one seems to have a subPath. We should re-use that one if possible. We could also potentially follow the same logic that script templates do.

ramanNarasimhan77 commented 1 month ago

In our Kubernetes clusters, we have recently introduced Validating Admission Policies. One of the policies blocks the submission of a Workflow if it contains a podSpecPatch, because a malicious user could inject an arbitrary container and use it to trigger attacks like reverse shell or host takeover, provided the workflow runs with a highly privileged service account.

As a result, we are currently unable to use manifest submission via Workflows, since readOnlyRootFileSystem is also enforced for all pods via a securityContext.

So we are waiting for this fix so that we can resume using manifest submission via workflows.

> I think the fix to this should be to just use an emptyDir volume for resource templates instead of requiring the ability to write to the root FS.
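
For context, the kind of policy described above could look roughly like this (an illustrative sketch, not our actual policy; the name, match rules, and message are made up, and a ValidatingAdmissionPolicyBinding is also needed to enforce it):

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: deny-workflow-podspecpatch   # hypothetical name
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: ["argoproj.io"]
        apiVersions: ["v1alpha1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["workflows"]
  validations:
    # Reject Workflows that set podSpecPatch at the spec or template level.
    - expression: >-
        !has(object.spec.podSpecPatch) &&
        (!has(object.spec.templates) ||
        object.spec.templates.all(t, !has(t.podSpecPatch)))
      message: "podSpecPatch is not allowed in Workflows"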

agilgur5 commented 1 month ago

> One of the policies blocks the submission of a Workflow if it contains a podSpecPatch, because a malicious user could inject an arbitrary container and use it to trigger attacks like reverse shell or host takeover

If the user has RBAC to create Workflows or create Pods, they can already write any malicious logic directly and would not need podSpecPatch to do so. Without more context, that policy rationale sounds inherently flawed.

divramod commented 1 month ago

So is the resource template definition not working currently? I am just starting with Argo Workflows and testing the different template options.

agilgur5 commented 1 month ago

No, this issue does not say that (and it has been open for 1.5 years); resource templates work fine. As the issue says, certain policies may disallow their usage without some of the workarounds mentioned above.