argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0
15.05k stars 3.2k forks source link

`resource` template getting duplicate `exec-sa-token` volume mounts with `automountServiceAccountToken: false` #12848

Closed philBrown closed 1 week ago

philBrown commented 7 months ago

Pre-requisites

What happened/what did you expect to happen?

Running a workflow of workflows with automountServiceAccountToken: false results in a repeated exec-sa-token volume mount in the child pod spec.

When trying to run the child workflow, the pod spec validation fails with

Pod "my-workflow-id-sub-id" is invalid: spec.containers[1].volumeMounts[1].mountPath: Invalid value: "/var/run/secrets/kubernetes.io/serviceaccount": must be unique

It appears that the code in workflowpod.go is running multiple times for the same container in these cases.

Version

v3.5.2

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.

---
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: workflow-template-submittable
spec:
  automountServiceAccountToken: false # 👈
  executor:
    serviceAccountName: workflow-executor
  entrypoint: whalesay-template
  arguments:
    parameters:
      - name: message
        value: hello world
  templates:
    - name: whalesay-template
      inputs:
        parameters:
          - name: message
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["{{inputs.parameters.message}}"]

---
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: workflow-of-workflows-
spec:
  automountServiceAccountToken: false # 👈
  executor:
    serviceAccountName: workflow-executor
  entrypoint: main
  templates:
    - name: main
      steps:
        - - name: workflow1
            template: resource-without-argument
            arguments:
              parameters:
              - name: workflowtemplate
                value: "workflow-template-submittable"

    - name: resource-without-argument
      inputs:
        parameters:
          - name: workflowtemplate
      resource:
        action: create
        manifest: |
          apiVersion: argoproj.io/v1alpha1
          kind: Workflow
          metadata:
            generateName: workflow-of-workflows-1-
          spec:
            workflowTemplateRef:
              name: {{inputs.parameters.workflowtemplate}}
        successCondition: status.phase == Succeeded
        failureCondition: status.phase in (Failed, Error)

Logs from the workflow controller

{
  "level": "warning",
  "msg": "Non-transient error: Pod \"my-workflow-id-sub-id\" is invalid: spec.containers[1].volumeMounts[1].mountPath: Invalid value: \"/var/run/secrets/kubernetes.io/serviceaccount\": must be unique",
  "time": "..."
}

Logs from in your workflow's wait container

n/a
eduardodbr commented 7 months ago

Hi @philBrown,

The issue is not the workflow of workflows. The issue is being caused by using a resource template with automountServiceAccountToken: false. The following (simpler) workflow would throw the same error:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: workflow-resource
  namespace: argo
spec:
  automountServiceAccountToken: false
  executor:
    serviceAccountName: default
  entrypoint: resource-create
  arguments:
    parameters:
      - name: message
        value: hello
  templates:
    - name: resource-create
      inputs:
        parameters:
          - name: message
      resource:
        action: create
        manifest: |
          apiVersion: v1
          kind: ServiceAccount
          metadata:
            name: {{inputs.parameters.message}}

Currently, the executor service account is always mounted in every container (init,wait,main) when automountServiceAccountToken is set to false. When the workflow has a step that uses the template type resource or data it generates the container spec using the newExecContainer() func that also mounts the executor service account into the main container, leading to the duplicate volume mount.

This can be fixed, but I think we should discuss it first. To understand it better, I'll give a few examples:

Init, Wait and Main will use the any SA with:

spec:
  serviceAccountName: any

Init and Wait are mounted with workflow-executorSA while main uses anywith:

spec:
  executor:
    serviceAccountName: workflow-executor
  serviceAccountName: any

Init, Wait and Main will use workflow-executor SA with:

spec:
  executor:
    serviceAccountName: workflow-executor
  serviceAccountName: any
  automountServiceAccountToken: false

IMO this PR https://github.com/argoproj/argo-workflows/pull/10945 introduced a regression. Before that change, configuring automountServiceAccountToken: false would create the main container without any SA, which I believe was the intended behavior. I believe the issue https://github.com/argoproj/argo-workflows/issues/10937 that change was trying to solve, could have been solved using a resource template (that would use the executor SA) or using a serviceAccountName with appropriate RBAC.

I'll wait for the input of the maintainers to suggest some changes.

philBrown commented 7 months ago

Thanks for the clarification @eduardodbr

I found the error was the same whether or not the child had defined automountServiceAccountToken with true, false or not present at all

awprice commented 4 months ago

We're also running into this issue as well when using automountServiceAccountToken: false - we rely on the service account mounting behaviour present in v3.4.x of Argo workflows and at the moment upgrading to v3.5.x is not feasible as we do not want the executor service account mounted into the main container.

Is it possible for this regression in behaviour reverted in a future release?

I agree with @eduardodbr in that the issue #10937 was trying to solve could have been solved differently, rather than changing the automountServiceAccountToken behaviour.

agilgur5 commented 2 weeks ago

Yea @eduardodbr 's description above looks correct to me, so #10937 was not a bug and #10945 is itself a bug/regression then. I've renamed the issue and PR so it reads as incorrect at a glance -- if automountServiceAccountToken: false, ofc an SA should not be mounted to the main container as you've literally specified not to.

we rely on the service account mounting behaviour present in v3.4.x of Argo workflows and at the moment upgrading to v3.5.x is not feasible

It looks like #10945 was backported to 3.4.8+ per #11118

Is it possible for this regression in behaviour reverted in a future release?

Yes it should be IMO. I've labeled it as a regression and part of a patch series milestone.