argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

`AgentPod` misses serviceaccounts of plugin in controller namespace #12708

Open jswxstw opened 6 months ago

jswxstw commented 6 months ago

Pre-requisites

What happened/what did you expect to happen?

Related discussion: https://github.com/argoproj/argo-workflows/discussions/12566

The controller runs in namespace argo, which contains a plugin named khaos-executor-plugin and a service account named khaos-executor-plugin. The sample workflow runs in namespace khaos-workflow, which contains a plugin named hello-executor-plugin and a service account named hello-executor-plugin.

argo get hello-7t78h -n khaos-workflow
Name:                hello-7t78h
Namespace:           khaos-workflow
ServiceAccount:      unset (will run with the default ServiceAccount)
Status:              Error
Message:             serviceaccounts "khaos-executor-plugin" not found
Conditions:
 Completed           True
 PodRunning          False
Created:             Wed Feb 28 17:32:52 +0800 (2 hours ago)
Started:             Wed Feb 28 17:39:01 +0800 (2 hours ago)
Finished:            Wed Feb 28 17:39:01 +0800 (2 hours ago)
Duration:            0 seconds
Progress:            0/1

STEP            TEMPLATE  PODNAME  DURATION  MESSAGE
 ◷ hello-7t78h  main

Version

v3.4.14

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-
spec:
  entrypoint: main
  templates:
    - name: main
      plugin:
        hello: { }

Logs from the workflow controller

time="2024-02-28T09:39:01.368Z" level=info msg="Task-result reconciliation" namespace=khaos-workflow numObjs=0 workflow=hello-7t78h
time="2024-02-28T09:39:01.368Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=khaos-workflow workflow=hello-7t78h
time="2024-02-28T09:39:01.368Z" level=warning msg="[DEBUG] boundaryID was nil" namespace=khaos-workflow workflow=hello-7t78h
time="2024-02-28T09:39:01.368Z" level=info msg="was unable to obtain node for , letting display name to be nodeName" namespace=khaos-workflow workflow=hello-7t78h
time="2024-02-28T09:39:01.368Z" level=info msg="Plugin node hello-7t78h initialized Pending" namespace=khaos-workflow workflow=hello-7t78h
time="2024-02-28T09:39:01.368Z" level=info msg="TaskSet Reconciliation" namespace=khaos-workflow workflow=hello-7t78h
time="2024-02-28T09:39:01.368Z" level=info msg="Creating TaskSet" namespace=khaos-workflow workflow=hello-7t78h
time="2024-02-28T09:39:01.438Z" level=info msg=reconcileAgentPod namespace=khaos-workflow workflow=hello-7t78h
time="2024-02-28T09:39:01.458Z" level=error msg="error in agent pod reconciliation" error="serviceaccounts \"khaos-executor-plugin\" not found" namespace=khaos-workflow workflow=hello-7t78h
time="2024-02-28T09:39:01.458Z" level=info msg="Updated phase Running -> Error" namespace=khaos-workflow workflow=hello-7t78h
time="2024-02-28T09:39:01.458Z" level=info msg="Updated message  -> serviceaccounts \"khaos-executor-plugin\" not found" namespace=khaos-workflow workflow=hello-7t78h
time="2024-02-28T09:39:01.458Z" level=info msg="Marking workflow completed" namespace=khaos-workflow workflow=hello-7t78h
time="2024-02-28T09:39:01.458Z" level=info msg="Marking workflow as pending archiving" namespace=khaos-workflow workflow=hello-7t78h
time="2024-02-28T09:39:01.458Z" level=info msg="Workflow to be dehydrated" Workflow Size=1516
time="2024-02-28T09:39:01.466Z" level=info msg="cleaning up pod" action=deletePod key=khaos-workflow/hello-7t78h-1340600742-agent/deletePod
time="2024-02-28T09:39:01.468Z" level=info msg="Workflow update successful" namespace=khaos-workflow phase=Error resourceVersion=85399559051 workflow=hello-7t78h
time="2024-02-28T09:39:01.491Z" level=info msg="archiving workflow" namespace=khaos-workflow uid=5c4aa38d-4d94-4def-b189-b6908d3133e4 workflow=hello-7t78h
time="2024-02-28T09:39:01.513Z" level=info msg="Queueing Error workflow khaos-workflow/hello-7t78h for delete in 72h0m0s due to TTL"
time="2024-02-28T09:39:56.542Z" level=info msg="Alloc=17143 TotalAlloc=4920922 Sys=61565 NumGC=947 Goroutines=282"
time="2024-02-28T09:40:17.314Z" level=info msg="Performing periodic workflow GC"
time="2024-02-28T09:40:17.316Z" level=info msg="Deleting old offloads that are not live" len_wfs=0
time="2024-02-28T09:40:17.316Z" level=info msg="Workflow GC finished"

Logs from your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded
jswxstw commented 6 months ago

As mentioned in https://github.com/argoproj/argo-workflows/discussions/12566, I think AgentPod only needs the plugins in the workflow namespace. Plugins in the controller namespace are useless to it and should not be loaded by default.
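A minimal Python sketch of the proposed selection rule, using the namespaces from this issue. Plugins are modelled as hypothetical `(namespace, name)` tuples, and `plugins_for_agent` is an illustrative name; this is not the controller's Go code.

```python
def plugins_for_agent(plugins, workflow_namespace):
    """Proposed rule: keep only plugins whose ConfigMap lives in the workflow's namespace.

    `plugins` is a list of (namespace, name) tuples, a hypothetical stand-in
    for the executor plugin ConfigMaps the controller has discovered.
    """
    return [(ns, name) for ns, name in plugins if ns == workflow_namespace]


plugins = [
    ("argo", "khaos-executor-plugin"),            # controller namespace
    ("khaos-workflow", "hello-executor-plugin"),  # workflow (user) namespace
]
print(plugins_for_agent(plugins, "khaos-workflow"))
# [('khaos-workflow', 'hello-executor-plugin')]
```

Under this rule the controller plugin never reaches the agent pod, so its service account is never looked up in the workflow namespace.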

jswxstw commented 6 months ago

@alexec Do you know why plugins in the controller namespace are needed?

alexec commented 6 months ago

A plugin outside the controller namespace is a user plugin. In many set ups, the users cannot modify the controller because they only have access to their own namespace. Loading plugins from the user’s namespace allows the user to self-serve their plugins. I’m not sure a user plugin should try and use a controller service account, so this seems to be a bug.
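The distinction above can be sketched as a simple namespace check. This is an illustrative Python model only: `classify_plugins` is a hypothetical helper, and plugins are represented as `(namespace, name)` tuples.

```python
def classify_plugins(plugins, controller_namespace):
    """Group plugins into controller plugins and user plugins.

    A plugin in the controller namespace is a controller plugin; a plugin in
    any other namespace is a user plugin that its namespace's users manage.
    """
    groups = {"controller": [], "user": []}
    for ns, name in plugins:
        kind = "controller" if ns == controller_namespace else "user"
        groups[kind].append(name)
    return groups


print(classify_plugins(
    [("argo", "khaos-executor-plugin"), ("khaos-workflow", "hello-executor-plugin")],
    "argo",
))
# {'controller': ['khaos-executor-plugin'], 'user': ['hello-executor-plugin']}
```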

jswxstw commented 6 months ago

I’m not sure a user plugin should try and use a controller service account, so this seems to be a bug.

Controller plugins are loaded by default, so the controller plugin's service account will be accessed if AutomountServiceAccountToken is true (see https://github.com/argoproj/argo-workflows/blob/af2cacb365a6cc03cc35ed9749976e095f9a03f7/workflow/controller/agent.go#L273-L302). I'm not sure why controller plugins are needed for user workflows. Do you know why?
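A minimal Python model of the behavior described above, which reproduces the error message from the `argo get` output. It assumes, as in this issue's setup, that each plugin's service account shares its plugin ConfigMap's name and that the khaos plugin sets AutomountServiceAccountToken. `Plugin`, `NotFound`, and `agent_token_volumes` are hypothetical names; this sketches the reported behavior and is not the Go code in agent.go.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plugin:
    namespace: str            # namespace of the plugin ConfigMap
    name: str                 # e.g. "khaos-executor-plugin"
    automount_sa_token: bool  # AutomountServiceAccountToken


class NotFound(Exception):
    pass


def agent_token_volumes(controller_ns, workflow_ns, plugins, service_accounts):
    """Plugins from BOTH namespaces are loaded; for each plugin that automounts
    a token, its service account is looked up in the WORKFLOW namespace."""
    loaded = [p for p in plugins if p.namespace in (controller_ns, workflow_ns)]
    volumes = []
    for p in loaded:
        if not p.automount_sa_token:
            continue
        sa = p.name  # assumption: SA name == plugin ConfigMap name
        if (workflow_ns, sa) not in service_accounts:
            raise NotFound(f'serviceaccounts "{sa}" not found')
        volumes.append(sa)
    return volumes


plugins = [
    Plugin("argo", "khaos-executor-plugin", True),
    Plugin("khaos-workflow", "hello-executor-plugin", True),
]
# Existing (namespace, service account) pairs, matching the issue's setup.
service_accounts = {
    ("argo", "khaos-executor-plugin"),
    ("khaos-workflow", "hello-executor-plugin"),
}
try:
    agent_token_volumes("argo", "khaos-workflow", plugins, service_accounts)
except NotFound as e:
    print(e)  # serviceaccounts "khaos-executor-plugin" not found
```

The controller plugin's service account exists only in argo, so the lookup in khaos-workflow fails before the user plugin is even considered.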

alexec commented 2 weeks ago

This seems odd to me too. Agents are part of the workflow, not part of the control-plane, so should run using the service account of the workflow.
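A hypothetical Python sketch of this suggestion: the agent pod takes its service account from the workflow, falling back to the Kubernetes "default" service account when the workflow leaves it unset (as the `argo get` output reports). `agent_pod_spec` is an illustrative name, and the container list is simplified; it does not reflect the real agent pod layout.

```python
def agent_pod_spec(workflow_namespace, workflow_sa, plugin_names):
    """Build a simplified agent pod spec that runs as the workflow's service account."""
    return {
        "metadata": {"namespace": workflow_namespace},
        "spec": {
            # Unset workflow SA -> "default", as Kubernetes does for pods.
            "serviceAccountName": workflow_sa or "default",
            # One container per plugin; no per-plugin service account lookups.
            "containers": [{"name": name} for name in plugin_names],
        },
    }


spec = agent_pod_spec("khaos-workflow", None, ["hello-executor-plugin"])
print(spec["spec"]["serviceAccountName"])  # default
```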

jswxstw commented 4 days ago

This seems odd to me too. Agents are part of the workflow, not part of the control-plane, so should run using the service account of the workflow.

@alexec I see that you worked on the executor plugin feature. Executor plugins in the control-plane namespace are loaded by default, so I wonder why you did this. Alternatively, can you confirm whether we can load only the plugins in the user's workflow namespace, as in this PR: #12724?

https://github.com/argoproj/argo-workflows/blob/889a9d24be072c5d04e853db3d4c40a04f939d28/workflow/controller/agent.go#L270