argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

Unable to run a simple HTTP template: `Error: unknown (get workflowtasksets.argoproj.io)` #13770

Closed: julienteisseire closed this issue 1 month ago

julienteisseire commented 1 month ago

Pre-requisites

What happened? What did you expect to happen?

Hello, I am trying the HTTP template with the example workflow. The agent executor goes into CrashLoopBackOff after I submit this simple workflow. I work in a dedicated namespace: commanding.

I checked the service account, agent role and role bindings. I don't get a 403 error, only an error followed by the agent usage text.

My pod status from kubectl:

NAMESPACE            NAME                                                              READY   STATUS             RESTARTS        AGE
commanding           workflow-engine-argo-workflows-server-9596d7666-vd7n2             1/1     Running            0               41m
commanding           workflow-engine-argo-workflows-workflow-controller-7cffff9dcbxx   1/1     Running            0               41m
commanding           http-template-26wpj-1340600742-agent                              0/1     CrashLoopBackOff   5 (2m ago)      5m13s
commanding           http-template-8rnbl-1340600742-agent                              0/1     CrashLoopBackOff   1 (2s ago)      12s
commanding           http-template-bhj8t-1340600742-agent                              0/1     CrashLoopBackOff   7 (35s ago)     11m
commanding           http-template-ng6cn-1340600742-agent                              0/1     CrashLoopBackOff   9 (106s ago)    23m
commanding           http-template-s46mp-1340600742-agent                              0/1     CrashLoopBackOff   6 (2m43s ago)   8m52s

My error:

kubectl logs http-template-8rnbl-1340600742-agent -n commanding
time="2024-10-16T15:22:45.620Z" level=info msg="Starting Workflow Executor" version=v3.5.11
time="2024-10-16T15:22:45.710Z" level=info msg="Starting Agent" requeueTime=10s taskWorkers=16 workflow=http-template-8rnbl
Error: unknown (get workflowtasksets.argoproj.io)
Usage:
  argoexec agent main [flags]

Flags:
  -h, --help   help for main

Global Flags:
      --as string                      Username to impersonate for the operation
      --as-group stringArray           Group to impersonate for the operation, this flag can be repeated to specify multiple groups.
      --as-uid string                  UID to impersonate for the operation
      --certificate-authority string   Path to a cert file for the certificate authority
      --client-certificate string      Path to a client certificate file for TLS
      --client-key string              Path to a client key file for TLS
      --cluster string                 The name of the kubeconfig cluster to use
      --context string                 The name of the kubeconfig context to use
      --gloglevel int                  Set the glog logging level
      --insecure-skip-tls-verify       If true, the server's certificate will not be checked for validity. This will make your HTTPS connections insecure
      --kubeconfig string              Path to a kube config. Only required if out-of-cluster
      --log-format string              The formatter to use for logs. One of: text|json (default "text")
      --loglevel string                Set the logging level. One of: debug|info|warn|error (default "info")
  -n, --namespace string               If present, the namespace scope for this CLI request
      --password string                Password for basic authentication to the API server
      --proxy-url string               If provided, this URL will be used to connect via proxy
      --request-timeout string         The length of time to wait before giving up on a single server request. Non-zero values should contain a corresponding time unit (e.g. 1s, 2m, 3h). A value of zero means don't timeout requests. (default "0")
      --server string                  The address and port of the Kubernetes API server
      --tls-server-name string         If provided, this name will be used to validate server certificate. If this is not provided, hostname used to contact the server is used.
      --token string                   Bearer token for authentication to the API server
      --user string                    The name of the kubeconfig user to use
      --username string                Username for basic authentication to the API server

unknown (get workflowtasksets.argoproj.io)

I have tried everything I can think of to solve this issue, but I don't understand what causes the error. NB: other workflows that don't use the HTTP template work fine.

Thank you

Version(s)

v3.5.11

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: http-template-
  namespace: commanding
  labels:
    workflows.argoproj.io/test: "true"
  annotations:
    workflows.argoproj.io/description: |
      Http template will demonstrate http template functionality
    workflows.argoproj.io/version: '>= 3.2.0'
spec:
  entrypoint: main
  serviceAccountName: argo-workflow
  templates:
    - name: main
      steps:
        - - name: good
            template: http
            arguments:
              parameters: [{name: url, value: "https://raw.githubusercontent.com/argoproj/argo-workflows/4e450e250168e6b4d51a126b784e90b11a0162bc/pkg/apis/workflow/v1alpha1/generated.swagger.json"}]
          - name: bad
            template: http
            continueOn:
              failed: true
            arguments:
              parameters: [{name: url, value: "https://raw.githubusercontent.com/argoproj/argo-workflows/thisisnotahash/pkg/apis/workflow/v1alpha1/generated.swagger.json"}]

    - name: http
      inputs:
        parameters:
          - name: url
      http:
        # url: http://dummy.restapiexample.com/api/v1/employees
        url: "{{inputs.parameters.url}}"

Logs from the workflow controller

time="2024-10-16T15:30:09.799Z" level=info msg="Creating TaskSet" namespace=commanding workflow=http-template-ng6cn
time="2024-10-16T15:30:09.812Z" level=info msg=reconcileAgentPod namespace=commanding workflow=http-template-ng6cn
time="2024-10-16T15:30:09.812Z" level=info msg=updateAgentPodStatus namespace=commanding workflow=http-template-ng6cn
time="2024-10-16T15:30:09.812Z" level=info msg=assessAgentPodStatus namespace=commanding podName=http-template-ng6cn-1340600742-agent
time="2024-10-16T15:31:14.797Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=3296 namespace=commanding workflow=http-template-bhj8t
time="2024-10-16T15:31:14.798Z" level=info msg="Task-result reconciliation" namespace=commanding numObjs=0 workflow=http-template-bhj8t
time="2024-10-16T15:31:14.798Z" level=info msg=updateAgentPodStatus namespace=commanding workflow=http-template-bhj8t
time="2024-10-16T15:31:14.798Z" level=info msg=assessAgentPodStatus namespace=commanding podName=http-template-bhj8t-1340600742-agent
time="2024-10-16T15:31:14.798Z" level=error msg="was unable to obtain node for http-template-bhj8t-2166136261" namespace=commanding workflow=http-template-bhj8t
time="2024-10-16T15:31:14.799Z" level=info msg="Workflow step group node http-template-bhj8t-4112562815 not yet completed" namespace=commanding workflow=http-template-bhj8t
time="2024-10-16T15:31:14.799Z" level=info msg="TaskSet Reconciliation" namespace=commanding workflow=http-template-bhj8t
time="2024-10-16T15:31:14.799Z" level=info msg="Creating TaskSet" namespace=commanding workflow=http-template-bhj8t
time="2024-10-16T15:31:14.817Z" level=info msg=reconcileAgentPod namespace=commanding workflow=http-template-bhj8t
time="2024-10-16T15:31:14.817Z" level=info msg=updateAgentPodStatus namespace=commanding workflow=http-template-bhj8t
time="2024-10-16T15:31:14.817Z" level=info msg=assessAgentPodStatus namespace=commanding podName=http-template-bhj8t-1340600742-agent
time="2024-10-16T15:31:29.742Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=3296 namespace=commanding workflow=http-template-bhj8t
time="2024-10-16T15:31:29.743Z" level=info msg="Task-result reconciliation" namespace=commanding numObjs=0 workflow=http-template-bhj8t
time="2024-10-16T15:31:29.744Z" level=info msg=updateAgentPodStatus namespace=commanding workflow=http-template-bhj8t
time="2024-10-16T15:31:29.745Z" level=info msg=assessAgentPodStatus namespace=commanding podName=http-template-bhj8t-1340600742-agent
time="2024-10-16T15:31:29.745Z" level=error msg="was unable to obtain node for http-template-bhj8t-2166136261" namespace=commanding workflow=http-template-bhj8t
time="2024-10-16T15:31:29.746Z" level=info msg="Workflow step group node http-template-bhj8t-4112562815 not yet completed" namespace=commanding workflow=http-template-bhj8t
time="2024-10-16T15:31:29.746Z" level=info msg="TaskSet Reconciliation" namespace=commanding workflow=http-template-bhj8t
time="2024-10-16T15:31:29.747Z" level=info msg="Creating TaskSet" namespace=commanding workflow=http-template-bhj8t
time="2024-10-16T15:31:29.761Z" level=info msg=reconcileAgentPod namespace=commanding workflow=http-template-bhj8t
time="2024-10-16T15:31:29.761Z" level=info msg=updateAgentPodStatus namespace=commanding workflow=http-template-bhj8t
time="2024-10-16T15:31:29.761Z" level=info msg=assessAgentPodStatus namespace=commanding podName=http-template-bhj8t-1340600742-agent

Logs from your workflow's wait container

error: container wait is not valid for pod http-template-bhj8t-1340600742-agent

jswxstw commented 1 month ago

I checked the service account, agent role and role bindings.

Have you checked the RBAC configuration in namespace commanding? The error suggests that the argo-workflow service account in commanding does not have permission to access workflowtasksets.

julienteisseire commented 1 month ago

Judging by the role definition, I believed the permissions had been set properly.

# kubectl get role workflow-engine-argo-workflows-workflow -n commanding -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  annotations:
    meta.helm.sh/release-name: workflow-engine
    meta.helm.sh/release-namespace: commanding
  creationTimestamp: "2024-10-17T07:04:13Z"
  labels:
    app: workflow-controller
    app.kubernetes.io/component: workflow-controller
    app.kubernetes.io/instance: workflow-engine
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: argo-workflows-workflow-controller
    app.kubernetes.io/part-of: argo-workflows
    helm.sh/chart: argo-workflows-0.42.5
  name: workflow-engine-argo-workflows-workflow
  namespace: commanding
  resourceVersion: "517"
  uid: 19600453-b936-4301-b278-79fc685f365b
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - watch
  - patch
- apiGroups:
  - ""
  resources:
  - pods/log
  verbs:
  - get
  - watch
- apiGroups:
  - ""
  resources:
  - pods/exec
  verbs:
  - create
- apiGroups:
  - argoproj.io
  resources:
  - workflowtaskresults
  verbs:
  - create
  - patch
- apiGroups:
  - argoproj.io
  resources:
  - workflowtasksets
  - workflowartifactgctasks
  verbs:
  - list
  - watch
- apiGroups:
  - argoproj.io
  resources:
  - workflowtasksets/status
  - workflowartifactgctasks/status
  verbs:
  - patch

Is there any problem with the role definition?

Do I need to add permissions here in addition to list and watch for workflowtasksets? This is the default configuration after deploying with the Helm chart in namespaced mode, with commanding as the managed namespace.

- apiGroups:
  - argoproj.io
  resources:
  - workflowtaskresults
  verbs:
  - create
  - patch
- apiGroups:
  - argoproj.io
  resources:
  - workflowtasksets
  - workflowartifactgctasks
  verbs:
  - list
  - watch
- apiGroups:
  - argoproj.io
  resources:
  - workflowtasksets/status
  - workflowartifactgctasks/status
  verbs:
  - patch

Thank you

jswxstw commented 1 month ago

The role definition seems fine. Is there a correct rolebinding for argo-workflow?

julienteisseire commented 1 month ago

Yes, I think so:

# kubectl get rolebindings workflow-engine-argo-workflows-workflow -n commanding -o yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  annotations:
    meta.helm.sh/release-name: workflow-engine
    meta.helm.sh/release-namespace: commanding
  creationTimestamp: "2024-10-17T08:17:44Z"
  labels:
    app: workflow-controller
    app.kubernetes.io/component: workflow-controller
    app.kubernetes.io/instance: workflow-engine
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: argo-workflows-workflow-controller
    app.kubernetes.io/part-of: argo-workflows
    helm.sh/chart: argo-workflows-0.42.5
  name: workflow-engine-argo-workflows-workflow
  namespace: commanding
  resourceVersion: "491"
  uid: fa683bed-3cb2-467b-a9be-fde121e1d9db
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: workflow-engine-argo-workflows-workflow
subjects:
- kind: ServiceAccount
  name: argo-workflow
  namespace: commanding

julienteisseire commented 1 month ago

Here is my values.yaml, for reference:

workflow:
  serviceAccount:
    create: true
    name: "argo-workflow"
  rbac:
    create: true
server:
  authModes: [server]
  secure: true
  extraArgs:
    - --namespaced
    - --managed-namespace
    - commanding
controller:
  extraArgs:
    - --namespaced
    - --managed-namespace
    - commanding
  workflowNamespaces:
    - commanding
    - operations
    - processing
  configMap:
    # -- Create a ConfigMap for the controller
    create: false

I also tried the basic setup (following the installation instructions at https://artifacthub.io/packages/helm/argo/argo-workflows) and got the same error. I am trying to understand why there are 4 service accounts:

kubectl get serviceaccounts -n commanding

NAME                                                 SECRETS   AGE
argo-workflow                                        0         21m
default                                              0         21m
workflow-engine-argo-workflows-server                0         21m
workflow-engine-argo-workflows-workflow-controller   0         21m

Which one does the executor use, and what could be causing the error in this situation? I don't see it.
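To see which service account a given agent pod actually runs as, and which Secret its API token comes from, the pod spec can be queried directly. This is a minimal sketch assuming cluster access with kubectl; `pod_identity` is a hypothetical helper name, and it only reads `.spec.serviceAccountName` and the `secretName` of the pod's Secret volumes via `-o jsonpath`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ServiceAccount a pod runs as and the names of
# any Secret volumes it mounts (a token Secret would show up here).
pod_identity() {
  local ns="$1" pod="$2"
  printf 'serviceAccountName: %s\n' "$(kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.spec.serviceAccountName}')"
  printf 'secret volumes: %s\n' "$(kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.spec.volumes[*].secret.secretName}')"
}

# Example usage (requires cluster access):
#   pod_identity commanding http-template-6xlkw-1340600742-agent
```

For the agent pod posted later in this thread, the spec sets serviceAccountName: argo-workflow together with automountServiceAccountToken: false, and the token volume comes from the Secret argo-workflow.service-account-token. The API server authenticates the bearer token the pod presents, so the agent acts as whichever service account that Secret's token belongs to.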

jswxstw commented 1 month ago

Can you run the following commands to check whether the permissions are valid?

kubectl auth can-i list workflowtasksets --as=system:serviceaccount:commanding:argo-workflow -n commanding
kubectl auth can-i watch workflowtasksets --as=system:serviceaccount:commanding:argo-workflow -n commanding

Also, can you provide the full YAML of the agent pod?
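The two checks above can be wrapped in a small loop so the same verbs are tested for any service account, for instance default as well as argo-workflow. This is a minimal sketch; `check_taskset_rbac` is a hypothetical helper name, and it uses only `kubectl auth can-i` with `--as` and `-n`, as in the commands above. Keep in mind that `--as` evaluates RBAC for the impersonated identity; it does not reveal which identity a pod's mounted token actually carries.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a service account may list/watch
# workflowtasksets in a namespace, using impersonation via --as.
check_taskset_rbac() {
  local ns="$1" sa="$2" verb answer
  for verb in list watch; do
    # can-i prints "yes" or "no" and may exit non-zero on "no";
    # `|| true` keeps the loop going so both verbs are always reported.
    answer="$(kubectl auth can-i "$verb" workflowtasksets \
      --as="system:serviceaccount:${ns}:${sa}" -n "$ns" || true)"
    printf '%s %s/%s: %s\n' "$verb" "$ns" "$sa" "$answer"
  done
}

# Example usage (requires cluster access):
#   check_taskset_rbac commanding argo-workflow
#   check_taskset_rbac commanding default
```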

julienteisseire commented 1 month ago

Of course.

kubectl auth can-i list workflowtasksets --as=system:serviceaccount:commanding:argo-workflow -n commanding
yes
kubectl auth can-i watch workflowtasksets --as=system:serviceaccount:commanding:argo-workflow -n commanding
yes

For the pod YAML, which one? The http-template agent pod? If so, here it is:

```yaml
# kubectl get pod http-template-6xlkw-1340600742-agent -n commanding -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/default-container: main
  creationTimestamp: "2024-10-17T09:50:36Z"
  labels:
    workflows.argoproj.io/completed: "false"
    workflows.argoproj.io/component: agent
    workflows.argoproj.io/workflow: http-template-6xlkw
  name: http-template-6xlkw-1340600742-agent
  namespace: commanding
  ownerReferences:
  - apiVersion: argoproj.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Workflow
    name: http-template-6xlkw
    uid: 1eba29ed-e087-412f-8985-dc6b5020c8b1
  resourceVersion: "6924"
  uid: 9aeb1b35-5674-44b9-a2c6-ae5fe5bfca50
spec:
  automountServiceAccountToken: false
  containers:
  - args:
    - agent
    - main
    - --loglevel
    - info
    command:
    - argoexec
    env:
    - name: ARGO_WORKFLOW_NAME
      value: http-template-6xlkw
    - name: ARGO_WORKFLOW_UID
      value: 1eba29ed-e087-412f-8985-dc6b5020c8b1
    - name: ARGO_AGENT_PATCH_RATE
      value: 10s
    - name: ARGO_PLUGIN_ADDRESSES
      value: "null"
    - name: ARGO_PLUGIN_NAMES
      value: "null"
    image: quay.io/argoproj/argoexec:v3.5.11
    imagePullPolicy: IfNotPresent
    name: main
    resources:
      limits:
        cpu: 100m
        memory: 256M
      requests:
        cpu: 10m
        memory: 64M
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 8737
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/argo
      name: var-run-argo
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-4jjrj
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
  - args:
    - agent
    - init
    - --loglevel
    - info
    command:
    - argoexec
    env:
    - name: ARGO_WORKFLOW_NAME
      value: http-template-6xlkw
    - name: ARGO_WORKFLOW_UID
      value: 1eba29ed-e087-412f-8985-dc6b5020c8b1
    - name: ARGO_AGENT_PATCH_RATE
      value: 10s
    - name: ARGO_PLUGIN_ADDRESSES
      value: "null"
    - name: ARGO_PLUGIN_NAMES
      value: "null"
    image: quay.io/argoproj/argoexec:v3.5.11
    imagePullPolicy: IfNotPresent
    name: init
    resources:
      limits:
        cpu: 100m
        memory: 256M
      requests:
        cpu: 10m
        memory: 64M
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 8737
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/argo
      name: var-run-argo
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-4jjrj
      readOnly: true
  nodeName: swo-control-plane
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: OnFailure
  schedulerName: default-scheduler
  securityContext:
    runAsNonRoot: true
    runAsUser: 8737
  serviceAccount: argo-workflow
  serviceAccountName: argo-workflow
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - emptyDir: {}
    name: var-run-argo
  - name: kube-api-access-4jjrj
    secret:
      defaultMode: 420
      secretName: argo-workflow.service-account-token
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-10-17T09:50:44Z"
    status: "True"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2024-10-17T09:50:46Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-10-17T09:51:11Z"
    message: 'containers with unready status: [main]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-10-17T09:51:11Z"
    message: 'containers with unready status: [main]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-10-17T09:50:36Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://79836572e8d739c29a160e63de95038c73a38e39681d29f9bd7ac83c53ea0b0c
    image: quay.io/argoproj/argoexec:v3.5.11
    imageID: quay.io/argoproj/argoexec@sha256:4a576a3fe37bf8351117d00be6febf2a93f70840736469be2eeb7c21a6b368e0
    lastState:
      terminated:
        containerID: containerd://3c07a714ffae2b7c3011322fcdc2195e0543f7fe8190c78f95117b42a794bd7a
        exitCode: 64
        finishedAt: "2024-10-17T09:50:52Z"
        message: unknown (get workflowtasksets.argoproj.io)
        reason: Error
        startedAt: "2024-10-17T09:50:50Z"
    name: main
    ready: false
    restartCount: 2
    started: false
    state:
      terminated:
        containerID: containerd://79836572e8d739c29a160e63de95038c73a38e39681d29f9bd7ac83c53ea0b0c
        exitCode: 64
        finishedAt: "2024-10-17T09:51:10Z"
        message: unknown (get workflowtasksets.argoproj.io)
        reason: Error
        startedAt: "2024-10-17T09:51:08Z"
    volumeMounts:
    - mountPath: /var/run/argo
      name: var-run-argo
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-4jjrj
      readOnly: true
      recursiveReadOnly: Disabled
  hostIP: 192.168.112.2
  hostIPs:
  - ip: 192.168.112.2
  initContainerStatuses:
  - containerID: containerd://caf300e9dd45bc9bfa603062b3d9188ba067c5bc13164c04f8d8e9fa7543612f
    image: quay.io/argoproj/argoexec:v3.5.11
    imageID: quay.io/argoproj/argoexec@sha256:4a576a3fe37bf8351117d00be6febf2a93f70840736469be2eeb7c21a6b368e0
    lastState: {}
    name: init
    ready: true
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://caf300e9dd45bc9bfa603062b3d9188ba067c5bc13164c04f8d8e9fa7543612f
        exitCode: 0
        finishedAt: "2024-10-17T09:50:46Z"
        reason: Completed
        startedAt: "2024-10-17T09:50:44Z"
    volumeMounts:
    - mountPath: /var/run/argo
      name: var-run-argo
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-4jjrj
      readOnly: true
      recursiveReadOnly: Disabled
  phase: Running
  podIP: 10.244.0.9
  podIPs:
  - ip: 10.244.0.9
  qosClass: Burstable
  startTime: "2024-10-17T09:50:36Z"
```

jswxstw commented 1 month ago

I don't get a 403 error, only an error followed by the agent usage text.

The underlying error is only logged in debug mode: if debug logging is enabled, you should see a log line like Watch workflowtasksets 403.

However, all the RBAC configuration looks good to me, which is weird 🤔.

julienteisseire commented 1 month ago

Indeed, I just enabled debug logging and I can see the 403 error:

time="2024-10-17T12:03:37.725Z" level=info msg="Starting Workflow Executor" version=v3.5.11
time="2024-10-17T12:03:37.726Z" level=info msg="Starting Agent" requeueTime=10s taskWorkers=16 workflow=http-template-78cqr
time="2024-10-17T12:03:37.827Z" level=debug msg="Watch workflowtasksets 403"
Error: unknown (get workflowtasksets.argoproj.io)

But I don't understand why ...

julienteisseire commented 1 month ago

One idea: since I started testing, I have had to create a service account token Secret in the correct namespace for the HTTP agent pod to initialize (as a post-install step for the Helm chart):

apiVersion: v1
kind: Secret
metadata:
  name: argo-workflow.service-account-token
  namespace: commanding
  annotations:
    kubernetes.io/service-account.name: default
type: kubernetes.io/service-account-token

Once I apply this service account token, the pod initializes and then fails with the 403 error. But I don't know whether having to create this token is normal, or why I have to do it manually after installing Argo Workflows. Could this be the root cause of the problem?

jswxstw commented 1 month ago

Shouldn't the Secret argo-workflow.service-account-token be generated automatically by k8s when the service account argo-workflow is created?

apiVersion: v1
kind: Secret
metadata:
  name: argo-workflow.service-account-token
  namespace: commanding
  annotations:
    kubernetes.io/service-account.name: default # Why do you set the service account name to "default"? Shouldn't it be "argo-workflow"?
type: kubernetes.io/service-account-token
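One way to confirm which identity a token Secret really carries is to decode the JWT stored in it: for a service account token, the payload's `sub` claim has the form `system:serviceaccount:<namespace>:<name>`. The sketch below assumes GNU coreutils `base64` and a compact JSON payload (no spaces around `:`); `token_subject` is a hypothetical helper name. With the Secret as first created (annotated with `default`), it would be expected to print `system:serviceaccount:commanding:default`, which would explain a 403 even though `argo-workflow` itself has the right permissions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the "sub" claim of a Kubernetes service account JWT.
# Assumes GNU base64 and a compact JSON payload.
token_subject() {
  local payload
  # A JWT is header.payload.signature; keep the payload and map base64url to base64.
  payload="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # base64url omits '=' padding; restore it so base64 -d accepts the input.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d \
    | sed -n 's/.*"sub":"\([^"]*\)".*/\1/p'
}

# Example usage (requires cluster access):
#   token_subject "$(kubectl get secret argo-workflow.service-account-token \
#     -n commanding -o jsonpath='{.data.token}' | base64 -d)"
```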

julienteisseire commented 1 month ago

I agree with you; I don't understand why I have to create the service account token manually (I discovered this error with kubectl describe pod). If you have an idea how to get the token created automatically, I'd appreciate it.

In any case, since I changed kubernetes.io/service-account.name: default to kubernetes.io/service-account.name: argo-workflow, the HTTP executor works fine:

STEP                    TEMPLATE  PODNAME  DURATION  MESSAGE
 ✔ http-template-4qmlz  main
 └─┬─✖ bad              http                         received non-2xx response code: 404
   └─✔ good             http

Thank you very much for your invaluable help.

jswxstw commented 1 month ago

If you have an idea how to get the token created automatically, I'd appreciate it.

Maybe the latest version disabled this feature; it doesn't work in my k3s cluster either.
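For context on the auto-generation question: since Kubernetes v1.24, the control plane no longer creates a long-lived token Secret automatically for each new ServiceAccount, which is consistent with what both of you observed. Because the agent pod in this thread mounts a Secret named argo-workflow.service-account-token, that Secret has to exist and be annotated with the service account it should represent; the token controller then fills in its token for that account. A minimal sketch of a helper that applies such a Secret, assuming the `<service-account>.service-account-token` naming seen in the agent pod spec; `apply_sa_token_secret` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create (or update) a service-account-token Secret named
# "<sa>.service-account-token" and bound to the ServiceAccount <sa>.
apply_sa_token_secret() {
  local ns="$1" sa="$2"
  # The annotation below selects which ServiceAccount the token is issued for,
  # so it must name $sa (not "default").
  kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${sa}.service-account-token
  namespace: ${ns}
  annotations:
    kubernetes.io/service-account.name: ${sa}
type: kubernetes.io/service-account-token
EOF
}

# Example usage (requires cluster access):
#   apply_sa_token_secret commanding argo-workflow
```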