argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

Empty node data struct causes workflow-controller panic #10280

Open Sad-polar-bear opened 1 year ago

Sad-polar-bear commented 1 year ago

Pre-requisites

What happened/what you expected to happen?

About 50% of the time, the workflow-controller panics because the node data structure is empty.

workflow-controller log:

msg="Recovered from panic" namespace=default r="runtime error: invalid memory address or nil pointer dereference" stack="goroutine 339 [running]:
runtime/debug.Stack()
    /usr/local/go/src/runtime/debug/stack.go:24 +0x65
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).operate.func2()
    /go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:194 +0xd4
panic({0x1d506c0, 0x3432820})
    /usr/local/go/src/runtime/panic.go:1047 +0x266
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).executeTemplate(0xc001af7500, {0x2339ae8, 0xc000058018}, {0xc000dcb900, 0x32}, {0x231b9e0, 0xc0009fdc80}, 0x27, {{0xc000b28700, 0x7, ...}, ...}, ...)
    /go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:1965 +0x32c5
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).executeStepGroup(0xc001af7500, {0x2339ae8, 0xc000058018}, {0xc0009b5b00, 0x1, 0x4}, {0xc000c17110, 0x28}, 0xc000bd6a28)
    /go/src/github.com/argoproj/argo-workflows/workflow/controller/steps.go:247 +0x606
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).executeSteps(0xc001af7500, {0x2339ae8, 0xc000058018}, {0xc00128bc20, 0x25}, 0xc000d56600, {0xc000c165d0, 0x2b}, 0xc000baf440, {0x231b9e0, ...}, ...)
    /go/src/github.com/argoproj/argo-workflows/workflow/controller/steps.go:95 +0xd05
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).executeTemplate(0xc001af7500, {0x2339ae8, 0xc000058018}, {0xc00128bc20, 0x25}, {0x231b9e0, 0xc0009fc000}, 0x8c53fa, {{0x0, 0x0, ...}, ...}, ...)
    /go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:1900 +0x246c
github.com/argoproj/argo-workflows/v3/workflow/controller.(*wfOperationCtx).operate(0xc001af7500, {0x2339ae8, 0xc000058018})
    /go/src/github.com/argoproj/argo-workflows/workflow/controller/operator.go:350 +0x16a8
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).processNextItem(0xc000600c00, {0x2339ae8, 0xc000058018})
    /go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:756 +0x8ee
github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).runWorker(0x0)
    /go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:678 +0x9e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x7f5d20de93c0)
    /go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/wait/wait.go:155 +0x67
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0, {0x22f6ee0, 0xc000856090}, 0x1, 0xc0005e8ae0)
    /go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0, 0x3b9aca00, 0x0, 0x0, 0x0)
    /go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(0x0, 0x0, 0x0)
    /go/pkg/mod/k8s.io/apimachinery@v0.24.3/pkg/util/wait/wait.go:90 +0x25
created by github.com/argoproj/argo-workflows/v3/workflow/controller.(*WorkflowController).Run
    /go/src/github.com/argoproj/argo-workflows/workflow/controller/controller.go:294 +0x1a6c
" workflow=cehr3recdgb6d70or1cg-20221222012111-1
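
For readers unfamiliar with this class of failure, here is a minimal Go sketch of how a lookup that returns a nil node pointer leads to "invalid memory address or nil pointer dereference", and how a deferred recover turns it into a "Recovered from panic" style message. This is not the Argo Workflows source: `NodeStatus`, `Nodes`, `phaseUnsafe`, `phaseSafe`, and `operate` are hypothetical names chosen for illustration only.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// NodeStatus is a hypothetical, simplified stand-in for a workflow node record.
type NodeStatus struct {
	Name  string
	Phase string
}

// Nodes maps node IDs to their status; looking up a missing key yields nil.
type Nodes map[string]*NodeStatus

// phaseUnsafe dereferences the lookup result without checking it, so a
// missing node triggers a nil pointer dereference at run time.
func phaseUnsafe(nodes Nodes, id string) string {
	node := nodes[id] // nil when id is absent
	return node.Phase // panics if node is nil
}

// phaseSafe checks the lookup result first and returns an error instead.
func phaseSafe(nodes Nodes, id string) (string, error) {
	node, ok := nodes[id]
	if !ok || node == nil {
		return "", fmt.Errorf("node %q not found", id)
	}
	return node.Phase, nil
}

// operate runs fn and converts any panic into an error that carries the
// panic value and the stack captured by the deferred handler.
func operate(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v\n%s", r, debug.Stack())
		}
	}()
	fn()
	return nil
}

func main() {
	nodes := Nodes{"step-1": &NodeStatus{Name: "step-1", Phase: "Succeeded"}}

	// The unchecked lookup panics; operate recovers and reports it.
	err := operate(func() { _ = phaseUnsafe(nodes, "step-2") })
	fmt.Println("unsafe lookup panicked:", err != nil)

	// The checked lookup reports the missing node as an ordinary error.
	_, err = phaseSafe(nodes, "step-2")
	fmt.Println("safe lookup error:", err)
}
```

The checked variant shows the usual defensive pattern: test the lookup result before dereferencing it, so a missing node surfaces as an error rather than a controller panic.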

k8s version: v1.25.2

argo version: v3.3.10 through v3.4.4. I tested every version in that range on Kubernetes v1.25.2, and all of them show this problem.

Version

v3.3.10-v3.4.4

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter workflows that use private images.

argo v3.3.10-v3.4.4

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep "Recovered from panic"

Logs from your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflows.argoproj.io/phase!=Succeeded
Sad-polar-bear commented 1 year ago

Could the Kubernetes version be too new?

Sad-polar-bear commented 1 year ago

workflow.yaml:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  annotations:
    workflows.argoproj.io/pod-name-format: v2
  creationTimestamp: "2022-12-27T03:21:33Z"
  generation: 6
  labels:
    workflows.argoproj.io/phase: Running
  name: cehr3recdgb6d70or1cg-20221222012111-2
  namespace: default
  resourceVersion: "28963779"
  uid: b5e5a86a-8f98-4ea8-8cdd-037884101abd
spec:
  arguments: {}
  entrypoint: main
  podMetadata:
    annotations:
      boss.infra.cloudnative.com/yunti-pipeline-zh: 张三20221222
  templates:
  - inputs: {}
    metadata: {}
    name: main
    outputs: {}
    steps:
    - - arguments:
          parameters:
          - name: tag
            value: ""
          - name: oauthtoken
            value: 123
          - name: repo
            value: git.ops.com/new-ops/ops-cloud/k8s-resource-controller.git
          - name: branch
            value: master
        name: gitlab-checkout
        templateRef:
          name: gitlab-checkout
          template: gitlab-checkout-template
    - - arguments:
          parameters:
          - name: projectName
            value: k8s-resource-controller
          - name: sonarQubeAddr
            value: http://10.7.160.240
          - name: sonarQubeToken
            value: 123
          - name: sonarAPIToken
            value: Basic 123
          - name: scanMode
            value: FullAmount
          - name: debug
            value: "on"
          - name: scanPath
            value: k8s-resource-controller
        name: code-scan
        templateRef:
          name: code-scan
          template: code-scan-template
    - - arguments:
          parameters:
          - name: goprivate
            value: git.ops.com
          - name: go111module
            value: "on"
          - name: goproxy
            value: https://goproxy.cn,https://mirrors.aliyun.com/goproxy/,direct
          - name: baseImg
            value: harbor.ops.com/ops/centos-golang:1.18
          - name: path
            value: k8s-resource-controller/cloneset
        name: golang-unittest
        templateRef:
          name: golang-unittest
          template: golang-unittest-template
    - - arguments:
          parameters:
          - name: pipelineID
            value: cehr3recdgb6d70or1cg-20221222012111
          - name: baseImg
            value: harbor.ops.com/ops/centos-golang:1.18
          - name: codeDir
            value: k8s-resource-controller
          - name: goprivate
            value: git.ops.com
          - name: go111module
            value: "on"
          - name: goproxy
            value: https://goproxy.cn,https://mirrors.aliyun.com/goproxy/,direct
          - name: golangCompileScript
            value: |-
              #!/bin/bash

              cd cloneset/cmd/cloneset-controller

              go build -o cloneset-controller .
        name: golang-compile
        templateRef:
          name: golang-compile
          template: golang-compile-template
sarabala1979 commented 1 year ago

@Sad-polar-bear can you try v3.4.4 controller?

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

isubasinghe commented 1 year ago

@Sad-polar-bear could you please provide gitlab-checkout-template and golang-compile-template? It is not possible to run this workflow without them.

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.

tooptoop4 commented 2 days ago

This can be closed: the code at https://github.com/argoproj/argo-workflows/blob/v3.3.10/workflow/controller/operator.go#L1964-L1965 is gone in newer versions.