Open cdtzabra opened 2 months ago
And looking at the previous logs, we can clearly see that the pod has
panic: runtime error: invalid memory address or nil pointer dereference
because of the container parameter missing from the request.
There should be a stack trace following that log pointing to the exact line of the nil pointer.
Otherwise it would be very impactful in production if simple bad API calls crashed the whole argo-workflows.
It normally doesn't; I'm suspecting there's a missing recover statement somewhere.
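A missing recover in a per-pod log-streaming goroutine would explain how one bad request kills the whole server: a panic in any goroutine crashes the entire Go process unless that goroutine recovers it. As a minimal sketch (hypothetical code, not the actual argo-workflows source; `safeGo` is an illustrative name), a guard like this converts a panic into an error instead of taking the process down:

```go
package main

import "fmt"

// safeGo runs fn in a goroutine and converts any panic (for example a
// nil pointer dereference) into an error on the returned channel,
// instead of letting it crash the entire process.
func safeGo(fn func()) <-chan error {
	errCh := make(chan error, 1)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				errCh <- fmt.Errorf("recovered from panic: %v", r)
				return
			}
			errCh <- nil
		}()
		fn()
	}()
	return errCh
}

func main() {
	var p *struct{ id int }
	err := <-safeGo(func() {
		fmt.Println(p.id) // nil pointer dereference: panics inside the goroutine
	})
	fmt.Println(err) // the panic surfaces as an error; the process survives
}
```

Note the recover must live inside the goroutine itself; a recover in the parent cannot catch a child goroutine's panic.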
There should be a stack trace following that log pointing to the exact line of the nil pointer.
Unfortunately, there was no more log concerning the error.
Can you run https://pkg.go.dev/cmd/addr2line with the program counter as an argument along with the binary?
Hello, can I work on this one?
@cdtzabra can u run with debug log level turned on? i also raised https://github.com/argoproj/argo-workflows/pull/13866 if u wanna give that a whirl
Hi,
Yes, I can. Next week
@tooptoop4
I've just tested. I first tried in our lab environment and could not reproduce the problem despite several attempts. I removed the resource requests/limits that had been added and still saw no CrashLoopBackOff.
So I switched to the pre-production argo-workflows (where I'd encountered the problem), tested our real WorkflowTemplate, and there was the CrashLoopBackOff.
Re-running the tests with the basic WorkflowTemplates from the lab, there's no problem.
I put the preprod pods in debug mode and redid the crash test, but there's no more info in the log.
Note: for the basic templates, the msg="a container name must be specified for pod workflow xxx" error is still present in the server log, BUT there is no crash.
spec:
  containers:
  - args:
    - server
    - --configmap=argo-workflows-workflow-controller-configmap
    - --auth-mode=client
    - --auth-mode=sso
    - --secure=false
    - --namespaced
    - --loglevel
    - debug
    - --gloglevel
    - "0"
    - --log-format
    - text
workflow server pod1 log
time="2024-11-18T10:16:33.744Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=GetWorkflowTemplate grpc.service=workflowtemplate.WorkflowTemplateService grpc.start_time="2024-11-18T10:16:33Z" grpc.time_ms=13.259 span.kind=server system=grpc
time="2024-11-18T10:16:33.747Z" level=info duration=17.33487ms method=GET path=/api/v1/workflow-templates/argo-workflows/workflow-controlm size=6690 status=0
time="2024-11-18T10:16:48.263Z" level=info duration="112.732µs" method=GET path=index.html size=487 status=0
time="2024-11-18T10:17:08.263Z" level=info duration="181.525µs" method=GET path=index.html size=487 status=0
time="2024-11-18T10:17:28.263Z" level=info duration="146.051µs" method=GET path=index.html size=487 status=0
time="2024-11-18T10:17:48.264Z" level=info duration="139.866µs" method=GET path=index.html size=487 status=0
time="2024-11-18T10:18:08.264Z" level=info duration="347.373µs" method=GET path=index.html size=487 status=0
time="2024-11-18T10:18:28.263Z" level=info duration="315.198µs" method=GET path=index.html size=487 status=0
time="2024-11-18T10:18:41.558Z" level=debug msg="List options" namespace=argo-workflows options="{{ } workflows.argoproj.io/workflow=workflow-controlm-44vpo false false <nil> 0 }" workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.558Z" level=debug msg="Log options" namespace=argo-workflows options="&PodLogOptions{Container:,Follow:false,Previous:false,SinceSeconds:nil,SinceTime:<nil>,Timestamps:false,TailLines:nil,LimitBytes:nil,InsecureSkipTLSVerifyBackend:false,}" workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.560Z" level=debug msg="Listing workflow pods" namespace=argo-workflows workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.570Z" level=debug msg="Ensuring pod logs stream" alreadyStreaming=false namespace=argo-workflows podName=workflow-controlm-44vpo-build-1554548131 podPhase=Succeeded workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.570Z" level=debug msg="Ensuring pod logs stream" alreadyStreaming=false namespace=argo-workflows podName=workflow-controlm-44vpo-create-job-3142162940 podPhase=Succeeded workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.570Z" level=debug msg="Ensuring pod logs stream" alreadyStreaming=false namespace=argo-workflows podName=workflow-controlm-44vpo-notify-204188948 podPhase=Succeeded workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.570Z" level=debug msg="Not starting watches" namespace=argo-workflows workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.570Z" level=debug msg="Waiting for work-group" namespace=argo-workflows workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.570Z" level=debug msg="Sorting entries" namespace=argo-workflows workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.570Z" level=debug msg="Streaming pod logs" namespace=argo-workflows podName=workflow-controlm-44vpo-create-job-3142162940 workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.570Z" level=debug msg="Streaming pod logs" namespace=argo-workflows podName=workflow-controlm-44vpo-build-1554548131 workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.570Z" level=debug msg="Streaming pod logs" namespace=argo-workflows podName=workflow-controlm-44vpo-notify-204188948 workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.575Z" level=error msg="a container name must be specified for pod workflow-controlm-44vpo-notify-204188948, choose one of: [init wait main]" namespace=argo-workflows podName=workflow-controlm-44vpo-notify-204188948 workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.575Z" level=debug msg="Pod logs stream done" namespace=argo-workflows podName=workflow-controlm-44vpo-notify-204188948 workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.576Z" level=error msg="a container name must be specified for pod workflow-controlm-44vpo-build-1554548131, choose one of: [init wait prerequisistes main]" namespace=argo-workflows podName=workflow-controlm-44vpo-build-1554548131 workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.576Z" level=debug msg="Pod logs stream done" namespace=argo-workflows podName=workflow-controlm-44vpo-build-1554548131 workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.594Z" level=debug msg="Pod logs stream done" namespace=argo-workflows podName=workflow-controlm-44vpo-create-job-3142162940 workflow=workflow-controlm-44vpo
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0x1e8c8a4]
goroutine 373 [running]:
github.com/argoproj/argo-workflows/v3/util/logs.WorkflowLogs.func1.1({0xc0007879e0, 0x2d})
/go/src/github.com/argoproj/argo-workflows/util/logs/workflow-logger.go:167 +0x524
created by github.com/argoproj/argo-workflows/v3/util/logs.WorkflowLogs.func1 in goroutine 224
/go/src/github.com/argoproj/argo-workflows/util/logs/workflow-logger.go:126 +0x5e5
workflow server pod2 log
time="2024-11-18T10:18:41.650Z" level=debug msg="Not starting watches" namespace=argo-workflows workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.650Z" level=debug msg="Waiting for work-group" namespace=argo-workflows workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.650Z" level=debug msg="Sorting entries" namespace=argo-workflows workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.650Z" level=debug msg="Streaming pod logs" namespace=argo-workflows podName=workflow-controlm-44vpo-notify-204188948 workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.650Z" level=debug msg="Streaming pod logs" namespace=argo-workflows podName=workflow-controlm-44vpo-create-job-3142162940 workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.650Z" level=debug msg="Streaming pod logs" namespace=argo-workflows podName=workflow-controlm-44vpo-build-1554548131 workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.653Z" level=error msg="a container name must be specified for pod workflow-controlm-44vpo-notify-204188948, choose one of: [init wait main]" namespace=argo-workflows podName=workflow-controlm-44vpo-notify-204188948 workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.653Z" level=debug msg="Pod logs stream done" namespace=argo-workflows podName=workflow-controlm-44vpo-notify-204188948 workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.654Z" level=error msg="a container name must be specified for pod workflow-controlm-44vpo-build-1554548131, choose one of: [init wait prerequisistes main]" namespace=argo-workflows podName=workflow-controlm-44vpo-build-1554548131 workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.654Z" level=debug msg="Pod logs stream done" namespace=argo-workflows podName=workflow-controlm-44vpo-build-1554548131 workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.669Z" level=debug msg="Pod logs stream done" namespace=argo-workflows podName=workflow-controlm-44vpo-create-job-3142162940 workflow=workflow-controlm-44vpo
time="2024-11-18T10:18:41.670Z" level=debug msg="Work-group done" namespace=argo-workflows workflow=workflow-controlm-44vpo
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x48 pc=0x1e8c8a4]
goroutine 287 [running]:
github.com/argoproj/argo-workflows/v3/util/logs.WorkflowLogs.func1.1({0xc00054a5d0, 0x2d})
/go/src/github.com/argoproj/argo-workflows/util/logs/workflow-logger.go:167 +0x524
created by github.com/argoproj/argo-workflows/v3/util/logs.WorkflowLogs.func1 in goroutine 294
/go/src/github.com/argoproj/argo-workflows/util/logs/workflow-logger.go:126 +0x5e5
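In both pods the "a container name must be specified" error log appears immediately before the panic, which suggests an error path that logs the failure but keeps going with a nil log stream. A hypothetical reduction of that pattern (illustrative code only, not the actual workflow-logger.go source; `getLogStream` and `streamPodLogs` are made-up names):

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// getLogStream mimics the Kubernetes API refusing a log request that
// names no container in a multi-container pod.
func getLogStream(container string) (io.ReadCloser, error) {
	if container == "" {
		return nil, errors.New("a container name must be specified for pod x, choose one of: [init wait main]")
	}
	return io.NopCloser(nil), nil
}

// streamPodLogs shows the suspected buggy pattern: the error is logged
// but the function does not return, so the nil stream is dereferenced.
func streamPodLogs(container string) {
	stream, err := getLogStream(container)
	if err != nil {
		fmt.Println("level=error msg:", err)
		// BUG: a `return` is missing here.
	}
	defer stream.Close() // panics: Close on a nil interface value
	// ... copy log lines from stream ...
}

func main() {
	// Recover here only so the demo exits cleanly; the server lacks
	// such a guard, so the same panic kills the whole process.
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("panic:", r) // same class of panic as in the server logs
		}
	}()
	streamPodLogs("") // no container specified, as in the failing API call
}
```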
So I wonder if the problem isn't related to the resource types in the template. The WorkflowTemplate that causes the workflow-server to crash also uses "artifact", "resource" (to create a Kubernetes Job), and more.
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: workflow-controlm
  namespace: argo-workflows
spec:
  ttlStrategy:
    # keep completed workflows for 1d
    secondsAfterCompletion: 86400
  workflowMetadata:
    annotations:
      workflows.argoproj.io/title: "{{workflow.parameters.DISPLAY_NAME}}"
    labels:
      workflows.argoproj.io/title: "{{workflow.parameters.DISPLAY_NAME}}"
  entrypoint: main-template
  onExit: exit-handler
  serviceAccountName: workflow-pods-sa
  arguments:
    parameters:
    - name: PROJECT_URL
      value: ""
    - name: PROJECT_BRANCH
      value: ""
    - name: CONTROLM_JOB
      value: ""
    - name: CONTROLM_TASK
      value: ""
    - name: CONTROLM_CTMS
      value: ""
    - name: TEMPLATE_NAME
      value: ""
    - name: BUILD_URL
      value: ""
    - name: DISPLAY_NAME
      value: "to-overload"
  volumes:
  - name: workspace
    emptyDir: {}
  templates:
  - name: main-template
    steps:
    - - name: build
        template: build
        arguments:
          parameters:
          - name: PROJECT_URL
            value: "{{workflow.parameters.PROJECT_URL}}"
          - name: PROJECT_BRANCH
            value: "{{workflow.parameters.PROJECT_BRANCH}}"
          - name: CONTROLM_TASK
            value: "{{workflow.parameters.CONTROLM_TASK}}"
          - name: TEMPLATE_NAME
            value: "{{workflow.parameters.TEMPLATE_NAME}}"
    - - name: create-job
        template: create-job
        arguments:
          artifacts:
          - name: kubejob
            raw:
              data: |
                {{workflow.outputs.parameters.KUBEJOB}}
  # Steps template definition: Build
  - name: build
    inputs:
      parameters:
      - name: PROJECT_URL
      - name: PROJECT_BRANCH
      - name: CONTROLM_TASK
      - name: TEMPLATE_NAME
    outputs:
      parameters:
      - name: namespace
        valueFrom:
          path: /workspace/namespace.txt
        globalName: NAMESPACE
      - name: kube_job
        valueFrom:
          path: /workspace/kube-job.yaml
        globalName: KUBEJOB
      - name: backofflimit
        valueFrom:
          path: /workspace/backofflimit.txt
        globalName: backoffLimit
    containerSet:
      volumeMounts:
      - name: workspace
        mountPath: /workspace
      containers:
      - name: prerequisistes
        image: myregistry/tools:latest
        envFrom:
        - secretRef:
            name: gitlab-a1963-read
        command:
        - sh
        - '-c'
        - |
          set -o errexit
          cd /workspace
          gitUrl="{{inputs.parameters.PROJECT_URL}}"
          gitBranch="{{inputs.parameters.PROJECT_BRANCH}}"
          codeApp=$(basename $(dirname {{inputs.parameters.PROJECT_URL}}))
          echo "${codeApp}-${gitBranch}" > namespace.txt
          git clone -b ${gitBranch} ${gitUrl} chart
      - name: main
        dependencies:
        - prerequisistes
        image: myregistry/tools:latest
        command:
        - sh
        - '-c'
        - |
          set -o errexit
          cd /workspace/
          printf "\n### Templating the Helm Chart ###\n"
          taskPath="chart/{{inputs.parameters.CONTROLM_TASK}}"
          templateName="{{inputs.parameters.TEMPLATE_NAME}}"
          outputDir="./myfolder"
          helm template ${taskPath}/${templateName}/ -f ${taskPath}/app-values.yaml -f ${taskPath}/infra-values.yaml --output-dir ${outputDir} --dependency-update
          # Store data to export them as global variables
          find ${outputDir} -type f -iname "job*.yaml" | xargs cat > kube-job.yaml
          yq .spec.backoffLimit kube-job.yaml > backofflimit.txt
          namespace=$(cat namespace.txt)
          get_ns=$(kubectl get -o name --ignore-not-found ns ${namespace})
          if [ -z $get_ns ]; then kubectl create ns ${namespace}; fi
          printf "\n### Creating configmaps - Secrets - AVP ###\n"
          find ${outputDir} -type f -iname "configmap-*.yaml" -o -iname "secret*.yaml" | while read -r line; do
            cat ${line} | kubectl -n ${namespace} apply -f -
          done
  # Steps template definition: Argo App
  - name: create-job
    inputs:
      artifacts:
      - name: kubejob
        path: /artifacts/kubejob
    resource:
      action: create
      successCondition: status.succeeded > 0
      failureCondition: status.failed > {{workflow.outputs.parameters.backoffLimit}}
      flags: ["--namespace={{workflow.outputs.parameters.NAMESPACE}}"]
      manifest: |
        {{workflow.outputs.parameters.KUBEJOB}}
    outputs:
      parameters:
      - name: job-status
        valueFrom:
          jsonPath: '{.status}'
  # Exit handler templates
  # After the completion of the entrypoint template
  - name: exit-handler
    steps:
    - - name: Success-Notifier
        template: notify
        arguments:
          parameters:
          - name: exitCode
            value: 0
        when: "{{workflow.status}} == Succeeded && {{workflow.parameters.PROJECT_BRANCH}} != dev"
      - name: Failure-Notifier
        template: notify
        arguments:
          parameters:
          - name: exitCode
            value: 1
        when: "{{workflow.status}} != Succeeded && {{workflow.parameters.PROJECT_BRANCH}} != dev"
  # notify template definition
  - name: notify
    inputs:
      parameters:
      - name: exitCode
    container:
      image: alpine/curl
      envFrom:
      - secretRef:
          name: controlm
      command:
      - sh
      - '-c'
      - |
        exitCode="{{inputs.parameters.exitCode}}"
        job="{{workflow.parameters.CONTROLM_JOB}}"
        task="{{workflow.parameters.CONTROLM_TASK}}"
        ctms="{{workflow.parameters.CONTROLM_CTMS}}"
        message="xxxx"
        # POST exitCode HERE
@cdtzabra did u build the image locally with the https://github.com/argoproj/argo-workflows/pull/13866 fix in it?
@tooptoop4 NO, but I will do it...
Pre-requisites

I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.

What happened? What did you expect to happen?

By trying to get a workflow's logs with the API endpoint api/workflow/namespace/name/log using curl, I received a 502 Bad Gateway reply. I checked the pod status and found that the server pods are in CrashLoopBackOff, followed by a restart. And looking at the previous logs, we can clearly see that the pod has
panic: runtime error: invalid memory address or nil pointer dereference
because of the container parameter missing from the request. When I specify the container (?logOptions.container=main) it works fine.
However, I think that the argo-workflows-server pod should not CrashLoopBackOff for this reason. Otherwise it would be very impactful in production if simple bad API calls crashed the whole argo-workflows.

Version(s)

v3.5.10
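For completeness, a sketch of the failing versus working request shapes described above (host, token, and workflow name are placeholders; the path follows the Argo Server REST convention for workflow logs):

```shell
# Without logOptions.container the server panics and the gateway answers 502.
curl -H "Authorization: Bearer $ARGO_TOKEN" \
  "https://argo-server.example.com/api/v1/workflows/argo-workflows/workflow-controlm-44vpo/log"

# Naming the container works fine.
curl -H "Authorization: Bearer $ARGO_TOKEN" \
  "https://argo-server.example.com/api/v1/workflows/argo-workflows/workflow-controlm-44vpo/log?logOptions.container=main"
```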
Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Logs from the workflow controller
Logs from your workflow's wait container