Open julienteisseire opened 3 weeks ago
I tried with 6 HTTP calls, and the workflow duration is now 2 min 11 s!
Duration: 2 minutes 11 seconds
Progress: 6/6
STEP TEMPLATE PODNAME DURATION MESSAGE
✔ rinex-ingester-68whf main
├───✔ rinex-probe-1 http-get
├───✔ rinex-start-1 http-put
├───✔ rinex-probe-2 http-get
├───✔ rinex-start-2 http-put
├───✔ rinex-probe-3 http-get
└───✔ rinex-start-3 http-put
Apart from the time needed to create the agent at the beginning:
rinex-ingester-z2mvm-1340600742-agent 1/1 Running
and even though the same agent is used throughout the workflow execution,
I see these logs:
time="2024-10-31T13:35:06.019Z" level=error msg="was unable to obtain node for rinex-ingester-68whf-2166136261" namespace=commanding workflow=rinex-ingester-68whf
time="2024-10-31T13:35:16.039Z" level=error msg="was unable to obtain node for rinex-ingester-68whf-2166136261" namespace=commanding workflow=rinex-ingester-68whf
time="2024-10-31T13:35:26.069Z" level=error msg="was unable to obtain node for rinex-ingester-68whf-2166136261" namespace=commanding workflow=rinex-ingester-68whf
time="2024-10-31T13:35:36.088Z" level=error msg="was unable to obtain node for rinex-ingester-68whf-2166136261" namespace=commanding workflow=rinex-ingester-68whf
time="2024-10-31T13:35:46.117Z" level=error msg="was unable to obtain node for rinex-ingester-68whf-2166136261" namespace=commanding workflow=rinex-ingester-68whf
time="2024-10-31T13:35:56.133Z" level=error msg="was unable to obtain node for rinex-ingester-68whf-2166136261" namespace=commanding workflow=rinex-ingester-68whf
time="2024-10-31T13:36:06.157Z" level=error msg="was unable to obtain node for rinex-ingester-68whf-2166136261" namespace=commanding workflow=rinex-ingester-68whf
time="2024-10-31T13:36:16.175Z" level=error msg="was unable to obtain node for rinex-ingester-68whf-2166136261" namespace=commanding workflow=rinex-ingester-68whf
time="2024-10-31T13:36:26.201Z" level=error msg="was unable to obtain node for rinex-ingester-68whf-2166136261" namespace=commanding workflow=rinex-ingester-68whf
time="2024-10-31T13:36:36.223Z" level=error msg="was unable to obtain node for rinex-ingester-68whf-2166136261" namespace=commanding workflow=rinex-ingester-68whf
time="2024-10-31T13:36:46.251Z" level=error msg="was unable to obtain node for rinex-ingester-68whf-2166136261" namespace=commanding workflow=rinex-ingester-68whf
time="2024-10-31T13:36:56.272Z" level=error msg="was unable to obtain node for rinex-ingester-68whf-2166136261" namespace=commanding workflow=rinex-ingester-68whf
time="2024-10-31T13:37:06.306Z" level=error msg="was unable to obtain node for rinex-ingester-68whf-2166136261" namespace=commanding workflow=rinex-ingester-68whf
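The spacing of these entries can be checked with a short script. The sketch below is only an illustration: it parses the `time=` field of the first three lines above and prints the gap between consecutive entries. The function name `error_intervals` is hypothetical and not part of Argo Workflows.

```python
import re
from datetime import datetime

# The first three controller log lines quoted above, trimmed to the fields we need.
LOG_LINES = [
    'time="2024-10-31T13:35:06.019Z" level=error msg="was unable to obtain node for rinex-ingester-68whf-2166136261"',
    'time="2024-10-31T13:35:16.039Z" level=error msg="was unable to obtain node for rinex-ingester-68whf-2166136261"',
    'time="2024-10-31T13:35:26.069Z" level=error msg="was unable to obtain node for rinex-ingester-68whf-2166136261"',
]

TIME_RE = re.compile(r'time="([^"]+)"')


def error_intervals(lines):
    """Return the gaps, in seconds, between consecutive log timestamps."""
    stamps = []
    for line in lines:
        match = TIME_RE.search(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%fZ"))
    return [(later - earlier).total_seconds() for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    for gap in error_intervals(LOG_LINES):
        print(f"{gap:.3f}s")
```

The gaps come out at about 10 seconds each, which suggests the message is emitted on a fixed periodic cycle rather than once per HTTP step.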
This error message refers to the Argo Workflows usage of the word node (meaning a node in the DAG) rather than the Kubernetes usage of the word node.
Apart from the error message in the logs, is something not working? I know from your other issues that you think Argo Workflows is too slow for your workflow, but I'm trying to learn more about this specific GitHub issue.
Thank you for your answer.
I don't understand what is wrong in my workflow description. Why do I get this message about a node not being obtained?
It is a basic multi-step workflow (not a DAG) with 6 HTTP calls, as follows:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: rinex-ingester-
  namespace: commanding
  labels:
    workflows.argoproj.io/test: "false"
  annotations:
    workflows.argoproj.io/description: |
      First SWO workflow using HTTP template
    workflows.argoproj.io/version: '>= 3.2.0'
spec:
  entrypoint: main
  serviceAccountName: argo-workflow
  templates:
    - name: main
      steps:
        - - name: rinex-probe-1
            template: http-get
            arguments:
              parameters: [{name: url, value: "http://xxxxxxxxxxxxxxxxx/rinex/ingester/probe"}]
        - - name: rinex-start-1
            template: http-put
            arguments:
              parameters:
                - name: url
                  value: "http://xxxxxxxxxxxxxxxxx/rinex/ingester/start"
                - name: request
                  value: |
                    {
                      "startRequeststring": "Start Request"
                    }
        - - name: rinex-probe-2
            template: http-get
            arguments:
              parameters: [{name: url, value: "http://xxxxxxxxxxxxxxxxx/rinex/ingester/probe"}]
        - - name: rinex-start-2
            template: http-put
            arguments:
              parameters:
                - name: url
                  value: "http://xxxxxxxxxxxxxxxxx/rinex/ingester/start"
                - name: request
                  value: |
                    {
                      "startRequeststring": "Start Request"
                    }
        - - name: rinex-probe-3
            template: http-get
            arguments:
              parameters: [{name: url, value: "http://xxxxxxxxxxxxxxxxx/rinex/ingester/probe"}]
        - - name: rinex-start-3
            template: http-put
            arguments:
              parameters:
                - name: url
                  value: "http://xxxxxxxxxxxxxxxxx/rinex/ingester/start"
                - name: request
                  value: |
                    {
                      "startRequeststring": "Start Request"
                    }
    - name: http-get
      inputs:
        parameters:
          - name: url
      http:
        method: "GET"
        url: "{{inputs.parameters.url}}"
    - name: http-put
      inputs:
        parameters:
          - name: url
          - name: request
      http:
        method: "PUT"
        url: '{{inputs.parameters.url}}'
        body: '{{inputs.parameters.request}}'
So I don't see what "node" refers to in my workflow, or how I can fix this error.
I am also trying to understand why it took 2m11s to start the agent pod and then run 6 HTTP calls.
Do you consider this a normal execution time? I am looking for information on why I see so many 10-second waiting phases, such as:
time="2024-10-31T14:16:06.829Z" level=debug msg="Syncing all CronWorkflows"
time="2024-10-31T14:16:16.830Z" level=debug msg="Syncing all CronWorkflows"
or any other sleeps in the workflow.
So I really have two things to deal with:
1. error about node not obtained
time="2024-10-31T13:37:06.306Z" level=error msg="was unable to obtain node for rinex-ingester-68whf-2166136261" namespace=commanding workflow=rinex-ingester-68whf
2. workflow submit duration
Name: rinex-ingester-68whf
Namespace: commanding
ServiceAccount: argo-workflow
Status: Succeeded
Conditions:
PodRunning False
Completed True
Created: Thu Oct 31 14:34:55 +0100 (2 minutes ago)
Started: Thu Oct 31 14:34:55 +0100 (2 minutes ago)
Finished: Thu Oct 31 14:37:06 +0100 (now)
Duration: 2 minutes 11 seconds
Progress: 6/6
STEP TEMPLATE PODNAME DURATION MESSAGE
✔ rinex-ingester-68whf main
├───✔ rinex-probe-1 http-get
├───✔ rinex-start-1 http-put
├───✔ rinex-probe-2 http-get
├───✔ rinex-start-2 http-put
├───✔ rinex-probe-3 http-get
└───✔ rinex-start-3 http-put
Any help would be appreciated. Thank you.
Duplicate of #12726.
I think what you're concerned about is not this error log but the excessive execution time of the workflow, which you can reduce by adjusting the controller environment variables ARGO_AGENT_PATCH_RATE or DEFAULT_REQUEUE_TIME.
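As a rough sketch of what adjusting those variables could look like, the script below builds a strategic merge patch that sets both on the controller Deployment. The container name `workflow-controller` and the `2s` values are assumptions for illustration, not recommendations from this thread; check them against your own installation. The helper `controller_env_patch` is hypothetical.

```python
import json


def controller_env_patch(env, container="workflow-controller"):
    """Build a strategic merge patch that sets env vars on one container.

    The container name is an assumption; it must match the name used in
    the controller Deployment of your installation.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container,
                            "env": [
                                {"name": name, "value": value}
                                for name, value in env.items()
                            ],
                        }
                    ]
                }
            }
        }
    }


if __name__ == "__main__":
    # "2s" is an illustrative value only, not a tuned recommendation.
    patch = controller_env_patch({
        "DEFAULT_REQUEUE_TIME": "2s",
        "ARGO_AGENT_PATCH_RATE": "2s",
    })
    print(json.dumps(patch, indent=2))
```

Assuming the controller runs as a Deployment named `workflow-controller` in the `argo` namespace, the output could be saved to `patch.json` and applied with `kubectl -n argo patch deployment workflow-controller --patch-file patch.json`.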
Indeed performance was my main concern. Thank you.
But I also consider that an error in the logs is not normal, and it would be great to understand why it occurs and how to fix it.
Error message: level=error msg="was unable to obtain node for ..."
Workflow: 6 HTTP calls in 6 consecutive steps.
Cluster: kind with 1 node on a local computer.
Thank you.
I think https://argo-workflows.readthedocs.io/en/latest/http-template/ should be deprecated or removed; having many tiny steps/pods is inefficient. If you just use a simple workflow with a single pod, like https://github.com/argoproj/argo-workflows/blob/main/examples/retry-container.yaml#L16, and put multiple curl or Python requests calls in it, it will run quickly.
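A minimal sketch of that single-pod approach, assuming the six calls from the workflow above are moved into one script run by one container. It uses only the standard library rather than `requests`; the names `http_send` and `run_sequence` are hypothetical, and the `Content-Type` header is an assumption, since the original `http-put` template sets no headers.

```python
import json
import urllib.request


def http_send(method, url, body=None, timeout=10):
    """Send one HTTP request and return the response status code."""
    data = body.encode("utf-8") if body is not None else None
    # Assumption: the service accepts a JSON body; adjust the header if not.
    headers = {"Content-Type": "application/json"} if data is not None else {}
    request = urllib.request.Request(url, data=data, method=method, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def run_sequence(base_url, sender, rounds=3):
    """Probe, then start, the ingester `rounds` times, one call after another."""
    start_body = json.dumps({"startRequeststring": "Start Request"})
    statuses = []
    for _ in range(rounds):
        statuses.append(sender("GET", f"{base_url}/rinex/ingester/probe"))
        statuses.append(sender("PUT", f"{base_url}/rinex/ingester/start", start_body))
    return statuses


if __name__ == "__main__":
    def dry_run(method, url, body=None):
        """Print the call instead of sending it, so the demo needs no network."""
        print(method, url)
        return 200

    # Pass http_send instead of dry_run to make real requests.
    run_sequence("http://xxxxxxxxxxxxxxxxx", dry_run)
```

All six calls then run inside one process, so the workflow has a single step instead of six.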
Pre-requisites
:latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
What happened? What did you expect to happen?
Hi,
I submit my workflow with a few HTTP calls.
I get the following error: level=error msg="was unable to obtain node for.
In the workflow-controller logs, I see the following:
I guess it causes lag and excessive run time (50 seconds workflow duration for 2 HTTP calls).
Even though I see the correct output, I would like to understand why this node error occurs when I am using only one node in kind.
I can understand that the agent pod does not exist and has to be created first, but why are there so many "was unable to obtain node" logs? And why not keep the agent alive to avoid creating it at workflow submission?
Thank you
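For a rough sense of the per-call cost, the two durations reported in this thread (50 seconds for 2 HTTP calls, 2 minutes 11 seconds for 6) can be averaged per step. This is simple arithmetic on the reported figures, not a measurement; the helper name `seconds_per_call` is hypothetical.

```python
# Durations reported in this thread: (label, total seconds, number of HTTP calls).
RUNS = [
    ("2 HTTP calls", 50, 2),
    ("6 HTTP calls", 2 * 60 + 11, 6),
]


def seconds_per_call(total_seconds, calls):
    """Average wall-clock time per HTTP step, agent start-up included."""
    return total_seconds / calls


if __name__ == "__main__":
    for label, total, calls in RUNS:
        print(f"{label}: {seconds_per_call(total, calls):.1f}s per call")
```

Both runs average more than 20 seconds per call, which suggests the delay scales with the number of steps rather than being a one-time agent start-up cost.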
Version(s)
v3.5.12
Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Logs from the workflow controller
Logs from your workflow's wait container