argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

`labelsFrom` not setting labels half the time #12167

Open tooptoop4 opened 1 year ago

tooptoop4 commented 1 year ago


What happened/what you expected to happen?

For a common sensor, I had 321 workflows generated. Labels were set for 135 of them but not for the other 186. The input messages for the workflows are the same except for some numbers. The messages are the same length, so imagine the difference between two messages being a value of 4627827 in one vs 1953165 in another.

The sensor uses a snippet like this:

workflowMetadata:
  labelsFrom:
    redactkey1:
      expression: "{{= sprig.quote(sprig.regexReplaceAll('^[-_.]*',sprig.regexReplaceAll('[-_.]*$',sprig.substr(0,63,sprig.regexReplaceAll('[^-A-Za-z0-9_.]',sprig.regexReplaceAll('/([^/]*)$',sprig.regexFind('redact/([^,]+),', workflow.parameters.message),''),'')),''),'')) }}"
    redactkey2:
      expression: "{{= sprig.quote(sprig.regexReplaceAll('^[-_.]*',sprig.regexReplaceAll('[-_.]*$',sprig.substr(0,63,sprig.regexReplaceAll('[^-A-Za-z0-9_.]',sprig.regexReplaceAll('(.+)/',sprig.regexFind('redact/([^,]+),', workflow.parameters.message),''),'')),''),'')) }}"
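Both expressions are a chain of sprig string functions, and sprig's regex helpers are thin wrappers over Go's `regexp` package, so the label logic can be replayed outside Argo to check that messages differing only in their numbers produce the same kind of value. The sketch below is an assumption-laden replay, not Argo's evaluator: the `substr` helper mirrors sprig's `substr(start, end, s)` semantics as I understand them, the outer `sprig.quote` is left out so the raw label value is visible, and the sample messages are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// substr mirrors sprig's substr(start, end, s) (assumed semantics):
// a negative start returns s[:end]; an end past the string returns s[start:].
func substr(start, end int, s string) string {
	if start < 0 {
		return s[:end]
	}
	if end < 0 || end > len(s) {
		return s[start:]
	}
	return s[start:end]
}

// replaceAll mirrors sprig's regexReplaceAll(regex, s, repl).
func replaceAll(pattern, s, repl string) string {
	return regexp.MustCompile(pattern).ReplaceAllString(s, repl)
}

// sanitize is the shared tail of both expressions: drop characters that are
// invalid in a label value, cap at 63 bytes, then trim trailing and leading
// '-', '_' and '.'.
func sanitize(s string) string {
	s = replaceAll(`[^-A-Za-z0-9_.]`, s, "")
	s = substr(0, 63, s)
	s = replaceAll(`[-_.]*$`, s, "")
	return replaceAll(`^[-_.]*`, s, "")
}

// labelValues replays the redactkey1 and redactkey2 expressions (minus quote).
func labelValues(message string) (string, string) {
	// sprig.regexFind returns the whole match, including "redact/" and the comma.
	match := regexp.MustCompile(`redact/([^,]+),`).FindString(message)
	key1 := sanitize(replaceAll(`/([^/]*)$`, match, "")) // drop the last path segment
	key2 := sanitize(replaceAll(`(.+)/`, match, ""))      // keep only the last path segment
	return key1, key2
}

func main() {
	// Hypothetical messages that differ only in their numbers.
	for _, msg := range []string{
		"id=4627827,path=redact/team-a/job_4627827,status=ok",
		"id=1953165,path=redact/team-a/job_1953165,status=ok",
	} {
		k1, k2 := labelValues(msg)
		fmt.Printf("redactkey1=%q redactkey2=%q\n", k1, k2)
	}
}
```

For these inputs both messages give `redactkey1="redactteam-a"`, and `redactkey2` differs only in the digits. Since the functions are deterministic, differences in the input numbers alone would not explain labels going missing for roughly 58% (186 of 321) of the workflows.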

A command like the one below showed "" for the redactkey1 label on more lines than it showed the actual value:

kubectl get workflow -A --template '{{range .items}}{{.metadata.name}} {{.metadata.namespace}} {{.metadata.creationTimestamp}} {{index .metadata.labels "workflows.argoproj.io/phase"}} {{index .metadata.labels "redactkey1"}} {{"\n"}}{{end}}'
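To turn that per-line listing into a count, the JSON form of the same query can be tallied. This is a minimal sketch, assuming it is fed the output of `kubectl get workflow -A -o json` on stdin (for example `kubectl get workflow -A -o json | go run tally.go`, where `tally.go` is a hypothetical filename). It treats a missing label and an empty label the same way.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// workflowList keeps only the fields needed from `kubectl get workflow -A -o json`.
type workflowList struct {
	Items []struct {
		Metadata struct {
			Name      string            `json:"name"`
			Namespace string            `json:"namespace"`
			Labels    map[string]string `json:"labels"`
		} `json:"metadata"`
	} `json:"items"`
}

// countLabel returns how many workflows carry a non-empty value for key,
// plus the namespace/name of every workflow that does not.
func countLabel(data []byte, key string) (int, []string, error) {
	var list workflowList
	if err := json.Unmarshal(data, &list); err != nil {
		return 0, nil, err
	}
	with := 0
	var missing []string
	for _, item := range list.Items {
		// A nil Labels map yields "" here, so "no labels" counts as missing.
		if item.Metadata.Labels[key] != "" {
			with++
		} else {
			missing = append(missing, item.Metadata.Namespace+"/"+item.Metadata.Name)
		}
	}
	return with, missing, nil
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	const key = "redactkey1"
	with, missing, err := countLabel(data, key)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%s set: %d, missing or empty: %d\n", key, with, len(missing))
	for _, name := range missing {
		fmt.Println("  missing:", name)
	}
}
```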

Similarly, the argo_archived_workflows DB table and the top of the 'Archived Workflows' UI didn't show the redactkey1 label for more than half of these workflows.

But, interestingly, the

  workflowMetadata:
    labelsFrom:

section of the manifest YAML in the 'Archived Workflows' UI consistently shows the labels with the expected values (I verified that all the workflows have it!).

Comparing the full manifest YAML side by side for a workflow whose label did show up (in the argo_archived_workflows DB table and at the top of the 'Archived Workflows' UI) against one whose label didn't, I also noticed that the v2 pod-name-format annotation below only appears on the workflow that has the redactkey1 label (note that I do not define this v2 annotation in my sensor):

  annotations:
    workflows.argoproj.io/pod-name-format: v2

Workflows with and without the label are mixed across both failed and succeeded runs. I wonder whether a workflow ever entering the mutex/pending queue has any bearing on it? I noticed https://github.com/argoproj/argo-workflows/issues/9417, but it's closed.

Version

3.4.11

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

n/a

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

n/a

Logs from in your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

n/a
tooptoop4 commented 6 days ago

The cause is similar to https://github.com/argoproj/argo-workflows/issues/10178:

Labels are set if the workflow does not have to wait for a mutex/semaphore: https://github.com/argoproj/argo-workflows/blob/v3.6.0-rc4/workflow/controller/operator.go#L3963-L3964

whereas if the workflow has to wait to acquire the lock, that step is skipped: https://github.com/argoproj/argo-workflows/blob/v3.6.0-rc4/workflow/controller/operator.go#L264-L265
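The diagnosis above can be illustrated with a deliberately simplified, hypothetical model. Every name in it (`workflow`, `applyLabelsFrom`, `reconcile`, the `fixed` flag) is invented for illustration and is not Argo's controller code. It only captures the reported shape of the bug: labelsFrom is applied on the "lock was free" path but not on the "waited, then acquired" path. Applying it on both paths is one possible fix direction, not a confirmed patch.

```go
package main

import "fmt"

// workflow is a hypothetical stand-in for the controller's view of a Workflow.
type workflow struct {
	labels     map[string]string
	labelsFrom map[string]string // label key -> already-evaluated value
}

// applyLabelsFrom copies the evaluated labelsFrom values onto the workflow labels.
func applyLabelsFrom(wf *workflow) {
	if wf.labels == nil {
		wf.labels = map[string]string{}
	}
	for k, v := range wf.labelsFrom {
		wf.labels[k] = v
	}
}

// reconcile models one workflow start under a mutex. hadToWait selects the
// "waited for the lock, then resumed" path; fixed selects whether labelsFrom
// is also applied on that path.
func reconcile(wf *workflow, hadToWait, fixed bool) {
	if hadToWait && !fixed {
		return // reported behaviour: resumed after the lock without applying labelsFrom
	}
	applyLabelsFrom(wf)
}

func main() {
	for _, fixed := range []bool{false, true} {
		labelled := 0
		for i := 0; i < 5; i++ {
			wf := &workflow{labelsFrom: map[string]string{"redactkey1": "redactteam-a"}}
			// Hypothetical scenario: only the first workflow finds the mutex free.
			reconcile(wf, i > 0, fixed)
			if wf.labels["redactkey1"] != "" {
				labelled++
			}
		}
		fmt.Printf("fixed=%v: %d of 5 workflows labelled\n", fixed, labelled)
	}
}
```

In this toy run the unfixed model labels only the one workflow that never waited. That matches the pattern in the issue, where whether a workflow is labelled depends on whether it queued for the lock, not on its input.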