argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

fix: don't log non-errors as "Non-transient error: <nil>". Fixes #13881 #13917

Open MasonM opened 2 days ago

MasonM commented 2 days ago

Fixes #13881

Motivation

There are many places that can call IsTransientErr() with nil, e.g. https://github.com/argoproj/argo-workflows/blob/2fd54884844bb76d760466027afa023c5bfd6b64/util/util.go#L39

This caused IsTransientErr() to log "Non-transient error: <nil>" at warning level even though no error had occurred. We shouldn't be generating logs in this case.

Modifications

Don't log when the error is nil

Verification

This can be reproduced locally by running make start PROFILE=mysql. Before this change:

$ head -n9 logs/controller.log
time="2024-11-19T05:55:16Z" level=info msg="index config" indexWorkflowSemaphoreKeys=true
time="2024-11-19T05:55:16Z" level=info msg="cron config" cronSyncPeriod=10s
time="2024-11-19T05:55:16Z" level=info msg="Memoization caches will be garbage-collected if they have not been hit after" gcAfterNotHitDuration=30s
time="2024-11-19T05:55:16.010Z" level=info msg="not enabling pprof debug endpoints"
time="2024-11-19T05:55:16.014Z" level=debug msg="Get configmaps 200"
time="2024-11-19T05:55:16.017Z" level=info msg="Configuration:\nartifactRepository:\n  s3:\n    accessKeySecret:\n      key: accesskey\n      name: my-minio-cred\n    bucket: my-bucket\n    endpoint: minio:9000\n    insecure: true\n    secretKeySecret:\n      key: secretkey\n      name: my-minio-cred\ncolumns:\n- key: workflows.argoproj.io/completed\n  name: Workflow Completed\n  type: label\nexecutor:\n  imagePullPolicy: IfNotPresent\n  name: \"\"\n  resources:\n    limits:\n      cpu: 500m\n      memory: 256Mi\n    requests:\n      cpu: 100m\n      memory: 64Mi\nimages:\n  docker/whalesay:latest:\n    cmd:\n    - cowsay\ninitialDelay: 0s\nlinks:\n- name: Workflow Link\n  scope: workflow\n  url: http://logging-facility?namespace=${metadata.namespace}&workflowName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Pod Link\n  scope: pod\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Pod Logs Link\n  scope: pod-logs\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Event Source Logs Link\n  scope: event-source-logs\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Sensor Logs Link\n  scope: sensor-logs\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Completed Workflows\n  scope: workflow-list\n  url: http://workflows?label=workflows.argoproj.io/completed=true\nmetricsConfig:\n  enabled: true\n  path: /metrics\n  port: 9090\nnamespaceParallelism: 10\nnodeEvents: {}\npersistence:\n  archive: true\n  archiveTTL: 168h0m0s\n  connectionPool:\n    maxIdleConns: 100\n  mysql:\n    database: argo\n    host: mysql\n    passwordSecret:\n      key: password\n      name: argo-mysql-config\n    port: 3306\n    tableName: argo_workflows\n    userNameSecret:\n      key: username\n      name: argo-mysql-config\n  nodeStatusOffLoad: true\npodSpecLogStrategy: {}\nretentionPolicy:\n  completed: 10\n  errored: 2\n  failed: 2\nsso:\n  clientId:\n    key: \"\"\n  clientSecret:\n    key: \"\"\n  issuer: \"\"\n  redirectUrl: \"\"\n  sessionExpiry: 0s\ntelemetryConfig: {}\nworkflowDefaults:\n  metadata:\n    creationTimestamp: null\n  spec:\n    activeDeadlineSeconds: 300\n    arguments: {}\n    podSpecPatch: |\n      terminationGracePeriodSeconds: 3\n  status:\n    finishedAt: null\n    startedAt: null\nworkflowEvents: {}\n"
time="2024-11-19T05:55:16.017Z" level=info msg="Persistence configuration enabled"
time="2024-11-19T05:55:16.019Z" level=debug msg="Get secrets 200"
time="2024-11-19T05:55:16.019Z" level=warning msg="Non-transient error: <nil>"

After:

$ head -n9 logs/controller.log
time="2024-11-19T05:55:57Z" level=info msg="index config" indexWorkflowSemaphoreKeys=true
time="2024-11-19T05:55:57Z" level=info msg="cron config" cronSyncPeriod=10s
time="2024-11-19T05:55:57Z" level=info msg="Memoization caches will be garbage-collected if they have not been hit after" gcAfterNotHitDuration=30s
time="2024-11-19T05:55:57.231Z" level=info msg="not enabling pprof debug endpoints"
time="2024-11-19T05:55:57.235Z" level=debug msg="Get configmaps 200"
time="2024-11-19T05:55:57.239Z" level=info msg="Configuration:\nartifactRepository:\n  s3:\n    accessKeySecret:\n      key: accesskey\n      name: my-minio-cred\n    bucket: my-bucket\n    endpoint: minio:9000\n    insecure: true\n    secretKeySecret:\n      key: secretkey\n      name: my-minio-cred\ncolumns:\n- key: workflows.argoproj.io/completed\n  name: Workflow Completed\n  type: label\nexecutor:\n  imagePullPolicy: IfNotPresent\n  name: \"\"\n  resources:\n    limits:\n      cpu: 500m\n      memory: 256Mi\n    requests:\n      cpu: 100m\n      memory: 64Mi\nimages:\n  docker/whalesay:latest:\n    cmd:\n    - cowsay\ninitialDelay: 0s\nlinks:\n- name: Workflow Link\n  scope: workflow\n  url: http://logging-facility?namespace=${metadata.namespace}&workflowName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Pod Link\n  scope: pod\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Pod Logs Link\n  scope: pod-logs\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Event Source Logs Link\n  scope: event-source-logs\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Sensor Logs Link\n  scope: sensor-logs\n  url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Completed Workflows\n  scope: workflow-list\n  url: http://workflows?label=workflows.argoproj.io/completed=true\nmetricsConfig:\n  enabled: true\n  path: /metrics\n  port: 9090\nnamespaceParallelism: 10\nnodeEvents: {}\npersistence:\n  archive: true\n  archiveTTL: 168h0m0s\n  connectionPool:\n    maxIdleConns: 100\n  mysql:\n    database: argo\n    host: mysql\n    passwordSecret:\n      key: password\n      name: argo-mysql-config\n    port: 3306\n    tableName: argo_workflows\n    userNameSecret:\n      key: username\n      name: argo-mysql-config\n  nodeStatusOffLoad: true\npodSpecLogStrategy: {}\nretentionPolicy:\n  completed: 10\n  errored: 2\n  failed: 2\nsso:\n  clientId:\n    key: \"\"\n  clientSecret:\n    key: \"\"\n  issuer: \"\"\n  redirectUrl: \"\"\n  sessionExpiry: 0s\ntelemetryConfig: {}\nworkflowDefaults:\n  metadata:\n    creationTimestamp: null\n  spec:\n    activeDeadlineSeconds: 300\n    arguments: {}\n    podSpecPatch: |\n      terminationGracePeriodSeconds: 3\n  status:\n    finishedAt: null\n    startedAt: null\nworkflowEvents: {}\n"
time="2024-11-19T05:55:57.239Z" level=info msg="Persistence configuration enabled"
time="2024-11-19T05:55:57.241Z" level=debug msg="Get secrets 200"
time="2024-11-19T05:55:57.243Z" level=debug msg="Get secrets 200"