Open MasonM opened 2 days ago
Fixes #13881
There's many places that can call IsTransientErr() with nil, e.g. https://github.com/argoproj/argo-workflows/blob/2fd54884844bb76d760466027afa023c5bfd6b64/util/util.go#L39
IsTransientErr()
This caused IsTransientErr() to log Non-transient error: <nil>. We shouldn't be generating logs in this case.
Non-transient error: <nil>
Don't log when the error is nil
This can be reproduced locally by running make start PROFILE=mysql. Before this change:
make start PROFILE=mysql
$ head -n9 logs/controller.log time="2024-11-19T05:55:16Z" level=info msg="index config" indexWorkflowSemaphoreKeys=true time="2024-11-19T05:55:16Z" level=info msg="cron config" cronSyncPeriod=10s time="2024-11-19T05:55:16Z" level=info msg="Memoization caches will be garbage-collected if they have not been hit after" gcAfterNotHitDuration=30s time="2024-11-19T05:55:16.010Z" level=info msg="not enabling pprof debug endpoints" time="2024-11-19T05:55:16.014Z" level=debug msg="Get configmaps 200" time="2024-11-19T05:55:16.017Z" level=info msg="Configuration:\nartifactRepository:\n s3:\n accessKeySecret:\n key: accesskey\n name: my-minio-cred\n bucket: my-bucket\n endpoint: minio:9000\n insecure: true\n secretKeySecret:\n key: secretkey\n name: my-minio-cred\ncolumns:\n- key: workflows.argoproj.io/completed\n name: Workflow Completed\n type: label\nexecutor:\n imagePullPolicy: IfNotPresent\n name: \"\"\n resources:\n limits:\n cpu: 500m\n memory: 256Mi\n requests:\n cpu: 100m\n memory: 64Mi\nimages:\n docker/whalesay:latest:\n cmd:\n - cowsay\ninitialDelay: 0s\nlinks:\n- name: Workflow Link\n scope: workflow\n url: http://logging-facility?namespace=${metadata.namespace}&workflowName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Pod Link\n scope: pod\n url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Pod Logs Link\n scope: pod-logs\n url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Event Source Logs Link\n scope: event-source-logs\n url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Sensor Logs Link\n scope: sensor-logs\n url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Completed Workflows\n scope: workflow-list\n url: http://workflows?label=workflows.argoproj.io/completed=true\nmetricsConfig:\n enabled: true\n path: /metrics\n port: 9090\nnamespaceParallelism: 10\nnodeEvents: {}\npersistence:\n archive: true\n archiveTTL: 168h0m0s\n connectionPool:\n maxIdleConns: 100\n mysql:\n database: argo\n host: mysql\n passwordSecret:\n key: password\n name: argo-mysql-config\n port: 3306\n tableName: argo_workflows\n userNameSecret:\n key: username\n name: argo-mysql-config\n nodeStatusOffLoad: true\npodSpecLogStrategy: {}\nretentionPolicy:\n completed: 10\n errored: 2\n failed: 2\nsso:\n clientId:\n key: \"\"\n clientSecret:\n key: \"\"\n issuer: \"\"\n redirectUrl: \"\"\n sessionExpiry: 0s\ntelemetryConfig: {}\nworkflowDefaults:\n metadata:\n creationTimestamp: null\n spec:\n activeDeadlineSeconds: 300\n arguments: {}\n podSpecPatch: |\n terminationGracePeriodSeconds: 3\n status:\n finishedAt: null\n startedAt: null\nworkflowEvents: {}\n" time="2024-11-19T05:55:16.017Z" level=info msg="Persistence configuration enabled" time="2024-11-19T05:55:16.019Z" level=debug msg="Get secrets 200" time="2024-11-19T05:55:16.019Z" level=warning msg="Non-transient error: <nil>"
After:
$ head -n9 logs/controller.log time="2024-11-19T05:55:57Z" level=info msg="index config" indexWorkflowSemaphoreKeys=true time="2024-11-19T05:55:57Z" level=info msg="cron config" cronSyncPeriod=10s time="2024-11-19T05:55:57Z" level=info msg="Memoization caches will be garbage-collected if they have not been hit after" gcAfterNotHitDuration=30s time="2024-11-19T05:55:57.231Z" level=info msg="not enabling pprof debug endpoints" time="2024-11-19T05:55:57.235Z" level=debug msg="Get configmaps 200" time="2024-11-19T05:55:57.239Z" level=info msg="Configuration:\nartifactRepository:\n s3:\n accessKeySecret:\n key: accesskey\n name: my-minio-cred\n bucket: my-bucket\n endpoint: minio:9000\n insecure: true\n secretKeySecret:\n key: secretkey\n name: my-minio-cred\ncolumns:\n- key: workflows.argoproj.io/completed\n name: Workflow Completed\n type: label\nexecutor:\n imagePullPolicy: IfNotPresent\n name: \"\"\n resources:\n limits:\n cpu: 500m\n memory: 256Mi\n requests:\n cpu: 100m\n memory: 64Mi\nimages:\n docker/whalesay:latest:\n cmd:\n - cowsay\ninitialDelay: 0s\nlinks:\n- name: Workflow Link\n scope: workflow\n url: http://logging-facility?namespace=${metadata.namespace}&workflowName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Pod Link\n scope: pod\n url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Pod Logs Link\n scope: pod-logs\n url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Event Source Logs Link\n scope: event-source-logs\n url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Sensor Logs Link\n scope: sensor-logs\n url: http://logging-facility?namespace=${metadata.namespace}&podName=${metadata.name}&startedAt=${status.startedAt}&finishedAt=${status.finishedAt}\n- name: Completed Workflows\n scope: workflow-list\n url: http://workflows?label=workflows.argoproj.io/completed=true\nmetricsConfig:\n enabled: true\n path: /metrics\n port: 9090\nnamespaceParallelism: 10\nnodeEvents: {}\npersistence:\n archive: true\n archiveTTL: 168h0m0s\n connectionPool:\n maxIdleConns: 100\n mysql:\n database: argo\n host: mysql\n passwordSecret:\n key: password\n name: argo-mysql-config\n port: 3306\n tableName: argo_workflows\n userNameSecret:\n key: username\n name: argo-mysql-config\n nodeStatusOffLoad: true\npodSpecLogStrategy: {}\nretentionPolicy:\n completed: 10\n errored: 2\n failed: 2\nsso:\n clientId:\n key: \"\"\n clientSecret:\n key: \"\"\n issuer: \"\"\n redirectUrl: \"\"\n sessionExpiry: 0s\ntelemetryConfig: {}\nworkflowDefaults:\n metadata:\n creationTimestamp: null\n spec:\n activeDeadlineSeconds: 300\n arguments: {}\n podSpecPatch: |\n terminationGracePeriodSeconds: 3\n status:\n finishedAt: null\n startedAt: null\nworkflowEvents: {}\n" time="2024-11-19T05:55:57.239Z" level=info msg="Persistence configuration enabled" time="2024-11-19T05:55:57.241Z" level=debug msg="Get secrets 200" time="2024-11-19T05:55:57.243Z" level=debug msg="Get secrets 200"
Fixes #13881
Motivation
There's many places that can call
IsTransientErr()
with nil, e.g. https://github.com/argoproj/argo-workflows/blob/2fd54884844bb76d760466027afa023c5bfd6b64/util/util.go#L39This caused
IsTransientErr()
to logNon-transient error: <nil>
. We shouldn't be generating logs in this case.Modifications
Don't log when the error is nil
Verification
This can be reproduced locally by running
make start PROFILE=mysql
. Before this change:After: