argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

artifactGC ignores archived logs when no artifacts are used #13784

Closed: static-moonlight closed this issue 1 month ago

static-moonlight commented 1 month ago

What happened? What did you expect to happen?

I have workflows that don't use artifacts, but in case of errors I'd like to have access to the logs (obviously). Also, to be more resource-efficient, I usually use podGC to delete the pod as soon as possible, and I have enabled log archiving so that the logs are persisted. So far so good.

The problem is that once the workflow is automatically deleted (because of the TTL setting), the logs remain in the artifact repository ... forever. Since the log files are technically artifacts (are they not?), because the workflow controller puts them in the artifact repository, I would expect artifactGC to clean them up as well, even if no other artifacts are used in that workflow. I'd like to add that once I have at least 1 artifact in the workflow, everything (artifacts AND logs) is cleaned up as expected.

TLDR: workflow without using artifacts + artifactGC enabled + podGC enabled + archiveLogs enabled = lots of orphaned logs in the artifact repository
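
To make the combination in the TLDR concrete, here is a minimal Python sketch that checks whether a workflow spec (already loaded into a dict) matches it. The field names (`artifactGC.strategy`, `podGC.strategy`, `archiveLogs`, `inputs.artifacts`, `outputs.artifacts`) are taken from the manifest in this report. The function names are hypothetical, treating `Never` as "artifactGC disabled" is an assumption, and the sketch only looks at workflow-level settings, not per-template or controller-level ones.

```python
from typing import Any, Dict


def declares_artifacts(template: Dict[str, Any]) -> bool:
    """Return True if the template lists any input or output artifacts."""
    for side in ("inputs", "outputs"):
        if (template.get(side) or {}).get("artifacts"):
            return True
    return False


def matches_reported_setup(spec: Dict[str, Any]) -> bool:
    """True when the spec has the four conditions named in the TLDR."""
    gc_strategy = (spec.get("artifactGC") or {}).get("strategy")
    # Assumption: a missing strategy or "Never" means artifactGC is off.
    artifact_gc_enabled = gc_strategy not in (None, "", "Never")
    pod_gc_enabled = bool((spec.get("podGC") or {}).get("strategy"))
    archives_logs = bool(spec.get("archiveLogs"))
    uses_artifacts = any(
        declares_artifacts(t) for t in spec.get("templates") or []
    )
    return (
        artifact_gc_enabled
        and pod_gc_enabled
        and archives_logs
        and not uses_artifacts
    )


# Trimmed-down version of the spec from this report.
spec = {
    "artifactGC": {"strategy": "OnWorkflowDeletion"},
    "podGC": {"strategy": "OnPodCompletion"},
    "archiveLogs": True,
    "templates": [
        {"name": "entrypoint",
         "dag": {"tasks": [{"name": "print-text", "template": "print-text"}]}},
        {"name": "print-text",
         "inputs": {"parameters": [{"name": "MESSAGE"}]},
         "script": {"image": "docker.io/busybox:1.36", "command": ["sh"]}},
    ],
}

print(matches_reported_setup(spec))
```

A check like this could be run over existing templates to find the ones likely to leave archived logs behind; it says nothing about why the logs are skipped.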

Version(s)

v3.5.11

Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: archive-logs-no-artifacts
  namespace: test
spec:
  entrypoint: entrypoint
  serviceAccountName: test
  artifactGC:
    serviceAccountName: test
    strategy: OnWorkflowDeletion
    forceFinalizerRemoval: true
  podGC:
    strategy: OnPodCompletion
  archiveLogs: true
  activeDeadlineSeconds: 30
  ttlStrategy:
    secondsAfterSuccess: 60
    secondsAfterFailure: 60
  templates:
    - name: entrypoint
      dag:
        tasks:
          - name: print-text
            template: print-text
            arguments:
              parameters:
                - name: MESSAGE
                  value: I'll be back
    - name: print-text
      inputs:
        parameters:
          - name: MESSAGE
      script:
        image: docker.io/busybox:1.36
        command: [sh]
        source: echo "{{inputs.parameters.MESSAGE}}"

Logs from the workflow controller

time="2024-10-18T09:13:42.666Z" level=info msg="Processing workflow" Phase= ResourceVersion=459784113 namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:42.682Z" level=info msg="Task-result reconciliation" namespace=test numObjs=0 workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:42.682Z" level=info msg="Updated phase  -> Running" namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:42.682Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:42.682Z" level=info msg="was unable to obtain node for , letting display name to be nodeName" namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:42.682Z" level=info msg="DAG node archive-logs-no-artifacts-l8s42 initialized Running" namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:42.682Z" level=warning msg="was unable to obtain the node for archive-logs-no-artifacts-l8s42-772283308, taskName print-text"
time="2024-10-18T09:13:42.682Z" level=warning msg="was unable to obtain the node for archive-logs-no-artifacts-l8s42-772283308, taskName print-text"
time="2024-10-18T09:13:42.682Z" level=info msg="All of node archive-logs-no-artifacts-l8s42.print-text dependencies [] completed" namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:42.682Z" level=warning msg="Node was nil, will be initialized as type Skipped" namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:42.683Z" level=info msg="Pod node archive-logs-no-artifacts-l8s42-772283308 initialized Pending" namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:42.693Z" level=info msg="Created pod: archive-logs-no-artifacts-l8s42.print-text (archive-logs-no-artifacts-l8s42-print-text-772283308)" namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:42.694Z" level=info msg="TaskSet Reconciliation" namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:42.694Z" level=info msg=reconcileAgentPod namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:42.701Z" level=info msg="Workflow update successful" namespace=test phase=Running resourceVersion=459784124 workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:48.694Z" level=info msg="Processing workflow" Phase=Running ResourceVersion=459784124 namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:48.695Z" level=info msg="Task-result reconciliation" namespace=test numObjs=1 workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:48.695Z" level=info msg="task-result changed" namespace=test nodeID=archive-logs-no-artifacts-l8s42-772283308 workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:48.695Z" level=info msg="node changed" namespace=test new.message= new.phase=Succeeded new.progress=0/1 nodeID=archive-logs-no-artifacts-l8s42-772283308 old.message= old.phase=Pending old.progress=0/1 workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:48.695Z" level=info msg="Outbound nodes of archive-logs-no-artifacts-l8s42 set to [archive-logs-no-artifacts-l8s42-772283308]" namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:48.695Z" level=info msg="node archive-logs-no-artifacts-l8s42 phase Running -> Succeeded" namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:48.695Z" level=info msg="node archive-logs-no-artifacts-l8s42 finished: 2024-10-18 09:13:48.695517254 +0000 UTC" namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:48.695Z" level=info msg="TaskSet Reconciliation" namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:48.695Z" level=info msg=reconcileAgentPod namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:48.695Z" level=info msg="Updated phase Running -> Succeeded" namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:48.695Z" level=info msg="Marking workflow completed" namespace=test workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:48.703Z" level=info msg="Workflow update successful" namespace=test phase=Succeeded resourceVersion=459784407 workflow=archive-logs-no-artifacts-l8s42
time="2024-10-18T09:13:48.704Z" level=info msg="Queueing Succeeded workflow test/archive-logs-no-artifacts-l8s42 for delete in 1m0s due to TTL"
time="2024-10-18T09:13:53.712Z" level=info msg="cleaning up pod" action=deletePod key=test/archive-logs-no-artifacts-l8s42-print-text-772283308/deletePod
time="2024-10-18T09:14:49.000Z" level=info msg="Deleting garbage collected workflow 'test/archive-logs-no-artifacts-l8s42'"
time="2024-10-18T09:14:49.006Z" level=info msg="Successfully request 'test/archive-logs-no-artifacts-l8s42' to be deleted"
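
The controller lines above appear to be in a logfmt-style `key=value` form. As a rough aid for reading them, the following Python sketch turns the last few lines into a timeline relative to the moment the workflow was queued for TTL deletion. Using `shlex.split` to tokenize is an assumption that happens to work for this excerpt; it is not a general logfmt parser, and the helper names are hypothetical.

```python
import shlex
from datetime import datetime
from typing import Dict

# The three cleanup-related lines, copied from the controller log above.
EXCERPT = """\
time="2024-10-18T09:13:48.704Z" level=info msg="Queueing Succeeded workflow test/archive-logs-no-artifacts-l8s42 for delete in 1m0s due to TTL"
time="2024-10-18T09:13:53.712Z" level=info msg="cleaning up pod" action=deletePod key=test/archive-logs-no-artifacts-l8s42-print-text-772283308/deletePod
time="2024-10-18T09:14:49.000Z" level=info msg="Deleting garbage collected workflow 'test/archive-logs-no-artifacts-l8s42'"
"""


def parse_logfmt(line: str) -> Dict[str, str]:
    """Split one key=value log line into a dict (quoted values unwrapped)."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def parse_time(value: str) -> datetime:
    """Parse timestamps of the form 2024-10-18T09:13:48.704Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")


records = [parse_logfmt(line) for line in EXCERPT.splitlines() if line]
start = parse_time(records[0]["time"])
offsets = [(parse_time(r["time"]) - start).total_seconds() for r in records]

for offset, record in zip(offsets, records):
    print(f"+{offset:.3f}s  {record['msg']}")
```

In this excerpt the only cleanup actions recorded after the workflow succeeds are the pod deletion and the TTL-driven workflow deletion.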

Logs from your workflow's wait container

No resources found in [...] namespace.

agilgur5 commented 1 month ago

Duplicate of #13421 / #13338