Open bh-tt opened 8 months ago
I could take a look at this some time over the next few days.
@bh-tt
To clarify, the subsequent run artifacts are being garbage collected, right? Is the problem only that the finalizer is getting stuck for subsequent runs?
Can you please provide `kubectl describe` output (redacted if necessary) for the first failure and one of the subsequent failures?
Sorry @Garett-MacGowan, somehow the GitHub emails for your response got lost. At this point I no longer have a failing example to describe, but if we encounter this again we will add it to this issue.
> To clarify, the subsequent run artifacts are being garbage collected, right? Is the problem only that the finalizer is getting stuck for subsequent runs?
I have not actually checked if the other artifacts were still present, but given the number of stuck workflows I'd have expected our S3 bucket to be full if that was the case. The problem seems to be that the finalizer is stuck for subsequent runs.
Pre-requisites
:latest
What happened/what did you expect to happen?
A cronworkflow running every 15 minutes had a single workflow that failed to delete its artifacts about 1 week ago. Since then, all other workflows made by the same cronworkflow are still present, despite those having different artifact keys (set as workflow UID/workflow-name) and the argowf controller attempting to delete them. The other workflows are being deleted (they have a metadata.deletionTimestamp) but their finalizer is not removed.
We are setting `spec.artifactGC.forceFinalizerRemoval: true`. I expect only the workflow that could not delete its artifacts to remain, not all future workflows made by the same cronworkflow.
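For anyone trying to confirm the same symptom, here is a minimal sketch (not part of the original report) that lists workflows which are marked for deletion but still carry finalizers. It assumes the input is the JSON produced by `kubectl get workflows -o json`. The function name `stuck_workflows` is hypothetical, and it does not check for a specific finalizer name, only that at least one finalizer remains.

```python
import json
import sys
from typing import Any


def stuck_workflows(workflow_list: dict[str, Any]) -> list[str]:
    """Return names of workflows that have a deletionTimestamp but still hold finalizers.

    `workflow_list` is assumed to be the parsed output of
    `kubectl get workflows -o json` (a List object with an `items` array).
    """
    stuck = []
    for item in workflow_list.get("items", []):
        meta = item.get("metadata", {})
        # A deletionTimestamp means deletion was requested; any remaining
        # finalizer is what keeps the object from actually being removed.
        if meta.get("deletionTimestamp") and meta.get("finalizers"):
            stuck.append(meta.get("name", "<unnamed>"))
    return stuck


if __name__ == "__main__":
    # Reads the kubectl JSON from stdin and prints one workflow name per line.
    print("\n".join(stuck_workflows(json.load(sys.stdin))))
```

Saved under a hypothetical file name such as `stuck_workflows.py`, it could be run as `kubectl get workflows -n <namespace> -o json | python stuck_workflows.py`. Comparing the resulting names against the cronworkflow's run history should show whether every run after the first failure is affected.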
Version
v3.5.1
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Logs from the workflow controller
Logs from your workflow's wait container
Sorry, that has been deleted a while ago.