nilsalex opened 1 year ago
Pre-requisites
:latest
What happened/what you expected to happen?
I run workflows that use a very unstable MinIO instance as the artifact repository. Under load, S3 requests against this MinIO tend to lose their connections, which triggers retries in the init/wait containers.
I noticed that for directory artifacts, these retries always produce a non-transient error:
The root cause seems to be that after the first attempt, the directory myArtifact.tmp has already been created. During the retry, the executor first tries to download the artifact as a file:
https://github.com/argoproj/argo-workflows/blob/0adba4b3db288e9222814055937588ad0c601d85/workflow/artifacts/s3/s3.go#L85-L94
This fails in the MinIO client:
https://github.com/minio/minio-go/blob/0be3a44757352b6e617ef00eb47829bce29baab1/api-get-object-file.go#L51-L57
Version
v3.4.6
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
This workflow would hit the issue if a retry occurred:
To simulate what happens when a first, unsuccessful download has already created the directory my_artifact.tmp, I changed the workflow as follows:
Logs from the workflow controller
I cannot easily get these logs, but I hope they are not really relevant to this problem.
Logs from your workflow's wait container
These logs are from the init container. They show the transient error and then the non-transient error from the retry.