louisnow opened this issue 1 year ago
@louisnow it looks like your AWS credential provider is rate-limiting the requests that add AWS credentials to your pods. Can you check your provider's logs?
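For anyone hitting this later: how you pull the provider logs depends on which credential provider is running. A sketch, assuming a kiam or kube2iam agent in `kube-system` (the label selectors are assumptions, not from this thread):

```shell
# If the cluster runs kiam, the agent pods hold the credential-injection logs:
kubectl logs -n kube-system -l app=kiam-agent --tail=200

# If it runs kube2iam instead:
kubectl logs -n kube-system -l app=kube2iam --tail=200

# Grep for throttling or credential errors around the failure time:
kubectl logs -n kube-system -l app=kiam-agent --since=1h | grep -i -e throttl -e credential
```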
Thanks for the quick response @sarabala1979, I'll check and get back!
@sarabala1979 the authentication is done via the node instance role, and the maximum session duration for that role is set to 1 hour. Could this be the cause? When the workflow creates a pod, the process could be failing while refreshing the pod's AWS token because the session has expired (the workflow usually runs longer than an hour).
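One way to confirm and adjust the session limit described above, as a sketch (the role name is a placeholder for the actual node instance role):

```shell
# Inspect the role's maximum session duration; 3600 seconds = the 1 hour limit.
aws iam get-role --role-name my-node-instance-role \
  --query 'Role.MaxSessionDuration' --output text

# Raise the cap to 2 hours so long workflows can hold longer-lived tokens:
aws iam update-role --role-name my-node-instance-role \
  --max-session-duration 7200
```

Note this only raises the ceiling; whatever requests the session (the SDK or credential provider) still has to ask for, and periodically refresh, the longer duration.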
@sarabala1979 can you take a look at the above comment and help me understand whether this is a valid explanation?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If this is a mentoring request, please provide an update here. Thank you for your contributions.
Pre-requisites
Reproduced with the `:latest` image
What happened/what you expected to happen?
Expecting artifacts to load consistently in the pod. However, we see at least one random failure per day.
Version
v3.3.9
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Any workflow pod that loads data from an artifact store, S3 in our case.
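Since the template asks for a runnable workflow: a minimal public reproduction might look like the sketch below. The bucket, key, and image are placeholders, not our actual config; credentials come from the node instance role, so no access-key secret is referenced.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: s3-artifact-load-
spec:
  entrypoint: consume
  templates:
    - name: consume
      inputs:
        artifacts:
          - name: data
            path: /tmp/data
            s3:
              endpoint: s3.amazonaws.com
              bucket: example-bucket   # placeholder
              key: path/to/object      # placeholder
      container:
        image: alpine:3.18
        command: [ls, -l, /tmp/data]
```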
It could also be the underlying AWS SDK causing the issue: https://github.com/aws/aws-sdk-go/issues/2914 https://github.com/aws-observability/aws-otel-collector/issues/1286
Logs from the workflow controller
Logs from your workflow's wait container