coreos / coreos-assembler

Tooling container to assemble CoreOS-like systems
https://coreos.github.io/coreos-assembler/
Apache License 2.0

Retry on `EndpointConnectionError` exceptions and make uploading retry for longer #3719

Closed: jlebon closed this 7 months ago

jlebon commented 7 months ago

cmdlib.py: retry on EndpointConnectionError exceptions

We hit this recently in the pipeline due to a flake in DNS resolution.
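A minimal sketch of retrying only on a given exception type, using just the standard library. It is not the actual cmdlib.py code: the `EndpointConnectionError` class here is a local stand-in for the botocore exception of the same name, and `retry_on`, `max_attempts`, and `flaky_lookup` are hypothetical names chosen for illustration. The sketch retries immediately, with no delay between attempts.

```python
import functools


class EndpointConnectionError(Exception):
    """Local stand-in for botocore.exceptions.EndpointConnectionError."""


def retry_on(exc_types, max_attempts=5):
    """Retry the wrapped function when it raises one of exc_types."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exc_types:
                    # Out of attempts: let the last error propagate.
                    if attempt == max_attempts:
                        raise
        return wrapper
    return decorator


calls = {"count": 0}


@retry_on((EndpointConnectionError,), max_attempts=5)
def flaky_lookup():
    """Fail twice, as if DNS resolution flaked, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise EndpointConnectionError("could not connect to the endpoint URL")
    return "ok"


result = flaky_lookup()
```

Exceptions that are not listed in `exc_types` are not caught, so they fail on the first attempt instead of being retried.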


buildupload: bump retry period to 5 minutes

It's incredibly expensive when we flake at the very end of the pipeline while uploading artifacts to S3: all the created artifacts are lost and we have to rerun a whole new build.

We currently only retry for 10 seconds, which makes sense for truly transient flakes. But for uploads, given the stakes, let's also be more resilient to flakes that could take a bit longer to resolve, like DNS resolution issues.

Retry for 5 minutes, with an exponential backoff of up to 20 seconds.
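The schedule described above could be sketched roughly as follows, using only the standard library. This is an illustration, not the code in this PR: `backoff_waits` and `retry_upload` are hypothetical names, the 1-second initial wait is an assumption (the commit message only gives the 20-second cap and the 5-minute total), the built-in `ConnectionError` stands in for the boto exceptions, and the budget counts time spent sleeping rather than time spent in the upload itself.

```python
import time


def backoff_waits(total=300.0, first=1.0, cap=20.0):
    """Yield waits that double from `first`, capped at `cap`,
    until the waits add up to `total` seconds."""
    elapsed, wait = 0.0, first
    while elapsed < total:
        # Never wait longer than the cap or the remaining budget.
        step = min(wait, cap, total - elapsed)
        yield step
        elapsed += step
        wait *= 2


def retry_upload(upload, sleep=time.sleep, **schedule):
    """Call upload() until it succeeds or the wait budget is spent."""
    for wait in backoff_waits(**schedule):
        try:
            return upload()
        except ConnectionError:
            sleep(wait)
    # Budget exhausted: one last attempt, letting any error propagate.
    return upload()


# Inspect the default schedule without sleeping.
waits = list(backoff_waits())
```

Passing `sleep` as a parameter keeps the helper easy to exercise in tests, where a recording function can replace `time.sleep`.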

jlebon commented 7 months ago

Appeased flake8.