Closed phlax closed 7 months ago
Hi @phlax we have identified an issue in our ingress LB setup which in combination with autoscaling causes these errors. We have now taken the first steps in addressing these issues, so they should be alleviated. We will work on the proper fix in the upcoming days. Thanks for reporting. I shall leave this issue open for the time being.
@milosgajdos thanks for picking this up - unfortunately we are still seeing a lot of these errors (not sure if more, but it doesn't seem like fewer)
I appreciate that, we'll be making more changes going forward. It seems this is a more complex problem than we originally estimated 😞
this issue appears to have gotten a lot worse - it seems to have gone from a trickle to a flood
on our side we are exploring various ways to limit how much we pull from dockerhub - quite a bit of work has been done to that end, but none of the approaches are trivial, and I'm not sure any of them would resolve the problem - more likely just mitigate it
There is an active incident @phlax https://status.docker.com/
This seems to be happening again; we're seeing many instances of this starting yesterday morning (US East) and continuing into today. There doesn't seem to be anything on the status dashboard though.
@mathstuf apologies for the interruptions caused yesterday -- we've been performing internal infrastructure updates and not everything has gone as expected. The upgrade has now been completed and things should be back to normal.
Not working for AWS us-east-1 and us-east-2
Please fix it.
@ericlee42 We reported an incident over the weekend. I believe you likely ran into that: https://www.dockerstatus.com/pages/incident/533c6539221ae15e3f000031/654faa887895d304cc474aed
Thanks for reporting! We believe the original issue has been resolved. The incident over the weekend has been resolved as well. If you have future Hub connectivity issues, please file a new issue.
Problem description
We pull images fairly frequently from dockerhub as part of our CI (https://github.com/envoyproxy/envoy)
We have been seeing EOF errors pulling the images with increasing frequency
An example failure is here - but this seems to be happening ~daily on different images in different situations
the issues are transient and don't seem to happen more than once for any given incident
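Since the failures are transient, one possible stopgap on the CI side is to retry the pull with a short backoff. The following is only a minimal sketch and not something taken from the envoy CI: the function name `pull_with_retry`, the attempt count, and the delays are all assumptions, and it retries on any non-zero exit from `docker pull` rather than on EOF errors specifically.

```python
import subprocess
import time


def pull_with_retry(image, attempts=3, base_delay=2.0,
                    run=subprocess.run, sleep=time.sleep):
    """Run `docker pull <image>`, retrying with exponential backoff.

    Hypothetical helper: `run` and `sleep` are injectable so the retry
    logic can be exercised without a Docker daemon.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, attempts + 1):
        result = run(["docker", "pull", image],
                     capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Wait 2s, 4s, 8s, ... (with the default base_delay) before retrying.
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError(
        f"docker pull {image} failed after {attempts} attempts: "
        f"{result.stderr.strip()}"
    )
```

A CI step could call `pull_with_retry("some/image:tag")` (a placeholder image name) before the job that needs the image. This would only mask the errors on our side; it does nothing about the underlying cause.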
Debug Information
Browser name and version:
URL:
Some recent examples
Timestamp or time range:
Hub Username:
envoyproxy
Error messages (on screen or in browser console)
For example: