etix / mirrorbits

Mirrorbits is a geographical download redirector written in Go for distributing files efficiently across a set of mirrors.
MIT License

mirror not answering, but status still Up in mirrorbits #103

Open stormi opened 4 years ago

stormi commented 4 years ago

Hi,

I've had several users report failed downloads. I tracked it down to a specific mirror that mirrorbits still considers up even though it can't contact it:

avril 15 14:21:35 xcpng-repo mirrorbits[36752]: 2020/04/15 14:21:35.206 CEST [mirror.buf.us.smartcrew.xyz] Timeout waiting for daemon connection
avril 15 14:36:05 xcpng-repo mirrorbits[36752]: 2020/04/15 14:36:05.194 CEST [mirror.buf.us.smartcrew.xyz] Requesting file list via rsync...
avril 15 14:36:25 xcpng-repo mirrorbits[36752]: 2020/04/15 14:36:25.192 CEST [mirror.buf.us.smartcrew.xyz] fetching trace file failed: Get http://mirror.buf.us.smartcrew.xyz/xcp-ng/trace: dial tcp 107.175.70.118:80: i/o timeout
avril 15 14:36:35 xcpng-repo mirrorbits[36752]: 2020/04/15 14:36:35.208 CEST [mirror.buf.us.smartcrew.xyz] Timeout waiting for daemon connection
avril 15 14:51:05 xcpng-repo mirrorbits[36752]: 2020/04/15 14:51:05.194 CEST [mirror.buf.us.smartcrew.xyz] Requesting file list via rsync...
avril 15 14:51:25 xcpng-repo mirrorbits[36752]: 2020/04/15 14:51:25.192 CEST [mirror.buf.us.smartcrew.xyz] fetching trace file failed: Get http://mirror.buf.us.smartcrew.xyz/xcp-ng/trace: dial tcp 107.175.70.118:80: i/o timeout
avril 15 14:51:35 xcpng-repo mirrorbits[36752]: 2020/04/15 14:51:35.208 CEST [mirror.buf.us.smartcrew.xyz] Timeout waiting for daemon connection
avril 15 15:06:05 xcpng-repo mirrorbits[36752]: 2020/04/15 15:06:05.194 CEST [mirror.buf.us.smartcrew.xyz] Requesting file list via rsync...
avril 15 15:06:25 xcpng-repo mirrorbits[36752]: 2020/04/15 15:06:25.192 CEST [mirror.buf.us.smartcrew.xyz] fetching trace file failed: Get http://mirror.buf.us.smartcrew.xyz/xcp-ng/trace: dial tcp 107.175.70.118:80: i/o timeout
avril 15 15:06:35 xcpng-repo mirrorbits[36752]: 2020/04/15 15:06:35.208 CEST [mirror.buf.us.smartcrew.xyz] Timeout waiting for daemon connection
avril 15 15:21:05 xcpng-repo mirrorbits[36752]: 2020/04/15 15:21:05.193 CEST [mirror.buf.us.smartcrew.xyz] Requesting file list via rsync...
avril 15 15:21:25 xcpng-repo mirrorbits[36752]: 2020/04/15 15:21:25.192 CEST [mirror.buf.us.smartcrew.xyz] fetching trace file failed: Get http://mirror.buf.us.smartcrew.xyz/xcp-ng/trace: dial tcp 107.175.70.118:80: i/o timeout
avril 15 15:21:35 xcpng-repo mirrorbits[36752]: 2020/04/15 15:21:35.208 CEST [mirror.buf.us.smartcrew.xyz] Timeout waiting for daemon connection
avril 15 15:36:05 xcpng-repo mirrorbits[36752]: 2020/04/15 15:36:05.193 CEST [mirror.buf.us.smartcrew.xyz] Requesting file list via rsync...
avril 15 15:36:25 xcpng-repo mirrorbits[36752]: 2020/04/15 15:36:25.192 CEST [mirror.buf.us.smartcrew.xyz] fetching trace file failed: Get http://mirror.buf.us.smartcrew.xyz/xcp-ng/trace: dial tcp 107.175.70.118:80: i/o timeout
avril 15 15:36:35 xcpng-repo mirrorbits[36752]: 2020/04/15 15:36:35.203 CEST [mirror.buf.us.smartcrew.xyz] Timeout waiting for daemon connection
avril 15 15:51:05 xcpng-repo mirrorbits[36752]: 2020/04/15 15:51:05.194 CEST [mirror.buf.us.smartcrew.xyz] Requesting file list via rsync...
avril 15 15:51:25 xcpng-repo mirrorbits[36752]: 2020/04/15 15:51:25.192 CEST [mirror.buf.us.smartcrew.xyz] fetching trace file failed: Get http://mirror.buf.us.smartcrew.xyz/xcp-ng/trace: dial tcp 107.175.70.118:80: i/o timeout
avril 15 15:51:35 xcpng-repo mirrorbits[36752]: 2020/04/15 15:51:35.204 CEST [mirror.buf.us.smartcrew.xyz] Timeout waiting for daemon connection

This happens only for this mirror. The others are fine.

Any idea what's going on and how to avoid this in the future?

If a fix is needed, we might be able to contribute one once we understand what is going on.
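For anyone wanting to confirm the symptom from outside mirrorbits, here is a minimal sketch of an independent reachability probe. The `dial tcp 107.175.70.118:80: i/o timeout` lines above indicate that a plain TCP connection to the mirror timed out, so the sketch attempts just that. The function name `tcp_reachable` and the 5-second default are assumptions made for illustration; this is not how mirrorbits itself checks mirrors.

```python
import socket


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration; it only tests the TCP handshake
    and says nothing about whether the HTTP or rsync service behind it works.
    """
    try:
        # create_connection resolves the host and connects, raising OSError
        # (including timeouts and refused connections) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `tcp_reachable("mirror.buf.us.smartcrew.xyz", 80)` from the redirector host would be expected to return `False` while the mirror is in the state shown in the logs, which could then be compared against the status mirrorbits reports.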

elboulangero commented 2 weeks ago

I just saw the same issue. It turns out that mirrorbits' built-in health check had stopped working. That's easy to spot, since the health check is verbose: by default it runs every minute and logs a whole bunch of lines containing Up!.
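Since a healthy instance keeps logging `Up!` lines, a stalled health check can be detected by looking at how old the most recent such line is. The following is a rough sketch under assumptions: it expects log lines to carry the `YYYY/MM/DD HH:MM:SS` timestamp prefix seen in the excerpt earlier in this thread, it treats any line containing `Up!` as a health-check line, and the names `last_health_check`, `health_check_stalled` and the 5-minute threshold are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Timestamp as it appears in the log excerpt in this thread,
# e.g. "2020/04/15 14:21:35.206 CEST". Milliseconds and zone are ignored.
TS_RE = re.compile(r"(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})")


def last_health_check(lines, marker="Up!"):
    """Return the timestamp of the newest line containing marker, or None."""
    latest = None
    for line in lines:
        if marker not in line:
            continue
        match = TS_RE.search(line)
        if match is None:
            continue
        ts = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
        if latest is None or ts > latest:
            latest = ts
    return latest


def health_check_stalled(lines, now, max_age=timedelta(minutes=5)):
    """True if no marker line exists or the newest one is older than max_age.

    The 5-minute threshold is an assumption: a few missed runs of a check
    that normally logs every minute.
    """
    latest = last_health_check(lines)
    return latest is None or now - latest > max_age
```

Fed with the service's recent log lines and the current local time, `health_check_stalled` could drive an alert, or a restart, only when the check has actually gone quiet.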

We have two mirrorbits instances running. Both were started on 2024-05-09, and today is 2024-06-17.

Looking at the logs for when the last health-check happened:

This won't be easy to debug, so I think I'll just restart the mirrorbits processes once a week.

@stormi If you ever see the issue again, please check whether the mirrorbits health check still works!