badges / shields

Concise, consistent, and legible badges in SVG and raster format
https://shields.io
Creative Commons Zero v1.0 Universal

504 Gateway Timeout on badges inside GitHub README #4878

Closed akosthekiss closed 4 years ago

akosthekiss commented 4 years ago

:beetle: Description

We have some projects where we have started using shields.io badges only recently. However, it seems that when the READMEs of the projects are rendered on GitHub, the badges don't show up. (Yesterday, it was roughly 50% of the badges not showing, today they don't show, like, at all.)

Network monitor shows GET requests to camo.githubusercontent.com lasting 4100+ ms, ending in 504 Gateway Timeout (Error Fetching Resource).

The repos are also hosted on a private GitLab instance, where the README is rendered with all badges visible. In that case, however, the network monitor shows requests going directly to img.shields.io, and those requests take 17 seconds each (or more).

The projects are:

paulmelnikow commented 4 years ago

Hi, thanks for posting. We deployed an experimental branch (#4756) last night to a canary server and it seems like that may be the cause of this issue. Rolling it back now and hoping it stabilizes things.

We are actively working to replace our hosting environment with something more modern, scalable, and reliable, which would also remove our administration and deployment bottleneck. One of the remaining big pieces is being completed now (#4874). I don't have a timeline to report on this, but am hoping it can happen soon.

jrwrigh commented 4 years ago

Note that this has been going on longer than just last night. See https://github.com/latex-lsp/texlab/issues/195# and ~https://github.com/vmg/redcarpet/issues/688~

Edit: Just realized the redcarpet README doesn't use this for its (failing) badge.

calebcartwright commented 4 years ago

One thing I've noticed is a strong correlation between OVH (our hosting provider) maintenance/incidents and the recent windows of delayed badge responsiveness.

It could certainly be coincidental, but we've had these responsiveness issues intermittently lately, and there always seem to be corresponding OVH issues during or preceding those same intervals.

https://twitter.com/ovh_status? https://github.com/badges/shields/blob/master/doc/production-hosting.md#badge-servers

akosthekiss commented 4 years ago

I'm not sure whether it's the rollback or the hosting provider, but now the badges seem to show up "instantly" (40-600 ms). Thanks for the service!

paulmelnikow commented 4 years ago

I highly suspect @calebcartwright is right that this one was the hosting provider.

VladimirMikulic commented 4 years ago

This happens on my project as well every single time.

paulmelnikow commented 4 years ago

Really sorry about the downtime today. OVH is having some issues which I think are affecting us. I've got a draft of a new hosting proposal which should fix this and also eliminate the bottlenecks (on Thaddée for handling the servers, and on me for deploying). Please bear with us!

VladimirMikulic commented 4 years ago

@paulmelnikow I appreciate that very much. Thank you :+1:

sblantipodi commented 4 years ago

Same problem here, hope to see a solution soon. Thanks for your work, guys.

paulmelnikow commented 4 years ago

I'll have this proposal posted in the next day or two.

julienw commented 4 years ago

I'm glad I'm not the only one :-) But I'm a bit puzzled: requesting the badge directly works perfectly well, while requesting the same badge through github's camo URL times out every time. Could it be a problem with the hosting provider OVH actively filtering out requests coming from github, possibly because a lot of requests are coming from the same IP (or set of IPs)? Like an automatic DoS protection?

Here are my links:

direct link: https://img.shields.io/matrix/profiler:mozilla.org?server_fqdn=matrix.org&label=matrix

GitHub link: https://camo.githubusercontent.com/9082daa9d13854fed209b43073465347bdc25933/68747470733a2f2f696d672e736869656c64732e696f2f6d61747269782f70726f66696c65723a6d6f7a696c6c612e6f72673f7365727665725f6671646e3d6d61747269782e6f7267266c6162656c3d6d6174726978
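To check that the two links really point at the same badge, the Camo link can be decoded. This is a minimal sketch that assumes the Camo URL layout is `https://<host>/<digest>/<hex-encoded upstream URL>`; the helper name `decode_camo_target` is made up for illustration.

```python
from urllib.parse import urlsplit


def decode_camo_target(camo_url: str) -> str:
    """Return the upstream URL hex-encoded in the last path segment of a Camo URL."""
    # Assumed layout: https://<camo-host>/<digest>/<hex-encoded upstream URL>
    hex_part = urlsplit(camo_url).path.rstrip("/").rsplit("/", 1)[-1]
    return bytes.fromhex(hex_part).decode("utf-8")


# The two links quoted in this comment.
camo_link = (
    "https://camo.githubusercontent.com/"
    "9082daa9d13854fed209b43073465347bdc25933/"
    "68747470733a2f2f696d672e736869656c64732e696f2f6d61747269782f"
    "70726f66696c65723a6d6f7a696c6c612e6f72673f7365727665725f6671"
    "646e3d6d61747269782e6f7267266c6162656c3d6d6174726978"
)
direct_link = (
    "https://img.shields.io/matrix/profiler:mozilla.org"
    "?server_fqdn=matrix.org&label=matrix"
)

decoded = decode_camo_target(camo_link)
print(decoded)
print("same badge:", decoded == direct_link)
```

If the decoded URL matches the direct link, the only difference between the two requests is the proxy hop in between.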

julienw commented 4 years ago

I see problems for badges coming from other sources too, so this might be a transient problem with github.

paulmelnikow commented 4 years ago

> But I'm a bit puzzled: requesting the badge directly works perfectly well, while requesting the same badge through github's camo URL times out every time. Could it be a problem with the hosting provider OVH actively filtering out requests coming from github, possibly because a lot of requests are coming from the same IP (or set of IPs)? Like an automatic DoS protection?

It's more likely that GitHub Camo's four-second request timeout is what's causing the problem.
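As a rough illustration of why a slow but successful direct request can still fail behind the proxy, here is a small sketch. It takes the four-second figure above as given, and the helper names `exceeds_proxy_timeout` and `time_fetch` are hypothetical. The sample latencies are the ones reported earlier in this thread.

```python
import time
import urllib.request

# Four-second proxy timeout as quoted above; treated here as an assumption.
CAMO_TIMEOUT_S = 4.0


def exceeds_proxy_timeout(elapsed_s: float, timeout_s: float = CAMO_TIMEOUT_S) -> bool:
    """True if a response this slow would be cut off by the proxy."""
    return elapsed_s >= timeout_s


def time_fetch(url: str, give_up_after_s: float = 30.0) -> float:
    """Fetch a URL directly and return the elapsed time in seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=give_up_after_s) as resp:
        resp.read()
    return time.monotonic() - start


# Latencies reported in this thread, in seconds.
reported = {
    "request via camo (4100+ ms, ended in 504)": 4.1,
    "direct request from GitLab (17 s)": 17.0,
    "after recovery, slowest (600 ms)": 0.6,
}
for label, seconds in reported.items():
    verdict = "would time out" if exceeds_proxy_timeout(seconds) else "ok"
    print(f"{label}: {verdict}")
```

To measure a live badge, something like `exceeds_proxy_timeout(time_fetch(badge_url))` could be used, where `badge_url` is any direct img.shields.io link. A direct request has no such cutoff in the browser, which would explain why it eventually succeeds while the proxied one returns a 504.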

The hosting proposal is up at #4929 and hopefully we'll be able to move forward with this soon!

regevbr commented 4 years ago

I experience the same issue at https://github.com/PruvoNet/node-upgrade-checker/blob/master/README.md. Which badges fail, and how many of them, is random.

fishcharlie commented 4 years ago

I know this is specific to GitHub, but it looks like there are problems on https://shields.io as well. I'm assuming it's the same underlying problem, but just wanted to make sure it was brought up as well.

image

paulmelnikow commented 4 years ago

Yikes! We've got an active proposal to address this. I wonder if we can wait for that discussion to resolve or if we need some interim action.

ghost commented 4 years ago

Can repro on GitLab too...

image

All of the ones lacking a preview are ones provided by shields.io

LGouellec commented 4 years ago

Same issue with my repo https://github.com/LGouellec/kafka-streams-dotnet.

fishcharlie commented 4 years ago

> Same issue with my repo https://github.com/LGouellec/kafka-streams-dotnet.

@LGouellec Please consider using the 👍 reaction instead of commenting with information that has already been posted. It affects every usage of Shields, in every repo. If you have new relevant information, commenting is the best method. Otherwise, please be respectful of subscribers to this thread and use the reaction feature.

paulmelnikow commented 4 years ago

I've set up a test server on Heroku at https://img-test.shields.io/badge/build-passing-brightgreen and sent a note to @espadrine asking his consent to move all the traffic over there, as an experiment.

ghost commented 4 years ago

For what it is worth, we have observed intermittent 504s for around a month and a half now. They seem to have gone away at some point and come back again more recently, but it doesn't appear to be a new issue that we are observing.

This is from GitLab specifically though; I'm not sure what else could interfere with that. Not sure if that is useful context at all?

paulmelnikow commented 4 years ago

The ops team has moved forward with an experiment to test hosting Shields on Heroku, which has been live on img.shields.io for about 14 hours. Overall it's working pretty well. We had a minor setback on the Discord badges (#4957) but have shimmed it for now. Please open an issue if you're seeing any new issues with badges, and follow https://github.com/badges/shields/issues/4929#issuecomment-620140625 if you'd like more information about the ops experiment. Let's keep this open for the duration.

calebcartwright commented 4 years ago

I'm going to close this out given the success of the Heroku experiment and the decision to go forward with it, as the new runtime environment has resolved the timeout/latency issues.