badges / shields

Concise, consistent, and legible badges in SVG and raster format
https://shields.io
Creative Commons Zero v1.0 Universal

Badge Images Often Fail To Load In Github README #1568

Closed Undistraction closed 6 years ago

Undistraction commented 6 years ago

I've noticed that at least 50% of the time, one or more badges in the READMEs of my various GitHub projects fail to display the image. I'm on a very fast connection (~100 Mbps).

In the error console:

Failed to load resource: the server responded with a status of 504 (Gateway Timeout)

The URLs are not the URLs added to the badges in the README; they point to some kind of GitHub cache:

URL from badge: https://img.shields.io/npm/v/blain.svg
URL of error: https://camo.githubusercontent.com/fa71495d8e006d53927660ed22594c3e7097c5a6/68747470733a2f2f696d672e736869656c64732e696f2f6e706d2f762f626c61696e2e737667

Example Repos

paulmelnikow commented 6 years ago

Hi, thanks for raising this issue. I've observed this behavior too; I'm sure many people can corroborate.

If you look at https://status.shields-server.com/ and click on one server at a time, you'll see that response times sometimes spike. It's not about the speed of your connection; rather, it's some combination of our servers' capacity and the upstream services being slow or rate-limiting us. GitHub images are served through a proxy, and the meaning of the 504 Gateway Timeout is that the Shields server took too long to respond to the proxy, and the proxy gave up.
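For reference, a minimal sketch (not part of the original comment) of how one could time a single badge request from Node and compare it against the roughly 4 second window GitHub's camo proxy appears to allow (a figure discussed later in this thread); the badge URL is just the example from the opening report:

const https = require('https');

const BADGE_URL = 'https://img.shields.io/npm/v/blain.svg'; // example badge from the opening report
const CAMO_TIMEOUT_MS = 4000;                               // approximate camo limit discussed below

const start = Date.now();
https.get(BADGE_URL, res => {
  res.resume(); // drain the body; only the timing matters here
  res.on('end', () => {
    const elapsed = Date.now() - start;
    const verdict = elapsed > CAMO_TIMEOUT_MS ? 'would likely time out behind camo' : 'within the window';
    console.log(`${res.statusCode} in ${elapsed} ms (${verdict})`);
  });
}).on('error', err => console.error('request failed:', err.message));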

I would love to put work into making Shields more reliable. I think the fix is to add server capacity and, given that we're not going to make upstream rate limiting go away, to be much more aggressive with caching through several means.

Our server budget is extremely limited, and frankly we need a significantly larger budget to consider any of these options.

We ask developers who know and love Shields to please make a one-time $10 donation. If you've already given, please ask your developer friends to do the same, or solicit big donations from big projects / companies who use Shields.

https://opencollective.com/shields

Also open to promotion ideas, ideas that don't take money, and in general discussing further!

pixelass commented 6 years ago

In my case it's more like

[screenshot of browser console errors, 2018-03-22 15:46]
Failed to load resource: the server responded with a status of 504 (Gateway Timeout)
Failed to load resource: the server responded with a status of 504 (Gateway Timeout)
Failed to load resource: the server responded with a status of 504 (Gateway Timeout)
Failed to load resource: the server responded with a status of 504 (Gateway Timeout)
Failed to load resource: the server responded with a status of 504 (Gateway Timeout)
g105b commented 6 years ago

Is this simply a case of not having enough server capacity? If so, would you mind letting us know the specifics of what server is being used, where it is located, and any details regarding bandwidth?

RedSparr0w commented 6 years ago

Just a quick test of the github timeouts:

1 second delay: 1 second
2 second delay: 2 seconds
3 second delay: 3 seconds
3.9 second delay: 3 seconds
3.95 second delay: 3 seconds
4 second delay: 4 seconds

Edit: Removed the 5 and 6 second delays as not needed; 4 seconds seems to always time out, while 3.95 seconds looks to be okay.
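For context, a rough sketch of how a delay test like this could be reproduced (this is an assumption about the setup, not RedSparr0w's actual code): serve an SVG after a configurable delay, embed the resulting URL in a README, and see whether camo still renders it.

const http = require('http');

// A plain green rectangle standing in for a badge.
const SVG = '<svg xmlns="http://www.w3.org/2000/svg" width="80" height="20">' +
  '<rect width="80" height="20" fill="#4c1"/></svg>';

http.createServer((req, res) => {
  // e.g. GET /badge.svg?delay=3950 waits 3.95 seconds before answering
  const delayMs = Number(new URL(req.url, 'http://localhost').searchParams.get('delay')) || 0;
  setTimeout(() => {
    res.writeHead(200, { 'Content-Type': 'image/svg+xml' });
    res.end(SVG);
  }, delayMs);
}).listen(8080, () => console.log('delay test server listening on :8080'));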

paulmelnikow commented 6 years ago

Is this simply a case of not having enough server capacity? If so, would you mind letting us know the specifics of what server is being used, where it is located, and any details regarding bandwidth?

@g105b Server capacity, yes, combined with more aggressive caching. See my comment above: https://github.com/badges/shields/issues/1568#issuecomment-373249602

There are three servers, single-core VPSs with 2 GB RAM (VPS SSD 1 from OVH). One is in Gravelines, France, and I believe the other two are in Quebec, Canada.

@RedSparr0w Thanks for those tests!

To everyone following this issue, if you know and love Shields, please make a one-time $10 donation if you haven't already, and ask your friends to do the same! https://opencollective.com/shields

RedSparr0w commented 6 years ago

I've noticed a trend over the past few days: server response times around 7am-10am & 1pm-3pm (UTC) are a lot higher than usual. I suspect these are the times when most of the badges are failing (due to GitHub timing out after 4 seconds).

[chart: response times by hour]

@espadrine Is there anything in the logs that would suggest a much higher amount of traffic from any particular sources during those times?

RedSparr0w commented 6 years ago

I've been tracking how often the badges have a response time over 4 seconds here, and it still seems consistent with the above.

Between 7am-10am & 1pm-3pm, response times are a lot higher than normal, causing the images to time out when loading on GitHub: [chart]
During the weekend, response times were pretty good: [chart]
On Monday and Tuesday, response times were above 4 seconds for almost all of the peak hours: [chart]
Note: times are UTC.

shaypal5 commented 6 years ago

Anything new regarding this issue? This is now the case 99% of the time - I just don't see any PyPI badges working for my repositories. It is fixed temporarily if I go and look at the badge directly (for example, at https://img.shields.io/pypi/v/pdpipe.svg).

pixelass commented 6 years ago

Seems like moving away from badges is a good idea (or at least reducing them to a minimum, e.g. travis-build).

I'm hoping to get better results for the "important" badges this way.

ale5000-git commented 6 years ago

I have set maxAge=3600 for all my badges and added Shields as a GitHub application, but the problem still happens.

pelson commented 6 years ago

@paulmelnikow - that is a super write-up of the problem. Thanks for doing that.

Also open to promotion ideas, ideas that don't take money, and in general discussing further!

I have a suggestion that doesn't take money, and might help the load by enabling downstream proxies (such as GitHub's) to cache the responses.

I freely accept that I'm completely out of my comfort zone with this, but rather than adding the caching on the server side, I think it would be worth considering changing the response headers to include caching:

pelson@~> curl -v https://img.shields.io/conda/dn/conda-forge/iris.svg
> GET /conda/dn/conda-forge/iris.svg HTTP/1.1
> Host: img.shields.io
> User-Agent: curl/7.55.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Wed, 06 Jun 2018 09:41:09 GMT
< Content-Type: image/svg+xml;charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
< Set-Cookie: __cfduid=dcaabaa<snip>8053; expires=Thu, 06-Jun-19 09:40:53 GMT; path=/; domain=.shields.io; HttpOnly
< Cache-Control: no-cache, no-store, must-revalidate
< Expires: Wed, 06 Jun 2018 09:40:59 GMT
< Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
< Server: cloudflare
< CF-RAY: 4269eb8a9f8334a6-LHR
< 
<svg xmlns="http://www.w3.org/2000/svg" ...

Specifically, the response header Cache-Control: no-cache, no-store, must-revalidate suggests to me that GitHub's proxy doesn't even have the option of caching existing responses.

In addition, there is a stale-while-revalidate response header that appears to allow stale caches to be returned while the server is working out the new content.

I for one would be completely comfortable with a sensible cache period (an hour, 6 hours, etc.) along with a stale-while-revalidate so that users always get a response quickly, even if the response they are getting isn't the absolute latest information. I've no idea if GitHub's proxy supports this particular header, but I can't see it being harmful.
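To make that concrete, here is a minimal sketch (illustrative only, not the change Shields eventually shipped) of a badge response carrying the kind of cache headers described above; the one-hour values are placeholders:

const http = require('http');

http.createServer((req, res) => {
  res.writeHead(200, {
    'Content-Type': 'image/svg+xml;charset=utf-8',
    // Cacheable by downstream proxies for an hour, and a stale copy may be
    // served for another hour while the cache revalidates in the background.
    'Cache-Control': 'public, max-age=3600, stale-while-revalidate=3600',
  });
  res.end('<svg xmlns="http://www.w3.org/2000/svg"></svg>');
}).listen(8080);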

Apologies if I've missed a conversation about caching headers - I can completely understand if there is a good reason that responses shouldn't be cached other than on the shields.io servers.

jaydenseric commented 6 years ago

@paulmelnikow have you considered using Zeit Now instead?

pelson commented 6 years ago

I guess the relevant history on cache-control is #221, and the key line: https://github.com/badges/shields/blob/bf53e612f5bbebd231f75e9899bdafe4a91aa098/lib/request-handler.js#L87

FWIW, this is quite problematic. Even the Shields repo is having problems:

[screenshot: broken badges on the Shields README]

eproxus commented 6 years ago

I donated $10 in the hope that this will get fixed soon! 🤞

kopax commented 6 years ago

I have the same error:

https://github.com/yeutech-lab/accept-dot-path/blob/master/README.md

[screenshot: badge not rendering on GitHub]

It works fine on npm:

[screenshot: badge rendering on npm]

It seems that GitHub is not rendering those images correctly:

<img src="https://camo.githubusercontent.com/45aad6d50cc48a0e4ac9a1da135afdffa7795359/68747470733a2f2f696d672e736869656c64732e696f2f6e6f64652f762f40796575746563682d6c61622f6163636570742d646f742d706174682e7376673f7374796c653d666c6174" alt="npm Version" data-canonical-src="https://img.shields.io/node/v/@yeutech-lab/accept-dot-path.svg?style=flat" style="max-width:100%;">

This would be the expected value:

<img src="https://img.shields.io/node/v/@yeutech-lab/accept-dot-path.svg?style=flat" alt="npm Version" data-canonical-src="https://img.shields.io/node/v/@yeutech-lab/accept-dot-path.svg?style=flat" style="max-width:100%;">

wei commented 6 years ago

@kopax this is the intended behavior on GitHub. Check out https://help.github.com/articles/about-anonymized-image-urls/
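As an aside (not part of the original comment): in camo URLs of this era, the trailing segment is simply the original image URL hex-encoded, so a broken camo link can be traced back to its badge like this:

// Decode the hex tail of a camo URL back to the original badge URL.
// The example URL is the one from the opening report.
const camoUrl = 'https://camo.githubusercontent.com/fa71495d8e006d53927660ed22594c3e7097c5a6/68747470733a2f2f696d672e736869656c64732e696f2f6e706d2f762f626c61696e2e737667';

const hex = camoUrl.split('/').pop();
console.log(Buffer.from(hex, 'hex').toString('utf8'));
// -> https://img.shields.io/npm/v/blain.svg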

paulmelnikow commented 6 years ago

Right; and the reason you're not seeing the badges is that GitHub camo requests time out after ~3 seconds.

eproxus commented 6 years ago

@paulmelnikow I don't think it's camo's fault though. A request to e.g. https://img.shields.io/hexpm/v/meck.svg?style=flat-square takes 5+ seconds (camo seems to time out after 3 seconds, thus failing to fetch the original image and resulting in the missing images in READMEs).

paulmelnikow commented 6 years ago

I'm frustrated by our server capacity and that I can't act on this myself without essentially forking.

However it's not Shields or the browser that's timing out, it's camo.

No proxy = Slow badges: https://www.npmjs.com/package/react-boxplot
Proxy = Flaky badges: https://github.com/paulmelnikow/react-boxplot

eproxus commented 6 years ago

@paulmelnikow I frequently see https://img.shields.io/hexpm/v/meck.svg?style=flat-square taking over 10 seconds to complete, which would point to Shields being the issue?

paulmelnikow commented 6 years ago

Shields is definitely the reason they are slow! 😛

madnight commented 6 years ago

I think shields.io needs to set more aggressive Cloudflare caching options.

madnight commented 6 years ago

[example badges: Hackage, Hackage-Deps, served through the Google cache]

I found out that you can use a "trick" to reduce the traffic directed at img.shields.io and get better caching (avoiding broken images) by simply using the Google Cache. Add https://images1-focus-opensocial.googleusercontent.com/gadgets/proxy?container=focus&url= in front of your badge URL; see the example above.
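To spell the trick out, a tiny illustrative helper (not from the original comment; the Hackage badge URL is an arbitrary example):

const PROXY = 'https://images1-focus-opensocial.googleusercontent.com/gadgets/proxy?container=focus&url=';

// Prefix a badge URL with the Google gadgets proxy so responses are cached there.
// encodeURIComponent keeps any ?style=... query on the badge URL intact;
// plain concatenation, as described above, also works for simple URLs.
const proxied = badgeUrl => PROXY + encodeURIComponent(badgeUrl);

console.log(proxied('https://img.shields.io/hackage/v/aeson.svg'));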

paulmelnikow commented 6 years ago

@paulmelnikow have you considered using Zeit Now instead?

I'm a fan of that idea. I just proposed it here a couple days ago: https://github.com/badges/shields/pull/1742#issuecomment-407446963

joshenders commented 6 years ago

I work in the CDN/proxy space and can validate that @pelson's response is the correct approach. Adding server capacity for what is essentially a misconfigured HTTP response is not an efficient use of donation money.

RedSparr0w commented 6 years ago

@joshenders There is work going on with headers in #1725 which has recently been merged, with #1806 being the next step to enabling it, and hopefully getting this issue fixed 🤞

paulmelnikow commented 6 years ago

@joshenders If you have a chance to read the discussion in #1725, please do!

paulmelnikow commented 6 years ago

The recent work to set longer cache headers has just gone live. I will be curious to see how much that helps.

It is very likely we also have a capacity issue, owing to ~10% growth over the last several months. I have proposed moving to Zeit Now to fix the capacity issue and solve our sysadmin bottleneck at the same time. This proposal is blocked awaiting response from @espadrine who owns the servers and load balancer.

paulmelnikow commented 6 years ago

I’m glad to say addressing the cache headers (#1723) has had a huge effect. Today’s peak traffic is being handled like weekend traffic, with 99% of requests coming in underneath the 4 second camo timeout. The only broken badges I’m seeing today are not ours. 😁

That gives us a little time to sort out our hosting. We're still relatively slow on a number of badges, particularly the static badges, which should be instant.

RedSparr0w commented 6 years ago

Uptimes are definitely getting better; snapshot of the last 24 hours:

[chart: average response time (24 hours)]

paulmelnikow commented 6 years ago

Another weekday over 99%. 👍😌

If this problem recurs, or there are any other follow-on proposals, let’s open a new issue.

nobody5050 commented 3 years ago

Still having issues with this on several READMEs.

calebcartwright commented 3 years ago

Going to close and lock this issue as it's long been resolved but has a reasonably high potential to elicit follow-on comments.

For anyone else that stumbles upon this one...

This 3+ year old issue (as of the time of this post) dates from a period when the Shields project was experiencing a lot of growth that was overwhelming its minimal runtime environment, so the overloaded Shields servers were often unable to serve the requested badges within the window enforced by GitHub/Camo. That in turn would result in timeouts and badges not being rendered on GitHub README pages.

This has long since been resolved with various runtime improvements and caching mechanisms, and today Shields serves more than 750 million badges per month without issue. It is of course still possible to see a badge that fails to render in GitHub from time to time, but that isn't related to the widespread and persistent problems that prompted this issue.

If anyone has questions/reports/etc. about badges not rendering, please open a new issue and/or ping us on Discord with all the relevant details, including screenshots and the badges/badge types.

Please also note that the GitHub/Camo imposed time limits for rendering images are still in place, so it's not entirely uncommon to see rendering challenges with certain badges like the Dynamic and/or Endpoint badges, particularly if those endpoints are running on a platform that periodically shuts them down (like the Heroku free tier). This can happen because there is a rather tight time window for the entire badge request/response flow to complete, and after receiving a badge request the Shields servers almost always have to first fetch data from some upstream endpoint which does not always provide the needed data quickly enough.