buttons / github-buttons

:octocat: Unofficial github:buttons
https://buttons.github.io
BSD 2-Clause "Simplified" License
1.08k stars, 265 forks

Using conditional requests (If-None-Match) to prevent exceeding API rate limit #33

Closed: optimalisatie closed this issue 6 years ago

optimalisatie commented 6 years ago

Hi!

When a GitHub button with a counter is used on a regularly visited page such as an admin panel, the user can eventually exceed their GitHub API rate limit, causing the button to stop working after some time. On pages with many buttons the rate limit can be used up quickly.

/**/_({
  "meta": {
    "Content-Type": "application/javascript; charset=utf-8",
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1511616179",
    "X-GitHub-Media-Type": "github.v3; format=json",
    "status": 403
  },
  "data": {
    "message": "API rate limit exceeded for 123.456.789.123. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)",
    "documentation_url": "https://developer.github.com/v3/#rate-limiting"
  }
})

GitHub advises using conditional requests as a solution to prevent exceeding the API rate limit (and to save tons of useless traffic on GitHub's servers).

Most responses return an ETag header. Many responses also return a Last-Modified header. You can use the values of these headers to make subsequent requests to those resources using the If-None-Match and If-Modified-Since headers, respectively. If the resource has not changed, the server will return a 304 Not Modified.

https://developer.github.com/v3/#conditional-requests

It would be nice if the counter could be improved to make use of conditional requests, which would save a large share of the API traffic generated by github-buttons.
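As a hedged sketch of the idea (the `cachedFetch` helper and plain-object storage are hypothetical, not part of buttons.js; browser `localStorage` happens to support the same bracket syntax), a conditional request could replay the stored ETag, and a 304 reply, which GitHub does not count against the rate limit, would be served from the cache:

```javascript
// Sketch only: cache each API response together with its ETag, and send the
// ETag back as If-None-Match. GitHub answers 304 Not Modified when the data
// is unchanged, which does not count against the rate limit.
function cachedFetch(url, fetchFn, storage) {
  var cached = storage[url] ? JSON.parse(storage[url]) : null;
  var headers = {};
  if (cached && cached.etag) headers["If-None-Match"] = cached.etag;
  return fetchFn(url, { headers: headers }).then(function (res) {
    if (res.status === 304 && cached) return cached.data; // unchanged: reuse cache
    return res.json().then(function (data) {
      storage[url] = JSON.stringify({ etag: res.headers.get("ETag"), data: data });
      return data;
    });
  });
}
```

In a browser this would be called with `window.fetch` and `window.localStorage`; the injected parameters just make the sketch easy to exercise in isolation.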

ntkme commented 6 years ago

This implementation of GitHub buttons does not have a backend. Without a backend, it relies solely on the browsers’ caching behavior to reduce requests. As far as I know, unless you do a hard refresh, most browsers will use the cache and send conditional requests. If you are still hitting the limit, there is nothing I can do about it. However, the button should at least render without a count when the API limit runs out.

An alternative solution is to implement a server-side proxy for the GitHub API, but that means all users will share one API limit, which would run out even faster given the number of users. Shields.io uses this method, and their workaround is to let users donate API tokens into a pool. Still, they run out of tokens from time to time, resulting in a broken service.

optimalisatie commented 6 years ago

Hi!

The browser doesn't add the headers automatically. I tested with Chrome 62 and Firefox 57.

Accept:*/*
Accept-Encoding:gzip, deflate, br
Accept-Language:en-US,en;q=0.8
Cache-Control:no-cache
Connection:keep-alive
Cookie:_octo=...
Host:api.github.com
Pragma:no-cache
Referer:http://domain.local/wp-admin/admin.php
User-Agent:Mozilla/5.0 ...

It means that for any button there will always be one request to the GitHub API.

A backend isn't needed to make it work. It would be possible to use localStorage and the Fetch API with custom conditional request headers.

https://developer.mozilla.org/en-US/docs/Web/HTTP/Conditional_requests

Using localStorage you could set an interval at which the API is queried for an update, e.g. once per 5 minutes and/or when the counter is clicked. On a high-traffic admin page (e.g. WordPress) this would easily save millions of requests per day.
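That throttling idea might look like the sketch below (the helper names and the five-minute constant are illustrative only; `storage` is a plain object standing in for localStorage):

```javascript
// Sketch only: skip the API entirely while the cached count is still fresh.
var REFRESH_MS = 5 * 60 * 1000; // illustrative refresh interval (5 minutes)

// True when there is no cached entry, or the entry is older than REFRESH_MS.
function isStale(url, storage, now) {
  if (!storage[url]) return true;
  return now - JSON.parse(storage[url]).fetchedAt > REFRESH_MS;
}

// Store a count together with the time it was fetched.
function rememberCount(url, count, storage, now) {
  storage[url] = JSON.stringify({ count: count, fetchedAt: now });
}
```

A button would then call the API only when `isStale(...)` returns true, and otherwise render the remembered count.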

ntkme commented 6 years ago

I have a working implementation that uses CORS instead of JSON-P. Caching is yet to be implemented, because I'm a little hesitant about the caching behavior, though.

As you pointed out, we can use localStorage to cache data, but it does not offer expiration control. Browsers tend to allow a 5–10 MB localStorage quota. When you go over the quota, a QUOTA_EXCEEDED_ERR is thrown. So we not only need to cache the data, but also need to remove it. A naive solution would be to clear the whole localStorage when QUOTA_EXCEEDED_ERR is thrown, and otherwise keep as much data as possible. A nicer solution would be to implement an LRU (or similar) cache on top of localStorage, but that obviously adds complexity.
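The LRU idea could be sketched roughly as follows (the key prefix and helper name are hypothetical; `storage` is anything exposing the localStorage interface). On a quota error it evicts the least recently used entry and retries:

```javascript
// Sketch only: track last-use timestamps under a key prefix; on a quota
// error, evict the least recently used entry and retry the write until it
// succeeds or nothing is left to evict.
var PREFIX = "gh-button:"; // hypothetical key namespace

function lruSet(storage, key, value, now) {
  var entry = JSON.stringify({ value: value, usedAt: now });
  for (;;) {
    try {
      storage.setItem(PREFIX + key, entry);
      return true;
    } catch (e) {
      // QUOTA_EXCEEDED_ERR (a modern QuotaExceededError): find the LRU entry.
      var oldestKey = null, oldestAt = Infinity;
      for (var i = 0; i < storage.length; i++) {
        var k = storage.key(i);
        if (k.indexOf(PREFIX) !== 0 || k === PREFIX + key) continue;
        var usedAt = JSON.parse(storage.getItem(k)).usedAt;
        if (usedAt < oldestAt) { oldestAt = usedAt; oldestKey = k; }
      }
      if (oldestKey === null) return false; // nothing left to evict
      storage.removeItem(oldestKey);
    }
  }
}
```

The naive alternative from the paragraph above would simply be `storage.clear()` in the catch block.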

Caching plus conditional requests will reduce the request count as long as the API data does not change, but that clearly depends on how active the repository is.

To really reduce the requests, the extra delay you mentioned would be required. Maybe caching plus conditional requests is already good enough for a repository of average popularity. However, maybe it's not good enough for super popular ones. Then, how long should that delay be? 5 minutes sounds arbitrary.

optimalisatie commented 6 years ago

The purpose is to reduce traffic, so I believe it would be OK not to worry about QUOTA_EXCEEDED_ERR and simply consider the localStorage cache a feature that provides caching for most buttons, while the conditional request enables the API to optimize its request processing. Quota management may be something to consider later, to save even more traffic, if it proves to be a significant benefit.

To enable automatic cleanup of the browser cache it is possible to use sessionStorage. It has about the same browser support and it doesn't add extra code.

https://developer.mozilla.org/nl/docs/Web/API/Window/sessionStorage#Browser_compatibility

As for the delay, I am not certain what would be most favorable for GitHub. Do they mind buttons updating once per hour? Or would they be happier with once per 12 hours? It may be an option to set it to 12 hours by default and allow controlling the refresh interval with a data attribute.
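Such an option could be parsed along these lines (the `data-cache-ttl` attribute name and the helper are hypothetical, not an existing buttons.js option):

```javascript
// Sketch only: read a refresh interval in seconds from a data attribute,
// falling back to a 12-hour default when it is missing or invalid.
var DEFAULT_TTL_S = 12 * 60 * 60;

function parseTtl(attrValue) {
  var n = parseInt(attrValue, 10);
  return isNaN(n) || n <= 0 ? DEFAULT_TTL_S : n;
}
```

In a browser this would be fed something like `anchor.getAttribute("data-cache-ttl")`, which is `null` when the attribute is absent.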

Another problem is that the buttons require multiple (5) individual requests per button (buttons.js twice, buttons.css, buttons.html, plus the GitHub API).

[screenshot: network panel showing the individual requests per button]

It may be an option to draw the buttons dynamically in a JavaScript-controlled iframe, with the CSS included in the JavaScript. When using a dynamic iframe it is simple to write a CSS string into a <style> element. From a maintenance perspective it is possible to use a build tool to replace a marker in the script with the latest CSS.

It would be possible to reduce the requests to one static JavaScript file plus uncached API requests (once per 12 hours) for an unlimited number of buttons.

ntkme commented 6 years ago

From MDN document on sessionStorage:

Opening a page in a new tab or window will cause a new session to be initiated, which differs from how session cookies work.

So it's almost useless.

For multiple requests (html + js + css), certainly there is a little room to optimize. However, the GitHub Pages server sets Cache-Control: max-age=600, which means all those files will be cached for 10 minutes. Thus those requests will only be sent to the server once per page, no matter how many buttons there are on it. Also, GitHub Pages supports HTTP/2; for browsers that support it, multiple requests can be done over one connection.

GitHub API requests have Cache-Control: public, max-age=60, s-maxage=60, so identical API requests (e.g. for a star count and a fork count) on the same page will only make one actual request.

As you can see, many requests in your screenshot are served from memory. So personally I'm not worried about repeated requests on the same page, since browsers handle this very well.

optimalisatie commented 6 years ago

sessionStorage would make it possible to cache buttons for a specific website/admin panel. Only navigating elsewhere on the web or starting new windows would cause a new API request for the same button.

The advantage of sessionStorage is that it takes less code and requires less cache management. It could be added relatively easily and it would provide a sufficient solution for buttons on actively browsed pages (e.g. admin panels).

Regarding the requests being cached by the browser: that is correct; however, the individual requests still cause significant overhead.

[screenshot: network timeline showing per-request overhead]

A dynamic iframe could be much faster, but I agree that it does not help much in reducing the number of API requests. If you want to optimize further, it would be an option to look at Shadow DOM as an alternative to an iframe.

https://code.tutsplus.com/tutorials/intro-to-shadow-dom--net-34966
http://blog.catchpoint.com/2017/11/03/shadow-doms-encapsulation-progressive-web-applications/
https://www.polymer-project.org/2.0/start/quick-tour (cross-browser solution from Google)

ntkme commented 6 years ago

I won't use Shadow DOM because it is not a stable standard and only Chrome-based browsers currently support it by default. It still has a long way to go.

And it's entirely possible that a user opens links in your admin panel in new tabs, and that would create new sessions for sessionStorage.

Speaking of the overhead caused by individual requests: those happen mostly in parallel if you look at the timeline, so it's not really a big deal.

optimalisatie commented 6 years ago

A dynamic iframe may be the best solution for performance and compatibility. It would make it possible to include the buttons.js code inline, potentially drawing the buttons with zero requests. It would allow the buttons to be used more efficiently in an offline app or progressive web app. It would also reduce the amount of code, leading to the best performance possible.

Regarding the timeline: I can imagine GitHub buttons being used in more advanced apps. Every CPU cycle may count there, so for developers, the less CPU used, the better. Requests are expensive to process, especially on mobile devices, and cause considerable overhead. As an example, Google uses localStorage to cache assets rather than simply using the browser cache.

Tests by Google and Bing have shown that there are performance benefits to caching assets in localStorage (especially on mobile) when compared to simply reading and writing from the standard browser cache.

http://www.stevesouders.com/blog/2011/03/28/storager-case-study-bing-google/
https://addyosmani.com/basket.js/

If you want to optimize for CPU availability in heavy/critical apps where the buttons may be used, you could take a look at requestIdleCallback. It will ensure that the buttons never get in the way of the app/website user. setTimeout(fn, 0) could serve as a cross-browser fallback. It could be combined with requestAnimationFrame for optimal render performance.

https://developers.google.com/web/updates/2015/08/using-requestidlecallback
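A cross-browser wrapper in the spirit of that suggestion might look like this sketch (the `scheduleIdle` name is made up; the fallback simply defers the work to the next tick):

```javascript
// Sketch only: run low-priority work when the CPU is idle where supported,
// and fall back to setTimeout(fn, 0) elsewhere.
function scheduleIdle(fn, timeoutMs) {
  if (typeof requestIdleCallback === "function") {
    // Pass a timeout so the work is not postponed forever on busy pages.
    return requestIdleCallback(fn, { timeout: timeoutMs });
  }
  return setTimeout(fn, 0);
}
```

Button rendering would then be queued with something like `scheduleIdle(renderButton, 2000)` instead of running synchronously.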

As for sessionStorage: I suggested it from your perspective, as it would cost less time and may be a sufficient solution. If even 50% of the traffic could be eliminated it would make a big difference. Together with conditional requests it may be enough to solve the main issue (exceeding the API rate limit and users being blocked from the GitHub API).

localStorage plus cache management would be a much better solution, but it will take more time to create a solution with the best efficiency and the least amount of code.

Regarding a potential implementation: it may be an option to make it a non-critical feature that simply provides caching "if it works" and otherwise falls back to regular GitHub API requests. This would keep management simple: if there is a quota issue, it could simply be ignored and the data loaded via a regular request, leaving localStorage cache management to the website owner/developer. Button counts do not take significant space, so cleanup may not be required.

ntkme commented 6 years ago

A dynamic iframe

If I understand you correctly, you mean an implementation like mdo/github-buttons, which uses a single HTML containing embedded <script> and <style> tags.

There is a reason NOT to do so. Such an <iframe> cannot be automatically sized; with the content being dynamic, you have to give it a size slightly larger than the actual content so it has room to grow. For example, a different count may render wider content: going from 99 stars to 100 stars results in a wider button, etc. But your <iframe> width has to be hard-coded, which is annoying.

To automatically size an <iframe>, it has to be a same-origin <iframe>. But same-origin <iframe> could be accidentally altered by parent page JavaScript. That’s why we have the current solution:

  1. Load buttons.js on the parent page.
  2. buttons.js creates a same-origin <iframe> and renders it from an embedded HTML string.
  3. The embedded HTML string loads the same buttons.js from cache and renders the button.
  4. Once the button is rendered, get and save its dimensions.
  5. Reload the <iframe> with buttons.html instead of the HTML string. This will be cross-origin.
  6. Set the <iframe> size to the exact dimensions saved in step 4.

Given these steps, if you try to bundle JavaScript/CSS into the HTML, it actually increases traffic. Bundling CSS into the JavaScript is the only change that could give a small performance benefit.

If you really care that tiny bit of loading speed, make a fork and skip step 5, so that you can bundle HTML & CSS into JavaScript.

more advanced apps

Most advanced web apps today are single-page apps. They don't reload the page, so the buttons on the page are a one-time thing. There is not much point in optimizing that.

requestIdleCallback

Experimental API, so NO. Using setTimeout to make code execute asynchronously? I already use async as much as possible, except for parts where it is not necessary.

I won’t use experimental technology in production unless it’s widely supported and stable enough. Even if it can be fallback with code that’s more compatible, unless the benefit is significant, I’d rather wait for the API to be widely supported.

Conditional requests with XHR/CORS

First, unlike JSON-P, which is essentially a <script>, XHR is isolated from the page load event. When the XHR load event is triggered, the callback has not yet run; with JSON-P, the load event means the callback has already finished. The same goes for the error event.

Those are all extra layers of complexity, but at least I can live with them, and I already have an experimental implementation of this.

localStorage / sessionStorage

This is where I get uncomfortable. We make compromises all the time in web dev, but I don't like making compromises on the basis that "maybe that's good enough". I respect users' privacy (and care about the overhead), so I don't collect usage data for analysis; how could I know it's good enough, even with experiments?


Going back to your admin panel example: as a compromise, the easiest solution would be to disable the dynamic count. And if you really care about performance, put a static button directly on the page.


I’m open to ideas. As always, Pull Requests are very welcome.

optimalisatie commented 6 years ago

I just wanted to provide some ideas to consider. I understand that it may not be applicable for github-buttons. Besides the API limit / caching issue the buttons load fast and look professional/authentic.

We will be using the buttons on WordPress admin pages that are actively browsed by website creators during performance optimization.

[screenshot: WordPress admin page using the buttons]

requestIdleCallback has proven to be exceptionally effective (especially for widgets such as social buttons). It can enable loading a page with 1000+ buttons in 200-300 ms while the buttons load smoothly based on available CPU capacity. It may be an option to test it with an HTML page with 100+ buttons to see the difference. setTimeout could be used for any browser that does not support it (yet), so the complexity is low and the reliability is high.

In regards to the dynamic iframe, I understand your choice.

Regarding modern apps: a progressive web app (Google PWA) does load each page individually and has no way to persist a button across pages.

https://developers.google.com/web/progressive-web-apps/

In regards to sessionStorage. If you're interested to invest the time, localStorage may be a better option. However, to quickly solve the main issue sessionStorage may be a safe and easy option.

I am not familiar with CoffeeScript and didn't know your exact wishes for the project.

I am interested to create a concept / pull request. What options would you be interested in?

1) requestIdleCallback and requestAnimationFrame
2) localStorage cache
3) conditional requests based on XHR and/or the Fetch API

ntkme commented 6 years ago

PWA does not conflict with the idea of single page app (SPA). A web app can be both PWA and SPA at same time. A PWA won't necessarily be a SPA, and vice versa.


For browsers that do not support requestIdleCallback, setTimeout changes nothing compared to the current implementation. On the parent page, buttons.js has the async and defer attributes, and code inside the <iframe> runs in a different context, so its synchronous code does not block the parent page at all. So requestIdleCallback would only help browsers that support it.

Also, there is no magic about requestIdleCallback: parent-page code and <iframe> code have to call it separately, and event callbacks have to call it individually again in order to lower their priority.

My real concern is that the current implementation relies heavily on the load, error, and readystatechange events of the <iframe> and the JSON-P <script>. From the parent page I detect the <iframe>'s load event (and the JSON-P <script>'s load or error event, if present) to determine when a button is first rendered. Adding requestIdleCallback means the current detection method no longer works.

Instead, in the step "The embedded HTML string will load the same buttons.js from cache, and render the button":

On the parent page:

  1. Wait for the <iframe>'s load event so that contentWindow becomes accessible.
  2. See if loaded is true in the <iframe>. If true, we're done (unlikely to happen); otherwise, set up the onload hook function in the <iframe> and wait for the hook to be called.
  3. Go to the next step to get the button dimensions.

In the <iframe>:

  1. The code shall run via requestIdleCallback.
  2. It should set a loaded flag when the button is fully rendered.
  3. It should call the onload hook function, if one is set, when the button is fully rendered.
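
This hand-off, together with the parent-page side above, can be sketched with plain objects standing in for the real <iframe> (the helper names are made up; no DOM is involved):

```javascript
// Sketch only: the iframe side sets a `loaded` flag and calls an `onload`
// hook if one was installed; the parent either sees the flag already set
// or installs the hook and waits.
function makeFrameSide() {
  return {
    loaded: false,
    onload: null,
    finishRender: function () { // called once the button is fully rendered
      this.loaded = true;
      if (typeof this.onload === "function") this.onload();
    }
  };
}

function whenRendered(frame, callback) {
  if (frame.loaded) callback(); // fast path: already rendered (unlikely)
  else frame.onload = callback; // normal path: wait for the hook
}
```

Either ordering works: the parent can attach before or after the iframe finishes rendering.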

For JSON-P (the current implementation), "fully rendered" means the <script>'s native load or error event, but if we put requestIdleCallback in the callback, the current method no longer holds.

For CORS, "fully rendered" means either a 200 response with a completed callback, a non-200 response, or the error event. That's already a lot of code to write and test.

What I want to say is: it's really complicated.


For your worst-case 1000-buttons example, a workaround would be to load buttons.js as a CommonJS module and manually call render() after all the important stuff has loaded.


I would consider CORS XHR + conditional requests + an LRU localStorage cache, with a JSON-P fallback.

I'm open to requestIdleCallback, but it will be a super low priority, because of the complexity.

optimalisatie commented 6 years ago

requestIdleCallback will benefit Firefox 55+, Chrome 47+ and Opera 34+. With Chrome's roughly 60% market share, it will benefit about 70-80% of all browsers.

The benefit of requestIdleCallback is significant, and in essence the functionality is very simple: it makes it possible to prioritize JavaScript execution, so the buttons start loading when CPU is available. On a fast desktop PC there will be no delay, while on an overloaded mobile phone the buttons may be delayed by 10+ seconds, allowing the main page to load much faster. It makes the buttons friendlier to a page: it sets their priority to a lower level so that they won't compete for resources with the page's main JavaScript.

parent page code and