Closed cupakromer closed 7 years ago
I :heart: the caching, and think it would likely eliminate the 403's from Github's API.
Do you have any specific headers in mind that you want to be able to pass into respond_with
, I am all for adding that ability but wanted to make sure it solves a problem before adding codes.
I was focusing on cache related things with that so the normal conditional GET
headers: If-None-Match
(E-Tag) and If-Modified-Since
. Though I did state possibly changing the object structure / message passing protocol as an alternative to passing those in.
The reason being that some modern browsers, I'm looking at you Chrome, provide one of both of these headers on a page "refresh" or "restore from previous session" when the page is in local cache. So it would be good to be able to reply to them with a 304 Not Modified
- also it would potentially be faster b/c no rendering of the response.
I did a bit of testing today to try out some different redesign ideas. During this process I ended up using curl
to request an image through the current redirect url:
curl -Is "https://github.com/RadiusNetworks/proximitykit-documentation/raw/master/docs/pk-cloud.png"
HTTP/1.1 302 Found
Server: GitHub.com
Date: Thu, 22 Jun 2017 18:31:55 GMT
Content-Type: text/html; charset=utf-8
Status: 302 Found
Cache-Control: no-cache
Vary: X-PJAX, Accept-Encoding
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 100
Access-Control-Allow-Origin: https://render.githubusercontent.com
Location: https://raw.githubusercontent.com/RadiusNetworks/proximitykit-documentation/master/docs/pk-cloud.png
X-UA-Compatible: IE=Edge,chrome=1
X-Runtime: 0.017536
Content-Security-Policy: default-src 'none'; base-uri 'self'; block-all-mixed-content; child-src render.githubusercontent.com; connect-src 'self' uploads.github.com status.github.com collector.githubapp.com api.github.com www.google-analytics.com github-cloud.s3.amazonaws.com github-production-repository-file-5c1aeb.s3.amazonaws.com github-production-upload-manifest-file-7fdce7.s3.amazonaws.com github-production-user-asset-6210df.s3.amazonaws.com wss://live.github.com; font-src assets-cdn.github.com; form-action 'self' github.com gist.github.com; frame-ancestors 'none'; img-src 'self' data: assets-cdn.github.com identicons.github.com collector.githubapp.com github-cloud.s3.amazonaws.com *.githubusercontent.com; media-src 'none'; script-src assets-cdn.github.com; style-src 'unsafe-inline' assets-cdn.github.com
Strict-Transport-Security: max-age=31536000; includeSubdomains; preload
Public-Key-Pins: max-age=5184000; pin-sha256="WoiWRyIOVNa9ihaBciRSC7XHjliYS9VwUGOIud4PB18="; pin-sha256="RRM1dGqnDFsCJXBTHky16vi1obOlCgFFn/yOhI/y+ho="; pin-sha256="k2v657xBsOVe1PQRwOsHsw3bsGT2VzIqz5K+59sNQws="; pin-sha256="K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q="; pin-sha256="IQBnNBEiFuhj+8x6X8XLgh01V9Ic5/V3IRQLNFFc7v4="; pin-sha256="iie1VXtL7HzAMF+/PVPR9xzT80kQxdZeJ+zduCB3uj0="; pin-sha256="LvRiGEjRqfzurezaWuj8Wie2gyHMrW5Q06LspMnox7A="; includeSubDomains
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
X-Runtime-rack: 0.020754
X-Served-By: 7d2a2d05162492046d9960cdbc326796
Age: 0
X-GitHub-Request-Id: D5DB:4C9C:2CFB8EA:4161CAD:594C0D1B
That's when I noticed this was another redirect. Following this redirect yields:
curl -Is "https://raw.githubusercontent.com/RadiusNetworks/proximitykit-documentation/master/docs/pk-cloud.png"
HTTP/1.1 200 OK
Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'
Strict-Transport-Security: max-age=31536000
X-Content-Type-Options: nosniff
X-Frame-Options: deny
X-XSS-Protection: 1; mode=block
ETag: "fd51ed6a84d1ee36c3b0af994dea8c32c785bb29"
Content-Type: image/png
Cache-Control: max-age=300
X-Geo-Block-List:
X-GitHub-Request-Id: AB1E:1776:F1BF31:F95F97:594C0D02
Content-Length: 18411
Accept-Ranges: bytes
Date: Thu, 22 Jun 2017 18:34:09 GMT
Via: 1.1 varnish
Connection: keep-alive
X-Served-By: cache-iad2125-IAD
X-Cache: HIT
X-Cache-Hits: 1
X-Timer: S1498156449.322911,VS0,VE1
Vary: Authorization,Accept-Encoding
Access-Control-Allow-Origin: *
X-Fastly-Request-ID: a440cac380cd0e67b5ebdc40c5d8a4185be58bf1
Expires: Thu, 22 Jun 2017 18:39:09 GMT
Source-Age: 159
Note that the initial URL we redirect to does include rate limits:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 100
However, the final URL we are redirected to does NOT have rate limits!
That's when I noticed that we use two different URLs: one for redirects and one for reading files. The former uses the domain github.com
while the latter uses api.github.com
.
From what I can tell the rate limits break down as follows (I retested with a markdown page as well to verify behavior was consistent across file types):
Domain | Limit | Reset |
---|---|---|
github.com |
100 | N/A |
api.github.com |
60 | 1 hour |
raw.githubusercontent.com |
None | N/A |
It should be noted that github.com
always redirects cURL
requests to the raw.githubusercontent.com
domain.
I'm not sure why we're using two different URL domains. Both were introduced in the same commit 6efa102 but no details differentiating their use. @csexton and thoughts on this?
The lack or rate limits on raw.githubusercontent.com
is very interesting, and would probably be better for this library.
I knew about that domain when I was originally working on this, however didn't use it since it seemed like an implementation detail on how github works. I opted to go with what seemed like a more public (less likely to change?) endpoint.
I suspect I intended to use api.github.com
for everything and the fact that DocRepo::GithubFile#redirect_url
uses github.com
was just an oversight. Despite this oversight I like the idea of building the urls for assets to use raw.githubusercontent.com
for now.
Less redirects, and no rate limits = win.
Background
The default Github v3 rate limit for unauthenticated APIs is about 60 requests per hour. After this all requests return
403 Forbidden
; this in turn causes aDocRepo::NotFound
(which is a misleading error) to be raised:Github gives us several solutions / workarounds for this:
GET
)Given this low default rate limit is easy for an app to hit this periodically. The current doc repo design doesn't provide a way for an app to implement any of the work arounds or configure things to use an authenticated request with a higher rate limit.
Proposal
Add the following configuration settings to allow for app level configuration of authenticated requests, caching API responses, and/or conditional
GET
support:The cache will be used to both store API responses and for internal conditional
GET
support without additional configuration.Additionally, some sort of change should be made to
DocRepo.respond_with
orRepository#respond
to support passing in the external app's conditionalGET
request headers. Or the design should be re-worked in such a way as to make that available through an alternative call system.