Open trieloff opened 4 years ago
For 404, definitely. For resolve-ref, questionably. We built resolve-ref because we wanted to disable the cache that raw imposes on making requests to branches.
I wonder if switching to Helix fetch and using the cache from there would help.
> I wonder if switching to Helix fetch and using the cache from there would help.
Only if the 404s from the static contain the proper cache headers.
GET /adobe/helix-embed/f6b6a6bb94d3cdfcfbd0458e6072c000d8b55c3b/src/embed.js HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: raw.githubusercontent.com
User-Agent: HTTPie/1.0.3

HTTP/1.1 200 OK
Accept-Ranges: bytes
Access-Control-Allow-Origin: *
Cache-Control: max-age=300
Connection: keep-alive
Content-Encoding: gzip
Content-Length: 1553
Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'; sandbox
Content-Type: text/plain; charset=utf-8
Date: Mon, 23 Mar 2020 08:08:37 GMT
ETag: W/"1512c3b550d0746e40e0fd54ae31f8504841349f63b76106d69bec2facc10b66"
Expires: Mon, 23 Mar 2020 08:13:37 GMT
Source-Age: 6
Strict-Transport-Security: max-age=31536000
Vary: Authorization,Accept-Encoding
Via: 1.1 varnish (Varnish/6.0)
Via: 1.1 varnish
X-Cache: HFM, HIT
X-Cache-Hits: 0, 1
X-Content-Type-Options: nosniff
X-Fastly-Request-ID: d6b67630646a32aaf46049f1a9c41eabc2077645
X-Frame-Options: deny
X-Geo-Block-List:
X-GitHub-Request-Id: BAB8:2941:6B31DA:7CED87:5E786E7F
X-Served-By: cache-fra19125-FRA
X-Timer: S1584950918.628197,VS0,VE0
X-XSS-Protection: 1; mode=block
They don't ☹️
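For reference, a minimal Python sketch of the check being discussed: pulling the freshness lifetime out of a Cache-Control value like the one in the 200 response above. The name max_age_seconds is made up for illustration and the parsing is deliberately simplified (no s-maxage, no quoted values); this is not how helix-fetch or any real cache decides.

```python
from typing import Optional


def max_age_seconds(cache_control: Optional[str]) -> Optional[int]:
    """Return the max-age value in seconds, or None if no lifetime is given."""
    if not cache_control:
        return None
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


# The 200 response above may be reused for five minutes.
print(max_age_seconds("max-age=300"))  # 300
# A response without Cache-Control carries no explicit lifetime.
print(max_age_seconds(None))  # None
```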
FYI, the 2 resolve-git-ref invocations from dispatch take a stable 750ms. The 2 calls run in parallel, but they block the rest of the execution.
I do not see how a cache could help (since you always want to make sure you are using the latest git version), and we introduced resolve-git-ref especially for... caching issues!
But for sure, this is an area for improvement because it has a huge cost in the overall request time: if a small md takes a total of 1.7s to be "dispatched", 0.7s is resolve-git-ref (about 40% of the overall request).
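The arithmetic behind that share, as a small Python sketch. The helper blocking_share is hypothetical; it only encodes the assumption that calls running in parallel cost as much as the slowest one, using the 0.7s and 1.7s figures from the comment.

```python
def blocking_share(parallel_call_seconds, total_seconds):
    """Fraction of the request spent waiting on calls that run in parallel."""
    # Parallel calls cost as much as the slowest one, not the sum.
    return max(parallel_call_seconds) / total_seconds


share = blocking_share([0.7, 0.7], 1.7)
print(f"{share:.0%}")  # 41%
```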
we could also cache the resolve-git-ref...
The initial problem was that we cannot influence the caching on raw.github.com, so for development (and authoring), it is annoying when changes in content in github are not reflected.
So in order to speed this up, we cache the refs for X minutes in a memory cache.
Similar to what @davidnuescheler once suggested, we could have some mechanism to enforce refetching the refs, e.g. with ?ck=... :-) (since the client request params are passed along to dispatch, this should be possible).
This way, in authoring and in development, we can request a page with ?ck=... to refresh the ref cache.
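A rough Python sketch of what such a ref cache could look like, just to make the idea concrete. Everything here is an assumption: the class name RefCache, the 300 second default standing in for "X minutes", the resolve callback standing in for the resolve-git-ref call, and the rule that a ck value not seen before for a ref forces a re-resolve. The branch name in the usage lines is illustrative too.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class RefCache:
    """In-memory cache mapping (owner, repo, ref) to a commit sha."""

    def __init__(self, resolve: Callable[[str, str, str], str],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._resolve = resolve    # stand-in for the resolve-git-ref call
        self._ttl = ttl_seconds    # the "X minutes"; 300s is an arbitrary pick
        self._clock = clock
        self._entries: Dict[Tuple[str, str, str],
                            Tuple[str, float, Optional[str]]] = {}

    def get(self, owner: str, repo: str, ref: str,
            ck: Optional[str] = None) -> str:
        key = (owner, repo, ref)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None:
            sha, stored_at, seen_ck = entry
            fresh = now - stored_at < self._ttl
            # A ck value not seen before for this ref forces a re-resolve.
            forced = ck is not None and ck != seen_ck
            if fresh and not forced:
                return sha
        sha = self._resolve(owner, repo, ref)
        self._entries[key] = (sha, now, ck)
        return sha


calls = []


def fake_resolve(owner, repo, ref):
    # Hypothetical resolver that records each lookup.
    calls.append(ref)
    return "f6b6a6bb94d3cdfcfbd0458e6072c000d8b55c3b"


cache = RefCache(fake_resolve)
cache.get("adobe", "helix-embed", "master")          # resolves
cache.get("adobe", "helix-embed", "master")          # served from memory
cache.get("adobe", "helix-embed", "master", ck="1")  # forced refresh
print(len(calls))  # 2
```

Repeating the same ck value does not resolve again in this sketch, so reloading an already cache-busted URL stays cheap.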
> (since the client request params are passed along to dispatch, this should be possible).
Not in a consistent way. Most client request parameters are stripped away to increase cache efficiency.
What about this? For helix-pages, we already built a "make sure everything is uncached" mode. Why can't we just operate helix-dispatch in two different modes:
mode=fast – values performance over consistency, does not use resolve-git-ref and accepts that results might be temporarily inconsistent. This should be the default mode for production.
mode=consistent – values consistency over performance, always calls resolve-git-ref and accepts that mode=consistent is an alias for mode=slow. This could be the default for Helix Pages.
Putting a cache in front of the cache-buster (resolve-git-ref) doesn't seem like a move in the right direction.
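To illustrate the two modes, here is a hedged Python sketch of how the choice could affect the URL that gets fetched. The function raw_url and its resolve callback are hypothetical and not part of helix-dispatch; only the URL shape is taken from the request shown earlier in this thread.

```python
from typing import Callable

RAW_ROOT = "https://raw.githubusercontent.com"


def raw_url(owner: str, repo: str, ref: str, path: str, mode: str,
            resolve: Callable[[str, str, str], str]) -> str:
    """Build the raw content URL for the given mode."""
    if mode == "consistent":
        # Pay for the lookup and pin the request to an exact commit.
        ref = resolve(owner, repo, ref)
    elif mode != "fast":
        raise ValueError(f"unknown mode: {mode}")
    # In fast mode the branch name goes straight into the URL, so the
    # response may lag behind the latest commit.
    return f"{RAW_ROOT}/{owner}/{repo}/{ref}/{path}"


def fake_resolve(owner, repo, ref):
    # Hypothetical stand-in for resolve-git-ref.
    return "f6b6a6bb94d3cdfcfbd0458e6072c000d8b55c3b"


print(raw_url("adobe", "helix-embed", "master", "src/embed.js", "fast", fake_resolve))
print(raw_url("adobe", "helix-embed", "master", "src/embed.js", "consistent", fake_resolve))
```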
> mode=fast – values performance over consistency, does not use resolve-git-ref and accepts that results might be temporarily inconsistent. This should be the default mode for production.
I don't think we should use gitraw without a sha, so I'd rather use a cached resolve-ref, where we are in control of when to re-resolve the ref.
> mode=consistent – values consistency over performance, always calls resolve-git-ref and accepts that mode=consistent is an alias for mode=slow. This could be the default for Helix Pages.
I think for authoring, a medium is better :-) or one that can invalidate the cache explicitly.
> Putting a cache in front of the cache-buster (resolve-git-ref) doesn't seem like a move in the right direction.
I don't see it as a cache-buster, but rather as: we want to control the cache ourselves.
Authoring is a separate discussion. At some point I think authoring will resort to POSTing the MD body to dispatch to avoid all caching issues.
> I'd rather use a cached resolve-ref, where we are in control of when to re-resolve the ref.
Using a cache in front of resolve-git-ref saves you a fraction (cache efficiency) of the 750ms. Not using it at all saves you 100% and simplifies the implementation.
I don't have a strong opinion here, I just want to make sure we are aware of the tradeoffs.
> Using a cache in front of resolve-git-ref saves you a fraction (cache efficiency) of the 750ms. Not using it at all saves you 100% and simplifies the implementation.
Depends on how many times you call it... if you call it once and cache, and then can use the cache 1000 times, it saves you 750s :-)
In that case the fraction is 1/1001 – still worse than 0/1001
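Both numbers can be checked with a few lines of Python. The helper names are made up; the only inputs are the 750ms figure from this thread and the assumption that only cache misses pay for the resolve-git-ref call.

```python
RESOLVE_COST_MS = 750  # figure quoted earlier in the thread


def average_overhead_ms(misses: int, hits: int,
                        cost_ms: float = RESOLVE_COST_MS) -> float:
    """Mean resolve time per request when only cache misses pay for the call."""
    return cost_ms * misses / (misses + hits)


def total_saved_seconds(hits: int, cost_ms: float = RESOLVE_COST_MS) -> float:
    """Time saved compared with calling resolve-git-ref on every request."""
    return hits * cost_ms / 1000


print(total_saved_seconds(1000))                # 750.0
print(round(average_overhead_ms(1, 1000), 2))   # 0.75
print(average_overhead_ms(0, 1001))             # 0.0
```

So the cached variant averages well under a millisecond per request in this scenario, while skipping the call entirely costs nothing but gives up control over which commit is served.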
caching also reduces the # of action invocations, which is good for the rate limit.
Also, there are a lot of unnecessary invocations, like requesting 404.html for the gazillionth time. Also note that the actual action invocation might not be the problem: for example, when executing the static concurrently, each activation still makes TCP requests (github, epsagon, coralogix), which are keep-alive by default and probably produce lingering sockets, especially since the processes are long-lived. Also, the ssh handshake is not free either.
Originally posted by @tripodsan in https://github.com/adobe/helix-home/issues/87#issuecomment-574444209