adobe / helix-dispatch

A Helix microservice that retrieves content from multiple sources and delivers the best match
Apache License 2.0

simple memory cache to dispatch for resolve-ref and the 404 statics #159

Open trieloff opened 4 years ago

trieloff commented 4 years ago

also, there are a lot of unnecessary invocations, like requesting 404.html for the gazillionth time. also note that the actual action invocation might not be the problem: for example, when the static is executed concurrently, each activation still makes tcp requests (github, epsagon, coralogix), which are keep-alive by default and probably leave lingering sockets, especially since the processes are long-lived. also, the ssh handshake is not free, either.

Originally posted by @tripodsan in https://github.com/adobe/helix-home/issues/87#issuecomment-574444209

trieloff commented 4 years ago

For 404, definitely. For resolve-ref, it's questionable. We built resolve-ref because we wanted to get around the caching that raw imposes on requests to branches.

trieloff commented 4 years ago

I wonder if switching to Helix fetch and using the cache from there would help.

tripodsan commented 4 years ago

> I wonder if switching to Helix fetch and using the cache from there would help.

only if the 404s from the static contain the proper cache headers.

trieloff commented 4 years ago

GET /adobe/helix-embed/f6b6a6bb94d3cdfcfbd0458e6072c000d8b55c3b/src/embed.js HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: raw.githubusercontent.com
User-Agent: HTTPie/1.0.3

HTTP/1.1 200 OK
Accept-Ranges: bytes
Access-Control-Allow-Origin: *
Cache-Control: max-age=300
Connection: keep-alive
Content-Encoding: gzip
Content-Length: 1553
Content-Security-Policy: default-src 'none'; style-src 'unsafe-inline'; sandbox
Content-Type: text/plain; charset=utf-8
Date: Mon, 23 Mar 2020 08:08:37 GMT
ETag: W/"1512c3b550d0746e40e0fd54ae31f8504841349f63b76106d69bec2facc10b66"
Expires: Mon, 23 Mar 2020 08:13:37 GMT
Source-Age: 6
Strict-Transport-Security: max-age=31536000
Vary: Authorization,Accept-Encoding
Via: 1.1 varnish (Varnish/6.0)
Via: 1.1 varnish
X-Cache: HFM, HIT
X-Cache-Hits: 0, 1
X-Content-Type-Options: nosniff
X-Fastly-Request-ID: d6b67630646a32aaf46049f1a9c41eabc2077645
X-Frame-Options: deny
X-Geo-Block-List:
X-GitHub-Request-Id: BAB8:2941:6B31DA:7CED87:5E786E7F
X-Served-By: cache-fra19125-FRA
X-Timer: S1584950918.628197,VS0,VE0
X-XSS-Protection: 1; mode=block

They don't ☹️

kptdobe commented 4 years ago

FYI, the 2 resolve-git-ref invocations from dispatch take a stable 750ms. The 2 calls run in parallel, but they block the rest of the execution. I do not see how a cache could help (since you always want to make sure you are using the latest git version), and we introduced resolve-git-ref specifically because of... caching issues! But for sure, this is an area for improvement because it has a huge cost in the overall request time: if a small md takes a total of 1.7s to be "dispatched", 0.7s of that is resolve-git-ref (about 40% of the overall request).

tripodsan commented 4 years ago

we could also cache the resolve-git-ref...

The initial problem was that we cannot influence the caching on raw.github.com, so for development (and authoring), it is annoying when content changes in github are not reflected.

so in order to speed this up, we cache the refs for X minutes in a memory cache. similar to what @davidnuescheler once suggested, we could have some mechanism to force refetching the refs, e.g. with ?ck=... :-) (since the client request params are passed along to dispatch, this should be possible).

this way, in authoring and in development, we can request a page with ?ck=... to refresh the ref cache.
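
A minimal sketch of what such a ref cache could look like, written in Python for brevity and not taken from the dispatch code. The `RefCache` class, the `resolve` callback, the `force_refresh` flag (standing in for a `?ck=...` request) and the 300 second TTL are all assumptions for illustration; the thread only says "X minutes".

```python
import time
from typing import Callable, Dict, Tuple

Key = Tuple[str, str, str]  # (owner, repo, ref)


class RefCache:
    """In-memory cache from (owner, repo, ref) to a commit sha, with a TTL."""

    def __init__(
        self,
        resolve: Callable[[str, str, str], str],
        ttl_seconds: float = 300.0,  # assumed value, the thread says "X minutes"
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve  # hypothetical stand-in for calling resolve-git-ref
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Key, Tuple[str, float]] = {}  # key -> (sha, expiry)

    def get(self, owner: str, repo: str, ref: str, force_refresh: bool = False) -> str:
        key = (owner, repo, ref)
        now = self._clock()
        entry = self._entries.get(key)
        # Serve from memory unless the entry is missing, expired, or the
        # caller explicitly asked for a refresh (e.g. the request had ?ck=...).
        if entry is not None and not force_refresh and now < entry[1]:
            return entry[0]
        sha = self._resolve(owner, repo, ref)
        self._entries[key] = (sha, now + self._ttl)
        return sha
```

With this shape, a normal request would call `cache.get(owner, repo, ref)`, and a request carrying the cache-buster parameter would call it with `force_refresh=True`, which re-resolves the ref and restarts the TTL.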

trieloff commented 4 years ago

> (since the client request params are passed along to dispatch, this should be possible).

Not in a consistent way. Most client request parameters are stripped away to increase cache efficiency.

trieloff commented 4 years ago

What about this? For helix-pages, we already built a "make sure everything is uncached" mode. Why can't we just operate helix-dispatch in two different modes:

  1. mode=fast – values performance over consistency, does not use resolve-git-ref and accepts that results might be temporarily inconsistent. This should be the default mode for production.
  2. mode=consistent – values consistency over performance, always calls resolve-git-ref and accepts that mode=consistent is an alias for mode=slow. This could be the default for Helix Pages.

Putting a cache in front of the cache-buster (resolve-git-ref) doesn't seem like a move in the right direction.
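
As a rough illustration of the two modes, here is a sketch in Python. The function name `ref_for_fetch` and the `resolve` callback are hypothetical and only stand in for whatever dispatch would actually do; the mode names come from the list above.

```python
from typing import Callable


def ref_for_fetch(ref: str, mode: str, resolve: Callable[[str], str]) -> str:
    """Pick the git ref to use when fetching raw content.

    mode="consistent": always resolve the ref to a sha first (slower, current).
    mode="fast": use the ref as given and accept upstream caching.
    """
    if mode == "consistent":
        # Extra round trip through the (hypothetical) resolver on every request.
        return resolve(ref)
    if mode == "fast":
        # No resolver call; results may be temporarily stale.
        return ref
    raise ValueError("unknown mode: %r" % mode)
```

The point of the sketch is that `fast` never touches the resolver, so there is nothing to cache or invalidate in that mode.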

tripodsan commented 4 years ago

> 1. mode=fast – values performance over consistency, does not use resolve-git-ref and accepts that results might be temporarily inconsistent. This should be the default mode for production.

I don't think we should use gitraw w/o a sha. so I'd rather use a cached resolve-ref, where we are in control of when to re-resolve the ref.

> 2. mode=consistent – values consistency over performance, always calls resolve-git-ref and accepts that mode=consistent is an alias for mode=slow. This could be the default for Helix Pages.

I think for authoring, a middle ground is better :-) or a mode that can invalidate the cache explicitly.

> Putting a cache in front of the cache-buster (resolve-git-ref) doesn't seem like a move in the right direction.

I don't see it as a cache-buster, but rather as: we want to control the cache ourselves.

trieloff commented 4 years ago

Authoring is a separate discussion. At some point I think authoring will resort to POSTing the MD body to dispatch to avoid all caching issues.

trieloff commented 4 years ago

> I'd rather use a cached resolve-ref, where we are in control on when to re-resolve the ref.

Using a cache in front of resolve-git-ref saves you a fraction (cache efficiency) of the 750ms. Not using it at all saves you 100% and simplifies the implementation.

I don't have a strong opinion here, I just want to make sure we are aware of the tradeoffs.

tripodsan commented 4 years ago

> Using a cache in front of resolve-git-ref saves you a fraction (cache efficiency) of the 750ms. Not using it at all saves you 100% and simplifies the implementation.

depends on how many times you call it... if you call it once and cache the result, and can then use the cache 1000 times, it saves you 750s :-)

trieloff commented 4 years ago

In that case the fraction is 1/1001 – still worse than 0/1001
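
The numbers in this exchange can be checked with a few lines of Python. This is only back-of-the-envelope arithmetic using the 750ms figure quoted earlier in the thread; the function names are made up for illustration.

```python
RESOLVE_MS = 750.0  # resolve-git-ref latency quoted in the thread


def avg_resolve_overhead_ms(hits_per_miss: int, resolve_ms: float = RESOLVE_MS) -> float:
    """Average resolve cost per request when one miss is followed by N cache hits."""
    return resolve_ms / (hits_per_miss + 1)


def total_saved_s(hits: int, resolve_ms: float = RESOLVE_MS) -> float:
    """Total time saved by the cache hits, in seconds."""
    return hits * resolve_ms / 1000.0


# One miss plus 1000 hits: 1 of 1001 requests pays the 750ms,
# so the average overhead is 750 / 1001 ms per request.
print(avg_resolve_overhead_ms(1000))
# The 1000 hits together avoid 1000 * 0.75s of resolve time.
print(total_saved_s(1000))
```

Both statements hold at once: the cache saves 750s across the 1000 hits, and skipping the resolve call entirely would still cost less, since its per-request overhead is zero.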

tripodsan commented 4 years ago

caching also reduces the # of action invocations, which is good for the rate limit.