cube-js / cube


Option to add cache control headers based on `refreshKey` values #2368

Open schneefux opened 3 years ago

schneefux commented 3 years ago

I want to replace a custom-built, public query API with Cube.js. The current API server makes heavy use of HTTP caching. Because it sits behind two proxies (Cloudflare + nginx) and the requests have a very high cache hit ratio, I am able to serve a high volume of requests per second with a small server.

Using the vanilla Cube.js client, I can see that a response from the Cube.js server has an ETag header and that the client sends If-None-Match accordingly on the next request, which would be great for caching. Unfortunately, the response does not set Cache-Control, so requests miss both caches and the full JSON is returned every time. To reproduce, open http://cubejs-stripe-dashboard-example.s3-website-us-west-2.amazonaws.com/ in Chrome and check the network tab. All requests have status 200 (instead of 304) and the header x-cache: Miss from cloudfront.
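For readers unfamiliar with conditional requests, here is a minimal sketch of the check a server or cache performs when it receives If-None-Match. The function names are hypothetical and this is not Cube.js code. It uses the weak comparison that applies to If-None-Match and splits the header on commas, which is a simplification (an entity tag may itself contain a comma).

```typescript
// Hypothetical helper, not part of Cube.js: decides whether a 304 Not Modified
// response would be appropriate for a given If-None-Match request header.

// Weak comparison ignores the "W/" prefix on either side.
function stripWeakPrefix(tag: string): string {
  return tag.startsWith("W/") ? tag.slice(2) : tag;
}

function isNotModified(ifNoneMatch: string | undefined, currentEtag: string): boolean {
  if (ifNoneMatch === undefined) {
    return false; // unconditional request: send the full body
  }
  const value = ifNoneMatch.trim();
  if (value === "*") {
    return true; // "*" matches any current representation
  }
  const current = stripWeakPrefix(currentEtag.trim());
  // Simplification: assumes entity tags do not contain commas.
  return value
    .split(",")
    .some((candidate) => stripWeakPrefix(candidate.trim()) === current);
}

console.log(isNotModified('"abc"', '"abc"'));   // true, a 304 is appropriate
console.log(isNotModified('"old"', '"abc"'));   // false, send 200 with the body
console.log(isNotModified(undefined, '"abc"')); // false, no validator was sent
```

Even when this check succeeds, the request still travels to the origin. Only a freshness lifetime, for example from Cache-Control, lets a proxy answer without contacting the server at all.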

For my app I'd like to set Cache-Control: public, max-age=60 because my data is not realtime, and a cache duration of 60s caps the number of cache misses per unique query at one per minute. stale-while-revalidate=…, stale-if-error=… would be nice to have too.
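As a rough illustration of the header value being asked for, the sketch below assembles a Cache-Control string from a small policy object. The `CachePolicy` type and `buildCacheControl` function are hypothetical names invented for this example. They are not an existing Cube.js option, and the stale-while-revalidate and stale-if-error values in the second call are arbitrary placeholders.

```typescript
// Hypothetical policy shape; Cube.js does not expose this configuration.
interface CachePolicy {
  maxAgeSeconds: number;
  staleWhileRevalidateSeconds?: number;
  staleIfErrorSeconds?: number;
}

// Builds a Cache-Control value for a publicly cacheable response.
function buildCacheControl(policy: CachePolicy): string {
  const directives: string[] = ["public", `max-age=${policy.maxAgeSeconds}`];
  if (policy.staleWhileRevalidateSeconds !== undefined) {
    directives.push(`stale-while-revalidate=${policy.staleWhileRevalidateSeconds}`);
  }
  if (policy.staleIfErrorSeconds !== undefined) {
    directives.push(`stale-if-error=${policy.staleIfErrorSeconds}`);
  }
  return directives.join(", ");
}

// The value requested in this issue:
console.log(buildCacheControl({ maxAgeSeconds: 60 }));
// prints "public, max-age=60"

// With the optional directives (placeholder durations):
console.log(
  buildCacheControl({
    maxAgeSeconds: 60,
    staleWhileRevalidateSeconds: 30,
    staleIfErrorSeconds: 300,
  })
);
```

Until such an option exists, one place this string could be applied is a reverse proxy in front of the Cube.js API, provided the proxy is allowed to add response headers.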

I checked the server's source code and it does not seem to be possible to set custom server headers with the new Docker-based architecture.

paveltiunov commented 3 years ago

Hey @schneefux! Thanks for posting this! It would definitely be helpful to have this option in place. I'm curious: what volume of requests are you dealing with?

fradot commented 3 years ago

Hi @paveltiunov, @schneefux, I'm facing the same issue using Cube.js on a Lambda behind AWS API Gateway. It would be great to be able to set the response headers directly from Cube.js. Did you find a workaround already? Thanks

schneefux commented 3 years ago

I have not had time to look into this further yet.

The legacy API responds to 40 cache misses per minute on average, with the first layer (nginx) having an 85% cache hit ratio for API queries and the second (Cloudflare) a 70% cache hit ratio globally. The highest peak traffic so far was 5x the average.

So without the nginx cache, I'd probably handle 40 / (1 - 0.85) ≈ 267 queries/minute (with peaks up to ≈ 1,330 q/m).
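The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: `upstreamRate` is a name made up for this example, and it assumes the 40 misses per minute are what remains after the nginx layer alone. The exact quotient comes out a little under 270 queries per minute.

```typescript
// If a cache with the given hit ratio lets `missesPerMinute` through,
// the traffic arriving at that cache is misses / (1 - hitRatio).
function upstreamRate(missesPerMinute: number, hitRatio: number): number {
  if (hitRatio < 0 || hitRatio >= 1) {
    throw new RangeError("hitRatio must be in the range [0, 1)");
  }
  return missesPerMinute / (1 - hitRatio);
}

const averagePerMinute = upstreamRate(40, 0.85); // figures quoted in this thread
const peakPerMinute = averagePerMinute * 5;      // highest observed peak was 5x

console.log(`average: ${Math.round(averagePerMinute)} queries/minute`);
console.log(`peak:    ${Math.round(peakPerMinute)} queries/minute`);
```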

paveltiunov commented 3 years ago

@schneefux Gotcha. I believe we should support an option to provide these headers based on refresh key values. Do you think an Expires header would work when refreshKey is defined with an every interval?
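One way to picture the Expires idea is the sketch below, which rests on a stated assumption: that a refresh key with an interval rolls over on boundaries aligned to the Unix epoch (every full minute for a 60-second interval, for example). Whether Cube.js invalidates on exactly those boundaries is not confirmed in this thread, and `expiresAtNextBoundary` is a hypothetical helper, not an existing API.

```typescript
// Hypothetical helper: returns an HTTP-date for the next interval boundary.
// ASSUMPTION: boundaries are aligned to the Unix epoch.
function expiresAtNextBoundary(now: Date, intervalSeconds: number): string {
  if (!Number.isInteger(intervalSeconds) || intervalSeconds <= 0) {
    throw new RangeError("intervalSeconds must be a positive integer");
  }
  const intervalMs = intervalSeconds * 1000;
  // Start of the current interval, plus one full interval.
  const nextBoundaryMs = (Math.floor(now.getTime() / intervalMs) + 1) * intervalMs;
  // toUTCString() produces the fixed-length date format used in HTTP headers.
  return new Date(nextBoundaryMs).toUTCString();
}

// A request arriving 30 seconds into a one-minute interval:
const requestTime = new Date("2021-03-15T10:00:30Z");
console.log(`Expires: ${expiresAtNextBoundary(requestTime, 60)}`);
// prints "Expires: Mon, 15 Mar 2021 10:01:00 GMT"
```

Under that assumption, every cached copy would expire at the same instant the refresh key changes, whereas a fixed max-age counts from the moment each response was generated.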

github-actions[bot] commented 3 years ago

If you are interested in working on this issue, please leave a comment below and we will be happy to assign the issue to you. If this is the first time you are contributing a Pull Request to Cube.js, please check our contribution guidelines. You can also post any questions while contributing in the #contributors channel in the Cube.js Slack.