benvinegar / counterscale

Scalable web analytics you run yourself on Cloudflare
https://counterscale.dev
MIT License
1.52k stars 66 forks source link

Investigate using fixed interval boundaries instead of `NOW()` when making date range queries #65

Open benvinegar opened 7 months ago

benvinegar commented 7 months ago

Right now every query against Cloudflare Analytics Engine is up until the current time (NOW() in CF AE SQL). This means every minute – or even every second – the query could return a different result.

SELECT count FROM metricsDataset WHERE TIMESTAMP > NOW() - INTERVAL '1' DAY

I suspect this is bad for caching purposes. If 100 people in a row visit Counterscale, the dataset could be different each time the query is executed (because NOW() has changed), so the result can't be cached.

Ideally we should instead query on fixed bucket intervals, that way if 100 people in a row visit Counterscale within that time bucket (e.g. 5 minutes, 10 minutes, whatever), they all get the same result and it can be cached. The "downside" is that some users might expect the result to be different – especially during local development.

This could probably be done with toStartOfInterval, e.g. something like:

SELECT count from metricsDataset 
  WHERE TIMESTAMP > toStartOfInterval(NOW() - INTERVAL '1' DAY, INTERVAL '5' MINUTE) 
  AND TIMESTAMP <= toStartOfInterval(NOW(), INTERVAL '5' MINUTE)

Theoretically this should result in performance gains when viewing the dashboard (i.e. even when you just refresh the page a few times yourself). I haven't verified in practice though.