AmericanRedCross / osm-stats

Track and analyze contributions to Missing Maps
http://missingmaps.org
BSD 3-Clause "New" or "Revised" License
26 stars 9 forks source link

Workers: replace materialized views with incrementally-updated rollup tables #56

Open mojodna opened 5 years ago

mojodna commented 5 years ago

User and hashtag statistics are facilitated by the user_stats and hashtag_stats tables. These are updated continuously using housekeeping-loop.sh.

Under some conditions, refreshing the materialized views may take upwards of 2 hours (hence the 2h timeout), resulting in apparent lag in the API (even when the raw_changesets data is up-to-date).

Replacing the materialized views with incrementally-updated rollup tables would improve the latency tremendously while reducing the amount of I/O done by Postgres.

This post summarizes the 2 approaches well: https://www.citusdata.com/blog/2018/10/31/materialized-views-vs-rollup-tables/

I see 2 ways to approach this:

Way 1 - manually

  1. create + populate a table matching the schema of the current materialized view
  2. introduce additional UPDATE SQL to src/stats.js to accompany the existing INSERTs/UPDATEs to the raw_changesets table

This keeps the aggregation logic in one place, but may under/over-count (in the rollup table) if errors occur.

Way 2 - triggers

  1. create + populate a table matching the schema of the current materialized view
  2. create a trigger that updates relevant rollup tables when rows are added to raw_changesets