AmericanRedCross / osm-stats

Track and analyze contributions to Missing Maps
http://missingmaps.org
BSD 3-Clause "New" or "Revised" License
26 stars 9 forks source link

API: Improve /countries/<country> performance #55

Open mojodna opened 5 years ago

mojodna commented 5 years ago

/countries/<country> counts are implemented in

https://github.com/AmericanRedCross/osm-stats-api/blob/52e3e5a8ef0a582488f252473e56eaca0b0dffb0/routes/country-page.js#L162-L188

Querying the changesets table (which is actually a view over raw_changesets) is untenable, as it currently contains 61107400 rows (simply selecting the number of rows took several minutes).

An approach similar to how user + hashtag statistics are periodically aggregated using a materialized view (e.g. https://github.com/AmericanRedCross/osm-stats-workers/blob/232b75424835f5b0693398af8198eb97ed49d2a0/sql/hashtag_stats.sql) would improve performance dramatically (especially with the addition of an index on <country>).

Secondarily, wrapping queries in a caching function similar to

https://github.com/AmericanRedCross/osm-stats-api/blob/52e3e5a8ef0a582488f252473e56eaca0b0dffb0/routes/hashtag.js#L62-L72

would further help things. The cache library in use (locking-cache) limits the number of concurrent matching requests to 1, so multiple requests to the same endpoint will wait on the execution of a single SQL query. The use of stale: true (https://github.com/AmericanRedCross/osm-stats-api/blob/52e3e5a8ef0a582488f252473e56eaca0b0dffb0/routes/hashtag.js#L21) when initializing the cache allows it to return stale data immediately while still executing the query for subsequent requests (helpful in the case of hashtag endpoints, as there will be many repeated requests for the same data).

Unfortunately, the underlying data (via `raw_changesets_countries.sql does only supports "N changesets in this country" (not "N edits...") due to the lack of a count (or similarly-named) column.