ebmdatalab / openprescribing-data

Seed data for openprescribing
6 stars 2 forks source link

Materialized view refreshing should not involve downtime #3

Closed sebbacon closed 8 years ago

sebbacon commented 8 years ago

At the moment, updating a materialized view requires locking it for reads, resulting in some feature downtime for many hours.

We think it should be possible to implement this in two steps:

  1. Create a new matview with a temporary name
  2. Delete old matview
  3. Rename new matview
sebbacon commented 8 years ago

Have opted for the REFRESH MATERIALIZED VIEW CONCURRENTLY option in the first instance, on the grounds that even if it takes ages, it shouldn't really matter. This probably depends on the definition of "ages". The documentation does state that concurrent updates will probably be slower for the types of queries we're doing. That said, the implementation sounds pretty much like what we were going to do manually anyway.

It seems like the tradeoff is between the operation completing quickly but preventing some kinds of queries from running successfully, and having the operation completing slowly but not breaking anything.

Given "quickly" already means > 18 hours (IIRC), then as long as slowly isn't, say, > 3 days, then this solution should be Good Enough.

I still think it's suspicious that it takes quite so long for the query to execute. If this performance is a problem we could either spend a couple of days on it ourselves, or perhaps get a postgres expert to help us analyze it.

See https://github.com/ebmdatalab/openprescribing/issues/82 for preliminary discussion about performance. I will update this issue with further notes on the materialized views as I run them.

annapowellsmith commented 8 years ago

:thumbsup:

I tried this when I first built the site, but found that concurrent refreshes were very slow (>3 days).

Definitely worth a try to see if that's changed with more sensible indexes, and with other performance improvements.

We have a standing offer of Postgres help from Russ.

sebbacon commented 8 years ago

Which Russ @annapowellsmith?

sebbacon commented 8 years ago

Update: refreshing vw__presentation_summary_by_ccg took 7 hours 43 mins (27800533.211ms to be precise). So I think 3 days is probably OK.

We need to have a conversation about if it's worth improving this along the lines described at openprescribing/issues/82, viz. just calculating each month's new data (poss with BigQuery) and appending to a table, rather than recomputing the entire materialized views.

sebbacon commented 8 years ago

Closed by https://github.com/ebmdatalab/openprescribing/pull/79/commits/5dd8f89caa59a63c7b671f5145a8b262420873f8