Closed sebbacon closed 8 years ago
Have opted for the REFRESH MATERIALIZED VIEW CONCURRENTLY
option in the first instance, on the grounds that even if it takes ages, it shouldn't really matter. This probably depends on the definition of "ages". The documentation does state that concurrent updates will probably be slower for the types of queries we're doing. That said, the implementation sounds pretty much like what we were going to do manually anyway.
It seems like the tradeoff is between the operation completing quickly but preventing some kinds of queries from running successfully, and having the operation completing slowly but not breaking anything.
Given "quickly" already means > 18 hours (IIRC), then as long as slowly isn't, say, > 3 days, then this solution should be Good Enough.
I still think it's suspicious that it takes quite so long for the query to execute. If this performance is a problem we could either spend a couple of days on it ourselves, or perhaps get a postgres expert to help us analyze it.
See https://github.com/ebmdatalab/openprescribing/issues/82 for preliminary discussion about performance. I will update this issue with further notes on the materialized views as I run them.
:thumbsup:
I tried this when I first built the site, but found that concurrent refreshes were very slow (>3 days).
Definitely worth a try to see if that's changed with more sensible indexes, and with other performance improvements.
We have a standing offer of Postgres help from Russ.
Which Russ @annapowellsmith?
Update: refreshing vw__presentation_summary_by_ccg
took 7 hours 43 mins (27800533.211ms to be precise). So I think 3 days is probably OK.
We need to have a conversation about if it's worth improving this along the lines described at openprescribing/issues/82, viz. just calculating each month's new data (poss with BigQuery) and appending to a table, rather than recomputing the entire materialized views.
At the moment, updating a materialized view requires locking it for reads, resulting in some feature downtime for many hours.
We think it should be possible to implement this in two steps: