dwyl / learn-postgresql

🐘 Learn how to use PostgreSQL and Structured Query Language (SQL) to store and query your relational data. 🔍
212 stars 23 forks source link

Materialized View? #102

Open nelsonic opened 1 year ago

nelsonic commented 1 year ago

What is a Materialized view and how is it useful? The wikipedia page on this is a decent starting point: https://en.wikipedia.org/wiki/Materialized_view

Question

How does this work in Postgres Is the data duplicated?

Read: https://www.timescale.com/blog/how-postgresql-views-and-materialized-views-work-and-how-they-influenced-timescaledb-continuous-aggregates/

nelsonic commented 1 year ago

@LuchoTurtle my reading of the docs suggests that Materialized Views in Postgres are only updated when requested i.e. the data in the view is not real-time. Can you confirm this with a bit of Gooooogling...? 💭

LuchoTurtle commented 1 year ago

Yes, materialized views (as the name implies) are just the results of a query saved/cached somewhere. Postgres will never automatically refresh the materialized view unless explicitly told.

https://stackoverflow.com/questions/29437650/how-can-i-ensure-that-a-materialized-view-is-always-up-to-date

The whole point of materialized views is to have fast queriable data that stem from expensive operations/queries from the database. So the main thing with materialized views is:

In fact, the link you sent mentions a few strategies of data aggregation so they don't waste the previously materialized data only to be materialized again. They range from sliding window verification of new inserts/updates to simple UNION ALL where only new data is appended to what was previously materialized.

But yes, materialized views are only as accurate as the last time they ran the query they are caching.

nelsonic commented 1 year ago

Cool, that was my reading of the docs. Thanks for Summarising it succinctly, LuchoGPT. 😜

ndrean commented 1 year ago

A side note: you have "VIEW", "MATERIALIZED VIEW" and CTE (the "WITH" statement).

"VIEW" and "MATERIALIZED VIEW" are like snapshots, creating virtual tables. With "CREATE VIEW", you do not store data physically, only cached. You can "UPDATE" it. With "MATERIALIZED VIEW", the data is physically stored. To update it, you "REFRESH".

You also have "CTE", the "WITH" statement. This is used for "inline" subqueries, breaking down the main query.

LuchoTurtle commented 1 year ago

@ndrean for sure! Though VIEW does not cache any data, it just abstracts complexity from the query itself. But yeah, I understood what you meant! CTEs are really useful to organize complex views instead of having queries inside of queries, I second that 👏

ndrean commented 1 year ago

@LuchoTurtle Thanks for your precision, it was essential in fact.

Indeed (and I was wrong..), VIEW queries and CTE are evaluated each time they are referenced, whilst MATERIALIZED VIEW queries are cached (because stored).

Between CTE and VIEW, I understand the main difference is that you can restrict the users' access to only the data from the views - they may not be able to access data in the underlying tables with "GRANT/DENY"- whilst they can access the underlying table when a CTE is used. You can deny modifier access to the underlying table with the "WTH CHECK OPTION"

A (partial) toy example I played with that I hope makes sense:

CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  name VARCHAR(20),
  age INT,
  active BOOLEAN
);

INSERT INTO users (name, age, active) VALUES ('me', 2, true);
...

CTE

A complex "inlined subquery"

WITH active_users AS (
  SELECT name, age
  FROM users
  WHERE active = true
)
SELECT name FROM active_users;

VIEW

A "complex" base data:

CREATE VIEW active_users_v AS
  SELECT name, age
  FROM users
  WHERE active = true;

This first query is executed every time it is referenced, such as:

SELECT name, age FROM active_users_v;

Permissions restriction:

GRANT SELECT ON active_users_v TO lucho;
DENY SELECT ON users TO lucho;

MATERIALIZED VIEW

They are used for performance reasons when some not very frequently changing but complex base data are used frequently in other queries:

Cache "complex" :) base data:

CREATE MATERIALIZED VIEW active_users_mv AS
  SELECT age, name
  FROM users
  WHERE active = true;

If you want to use updated data, use "REFRESH":

REFRESH MATERIALIZED VIEW active_users_mv;

Otherwise, just use the MV:

SELECT name, age FROM active_users_mv;
ndrean commented 1 year ago

To avoid refreshing all these (potentially) costly tables, one may use the Postgres TRIGGERS to update some rows of the "MV" (at the expense of lengthening another query of course).

Screenshot 2023-05-08 at 15 03 30

The previous post references the following post:

Screenshot 2023-05-08 at 15 05 02