apache / cloudberry

One advanced and mature open-source MPP (Massively Parallel Processing) database. Open source alternative to Greenplum Database.
https://cloudberry.apache.org
Apache License 2.0

Fast path to REFRESH materialized view. #682

Closed avamingli closed 4 weeks ago

avamingli commented 1 month ago

We can already track the data status of some materialized views, that is, whether their data is up to date. When a view's data is up to date, we can skip the real REFRESH.

When no data has changed since the latest REFRESH command, the un-refreshed data is logically the same as the result of a real REFRESH. Skipping the refresh in that case can save a lot of work: reading data for the view query, computing the result, and writing it into the view table. For example, a REFRESH that runs periodically from a cron task, or that users issue manually, can take a long time and consume a lot of resources each time.

New GUC: gp_enable_refresh_fast_path

This feature is enabled by default (true). Users can turn it off when they intend to do a real REFRESH.
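
As a minimal sketch of how the GUC might be used in a session: the view name `my_mv` below is hypothetical, and the example assumes the GUC can be changed per session with `SET`, as the Performance example in this description suggests.

```sql
-- Hypothetical view name: my_mv stands in for any materialized view
-- whose data status is tracked.

-- With the fast path on (the proposed default), REFRESH may return
-- without recomputing when the view is known to be up to date.
SET gp_enable_refresh_fast_path = on;
REFRESH MATERIALIZED VIEW my_mv;

-- Turn the fast path off for this session to request a real REFRESH
-- even if the view is considered up to date.
SET gp_enable_refresh_fast_path = off;
REFRESH MATERIALIZED VIEW my_mv;

-- Return the setting to its configured default.
RESET gp_enable_refresh_fast_path;
```

Keeping the switch session-level would let a cron job or an individual user opt out without changing the behavior for everyone else.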

Performance

If the fast path is chosen, REFRESH returns immediately and does almost nothing. The cost saved depends on the amount of data, the resources needed to compute the view query, and the time spent reading the data and writing it back to the view table.

insert into t1 select i from generate_series(1, 100000000) i;
create materialized view mv2 as select * from t1 where a > 1 with no data;
refresh materialized view mv2;
REFRESH MATERIALIZED VIEW
Time: 194061.961 ms (03:14.062)

set gp_enable_refresh_fast_path = on;

refresh materialized view mv2;
REFRESH MATERIALIZED VIEW
Time: 4.617 ms

Authored-by: Zhang Mingli <avamingli@gmail.com>

yjhjstz commented 4 weeks ago

LGTM