We already have the ability to track the data status for some materialized views, aware whether its data is up to date or not.
And we could avoid doing the real REFRESH if the data of view is up to date.
The no-refreshed data should be the logically same as after a real REFRESH when there is no data changed since latest REFRESH command.
In that case we may save a lot (read data from view query, compute and write into view table), ex: a cron task REFRESH view takes a long time and much resource periodically or executed manually by users each time.
New GUC: gp_enable_refresh_fast_path
Set this feature default to true, but let users decide if they intend to do a real REFRESH.
Performance
If the fast path is chosen, we always return immediately and almost do nothing.
And the cost we save depends on the amount of data, the resource we use to compute and the time we read and write back to view table.
insert into t1 select i from generate_series(1, 100000000) i;
create materialized view mv2 as select * from t1 where a > 1 with no data;
refresh materialized view mv2;
REFRESH MATERIALIZED VIEW
Time: 194061.961 ms (03:14.062)
set gp_enable_refresh_fast_path = on;
refresh materialized view mv2;
REFRESH MATERIALIZED VIEW
Time: 4.617 ms
Authored-by: Zhang Mingli avamingli@gmail.com
fix #ISSUE_Number
Change logs
Describe your change clearly, including what problem is being solved or what feature is being added.
If it has some breaking backward or forward compatibility, please clary.
Why are the changes needed?
Describe why the changes are necessary.
Does this PR introduce any user-facing change?
If yes, please clarify the previous behavior and the change this PR proposes.
How was this patch tested?
Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.
Contributor's Checklist
Here are some reminders and checklists before/when submitting your pull request, please check them:
[ ] Make sure your Pull Request has a clear title and commit message. You can take git-commit template as a reference.
[ ] Sign the Contributor License Agreement as prompted for your first-time contribution(One-time setup).
We already have the ability to track the data status for some materialized views, aware whether its data is up to date or not. And we could avoid doing the real REFRESH if the data of view is up to date.
The no-refreshed data should be the logically same as after a real REFRESH when there is no data changed since latest REFRESH command. In that case we may save a lot (read data from view query, compute and write into view table), ex: a cron task REFRESH view takes a long time and much resource periodically or executed manually by users each time.
New GUC: gp_enable_refresh_fast_path
Set this feature default to true, but let users decide if they intend to do a real REFRESH.
Performance
If the fast path is chosen, we always return immediately and almost do nothing. And the cost we save depends on the amount of data, the resource we use to compute and the time we read and write back to view table.
Authored-by: Zhang Mingli avamingli@gmail.com
fix #ISSUE_Number
Change logs
Describe your change clearly, including what problem is being solved or what feature is being added.
If it has some breaking backward or forward compatibility, please clary.
Why are the changes needed?
Describe why the changes are necessary.
Does this PR introduce any user-facing change?
If yes, please clarify the previous behavior and the change this PR proposes.
How was this patch tested?
Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.
Contributor's Checklist
Here are some reminders and checklists before/when submitting your pull request, please check them:
make installcheck
make -C src/test installcheck-cbdb-parallel
cloudberrydb/dev
team for review and approval when your PR is ready🥳