apache / cloudberry

One advanced and mature open-source MPP (Massively Parallel Processing) database. Open source alternative to Greenplum Database.
https://cloudberry.apache.org
Apache License 2.0

[AQUMV]Enable answer query using Materialized View for external table. #702

Open avamingli opened 2 weeks ago

avamingli commented 2 weeks ago

Allow answering queries using materialized views that are defined on external or foreign tables. Since we cannot know whether the data of an external table outside CBDB is up to date, introduce a new GUC:

aqumv_allow_foreign_table

It lets users decide whether to use the matview instead of querying the external tables.
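A minimal sketch of how a session might opt in, assuming AQUMV itself is controlled by the existing enable_answer_query_using_materialized_views GUC and that the new GUC is a plain boolean (its default is not stated here):

```sql
-- Assumption: AQUMV as a whole is turned on by this GUC.
set enable_answer_query_using_materialized_views = on;

-- GUC introduced by this PR: accept matviews built on external/foreign tables,
-- knowing their data may be stale relative to the external source.
set aqumv_allow_foreign_table = on;

-- Opt out again to always read from the external table itself.
set aqumv_allow_foreign_table = off;
```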

create readable external table aqumv_ext_r(id int) 
location ('demoprot://aqumvtextfile.txt') format 'text';
create materialized view aqumv_ext_mv as
  select * from aqumv_ext_r;

explain (costs off, verbose)
select * from aqumv_ext_r;
               QUERY PLAN
------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Output: id
   ->  Seq Scan on aqumv.aqumv_ext_mv
         Output: id
 Optimizer: Postgres query optimizer

Indexes on the matview can also be used, if there are any.

create index on aqumv_ext_mv(id);
explain (costs off, verbose)
select * from aqumv_ext_r where id = 5;
                            QUERY PLAN
----------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)
   Output: id
   ->  Index Only Scan using aqumv_ext_mv_id_idx on aqumv.aqumv_ext_mv
         Output: id
         Index Cond: (aqumv_ext_mv.id = 5)
 Optimizer: Postgres query optimizer

avamingli commented 2 weeks ago

see more details in #693

reshke commented 2 weeks ago

aqumv_allow_foreign_table

A PostgreSQL-style GUC would have a name like enable_XXX, right? So maybe enable_aqumv_foreign_table.

avamingli commented 2 weeks ago

Not sure. I followed the naming of this one: allow_system_table_mods

avamingli commented 1 day ago

@my-ship-it We have the refresh fast path from #682, but for external tables we don't know the real status (they always appear up to date in gp_maview_aux). This will make the REFRESH command fail to actually refresh from the external data.

So we should skip the fast path for views that have external tables. That needs a catalog change to record whether a view has external tables outside CBDB.
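A rough illustration of the concern, reusing the objects from the PR description; the exact fast-path behavior is an assumption based on the comment above, not something verified here:

```sql
-- The file behind demoprot://aqumvtextfile.txt is changed outside CBDB,
-- so nothing in the catalog records that the source data moved on.

-- With the fast path from #682, this REFRESH could be treated as a no-op
-- because the view still looks up to date. Skipping the fast path for views
-- on external tables would force it to re-read the external data.
refresh materialized view aqumv_ext_mv;

-- Only after a real refresh would an AQUMV rewrite of this query
-- return the new external data.
select * from aqumv_ext_r;
```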

my-ship-it commented 22 hours ago

Yes, I think it is a reasonable behavior. Thanks!