Open avamingli opened 21 hours ago
Hi! some initial review: 1) Patch miss tab-completion for DYNAMIC feature. 2) When trying to modify dynamic table
reshke=# insert into td values(1, 1);
ERROR: cannot change materialized view "td"
Maybe some better error message here? we can check for relisdynamic and provide "cannot directly change dynamic table " with some errdetail, like "DETAIL: DYNAMIC TABLE data is automatically populated from its source query."
3) Im not sure if this a problem, but example:
reshke=# alter materialized view td rename TO tdd;
ALTER MATERIALIZED VIEW
td here is dynamic table. This all comes from fact that dynamic table relkind is 'm' (mat. view). I'm not entirely sure if this design is good. Although the implementation is undoubtedly simpler in this manner, it is at least perplexing that changing a dynamic table requires M.V. SQL syntax. Another case here:
reshke=# drop dynamic table tdd;
DROP DYNAMIC TABLE
reshke=# drop materialized view tdd ;
DROP MATERIALIZED VIEW
Both succeeds.
Maybe we should create this relation as CREATE DYNAMIC MATERIALIZED VIEW
not TABLE? This should fix a lot of SQL ambiguity problems.
In any case, we should add this ALTER pattern to regression tests and have document them in some form.
I will take another look later.
This all comes from fact that dynamic table relkind is 'm' (mat. view). I'm not entirely sure if this design is good. Although the implementation is undoubtedly simpler in this manner, it is at least perplexing that changing a dynamic table requires M.V. SQL syntax. Another case here:
reshke=# drop dynamic table tdd; DROP DYNAMIC TABLE reshke=# drop materialized view tdd ; DROP MATERIALIZED VIEW
Both succeeds.
Yes, it's a materialized view actually.
Dynamic Table is a an auto-refreshing materialized view
Dynamic Table have all operations from Materialized Views, and it must do. They are same as GPDB's external table and foreign table, if you have a try:
create external table ext1;
drop external table ext1;
drop foreign table ext1;
Both will succeed, external table is actually a foreign table.
We provide gram sugar for that CREATE/DROP, but not all command or all message infos, if users want to use Materialized Views command, that's no problem.
Maybe we should create this relation as CREATE DYNAMIC MATERIALIZED VIEW not TABLE? This should fix a lot of SQL ambiguity problems.
We follow Snowflake, customers want to have Snowflake Dynamic Tables
in CBDB, better use the same word.
Please refer to Snowflake https://docs.snowflake.com/en/user-guide/dynamic-tables-intro and our discussion #706 for more details.
In any case, we should add this ALTER pattern to regression tests and have document them in some form.
Request for discuss: https://github.com/apache/cloudberry/discussions/706#discussioncomment-11320344
- Patch miss tab-completion for DYNAMIC feature.
Oo, seems you'r big fans of tab-completion. It's ok to add them.
I'm not planing to add this right now, or at least not in this pr.
Maybe someone have interest could help.
Dynamic Table have all operations from Materialized Views, and it must do. They are same as GPDB's external table and foreign table, if you have a try:
create external table ext1; drop external table ext1; drop foreign table ext1;
Both will succeed, external table is actually a foreign table.
exmaple:
CREATE READABLE EXTERNAL TABLE ext_r(id int)
LOCATION('demoprot://dynamic_table_text_file.txt')
FORMAT 'text';
CREATE EXTERNAL TABLE
\d
List of relations
Schema | Name | Type | Owner | Storage
--------+-------+---------------+---------+---------
public | ext_r | foreign table | gpadmin |
(1 row)
drop external table ext_r;
DROP FOREIGN TABLE
CREATE READABLE EXTERNAL TABLE ext_r(id int)
LOCATION('demoprot://dynamic_table_text_file.txt')
FORMAT 'text';
CREATE EXTERNAL TABLE
\d
List of relations
Schema | Name | Type | Owner | Storage
--------+-------+---------------+---------+---------
public | ext_r | foreign table | gpadmin |
(1 row)
drop foreign table ext_r;
DROP FOREIGN TABLE
In summary, we'r doing the similar thing like external table with foreign table, for dynamic tables and materialized views. We provide some command for it, and all valid commands on matview should work on dynamic tables too. I'm not planing to copy lot's of duplicated codes from matview for just several tiny adjustment, that's pointless and the worse is: when postgres have some updated on matview command, we won't catch up and need copy the new changes too. Since they are actually the same thing, let it be.
I'm not planing to copy lot's of duplicated codes from matview
Im not suggesting this, the thing i propose is to change CREATE DYNAMIC TABLE syntax to CREATE DYNAMIC MATERIALIZED VIEW for convenience
Dynamic Table is a an auto-refreshing materialized view which could be constructed by base tables, external tables, materialized views and dynamic tables. And it could be used to answer query by AQUMV. As normal tables in CBDB, dynamic tables could also have distribution keys.
The purpose of Dynamic Tables is to solve the problem often raised by customers who are big fans of a lakehouse architecture: how can we run queries on external tables as fast as internal tables?
See details in discussion #706 and more cases in tests.
Create Dynamic Table:
CREATE DYNAMIC TABLE xxx AS
Query
TheQuery
allows any valid SELECT SQL of Materialized Views: from single or multiple relations, base tables, materialized views, and dynamic tables as well, joins, subquery, aggregation, group by and etc. However, if you want to use it to Answer Query, that is limited by AQUMV: currently we allow Select from single base table, aggregation on it or aggregation SQL replace directly #705SCHEDULE:
A string used to schedule background job which auto-refreshes the dynamic table. We follow the valid string of pg_cron extension which supports linux crontab, refer https://crontab.guru/ .
You can also use '[1-59] seconds' to schedule a job based on an interval. The example creates a cron job refreshing the dynamic table at minute 5 of each hour.
User don't need to consider the auto-refresh job, however query on pg_task catalog if we want to see the task:
As Snowflake, Dynamic Tables should always have a auto-refresh process. However, for convenience, I make SCHEDULE optional. If user didn't specific it, a default schedule is provided: maybe at every 5th minute(snowflake limit at most 5 minutes for dynamic table auto-refresh, not sure)?
WITH NO DATA:
Same as Materialized View, will create an empty Dynamic Table if specified.
DISTRIBUTED BY:
Same as normal tables in CBDB, Dynamic Tables could support distribution keys as materialized views. Use \d+ to see the distribution keys and the Query SQL of Dynamic Tables.
Refresh Dynamic Table
As seen in pg_task, we put a command to auto-refresh dynamic tables. However, if users want to do a REFRESH manually, exec command
REFRESH DYNAMIC TABLE
is also supported.REFRESH WITH NO DATA;
Same as Materialized Views, Refresh with no data will truncate the Dynamic Table and make it unpopulated status.
Drop Dynamic Table:
Drop a Dynamic Table will drop its scheduler job automatically.
Privileges
Same as Materialized Views in CBDB:
Use Dynamic Tables to answer query
Like Materialized Views, Dynamic Tables could be used to answer query too:
Authored-by: Zhang Mingli avamingli@gmail.com
Fixes #ISSUE_Number
What does this PR do?
Type of Change
Breaking Changes
Test Plan
make installcheck
make -C src/test installcheck-cbdb-parallel
Impact
Performance:
User-facing changes:
Dependencies:
Checklist
Additional Context
⚠️ To skip CI: Add
[skip ci]
to your PR title. Only use when necessary! ⚠️