Open visill opened 1 month ago
Should add test such as explain and execution for join with other distribution table.
Should add test for join with other distribution table.
Maybe, but main reason for having LOCAL relation is that they act exactly like pg_catalog relations (so, used only on coordinator node).
Should add test for join with other distribution table.
Maybe, but main reason for having LOCAL relation is that they act exactly like pg_catalog relations (so, used only on coordinator node).
for example, save-restore plan (sr_plan) extension uses relation to store query plans. Designed originally for postgres, this extension does not work well in Greenplum (or its forks), because it tries to perform distributed query each time we access sr_plans
table.
https://github.com/yezzey-gp/ygp/blob/YGP_6.27_STABLE/gpcontrib/sr_plan/sr_plan--1.1--1.2.sql#L5-L15
Squash commit message?
Hi, @visill , thank you for your interesting about cloudberry. Could you give more details about why we should change the current data model? Coordinator in cloudberry inherits from greenplum. It doesn't process data scan, and tries to move the workload to the segments, so the database has the ability to scale to a large data processing system. Local table seems violates our design, we need more discussion about why it's necessary.
Cloudberry has many extensions that were backported from PostgreSQL. Generally, our principle is to adapt the extensions to our distributed database, not the other way around.
Cloudberry has many extensions that were backported from PostgreSQL. Generally, our principle is to adapt the extensions to our distributed database, not the other way around.
Absolutely right.
This is wrong way, you need to make your extensions compatible with CBDB kernel codes. Greenplum/Cloudberry store catalog on master, other tables should be on segments in MPP mode. Could you explain more why need create table on master?
Why your extensions need to store data on master, if it must be , it's a catalog. If it needs to work only on master, you need to change your extensions with Gp_role to do that. A lot of extensions are modified to be compatible in MPP like that.
Could you explain more why need create table on master?
Because we need to store and access data on master, during planning phase. At the first glance it is about sr_plan
extension. But turns out, it is not only about that.
Why your extensions need to store data on master, if it must be , it's a catalog. If it needs to work only on master, you need to change your extensions with Gp_role to do that. A lot of extensions are modified to be compatible in MPP like that.
While this is generally true, it is very unhandy to do that. Today I bumped into problems with another extension - pg_hint_plan
. This extension also uses table to store its planner 'hints'. This table should persist on master exclusively.
So, the ability to force Greenplum to create relation without any distribution policy would help with at least two extensions now.
And yes, we can play with Gp_role
inside extension installation script and all functions/sql/other that access this relation. Whilst it is achievable with sr_plan
, it is too much work in other cases, especially for https://github.com/ossc-db/pg_hint_plan
We also can execute heap_create_with_catalog
from extension install script, to create all needed relation as we want (with NULL gp_policy).
But specifying one proper option in CREATE TABLE
sql is just much more simple.
If one dislikes SQL approach, we can do it another way: using GUC to control transformDistributedBy behaviour when no distrib clause specified
We also can execute heap_create_with_catalog from extension install script, to create all needed relation as we want (with NULL gp_policy
That is forbidden in GPDB, all tables should have a distribution policy , you are passing an invalid param with that function.
We also can execute heap_create_with_catalog from extension install script, to create all needed relation as we want (with NULL gp_policy
That is forbidden in GPDB, all tables should have a distribution policy , you are passing an invalid param with that function.
It is impossible to forbid something for extension. extension install script is executed under superuser
We also can execute heap_create_with_catalog from extension install script, to create all needed relation as we want (with NULL gp_policy
That is forbidden in GPDB, all tables should have a distribution policy , you are passing an invalid param with that function.
It is impossible to forbid something for extension. extension install script is executed under superuser
I'm not talking about the permissions. Superuser doesn't mean volatile our MPP design is right. It's terrible that tables are created without distribution policy entry in MPP, and you intend to use that bug.
Change logs
Add: create table table_name DISTRIBUTED BY LOCAL Creates a table that is queried only on the coordinator
Why are the changes needed?
Some postgres extensions (sr_plan or yezzey) needs to create local table. Before we do this by C-functions. Now we can do using SQL syntax
Does this PR introduce any user-facing change?
No
How was this patch tested?
It has regress test
Contributor's Checklist
Here are some reminders and checklists before/when submitting your pull request, please check them:
make installcheck
make -C src/test installcheck-cbdb-parallel
cloudberrydb/dev
team for review and approval when your PR is ready🥳