citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0
10.57k stars 670 forks

Add support for Postgres 14 #4991

Closed onderkalaci closed 2 years ago

onderkalaci commented 3 years ago

As PG 14 is on the horizon, we should make Citus work with PG 14.

As a development strategy, I think we should follow an approach similar to the PG 13 integration, where every change is documented in a separate commit and each commit references the related change in PG: https://github.com/citusdata/citus/pull/3900/commits

That way, the review becomes much easier.

devrimgunduz-edb commented 3 years ago

Thanks, I did not have to create a ticket :)

SaitTalhaNisanci commented 3 years ago

@onderkalaci I wanted to write down the high-level steps for how PG 13 support was added:

1. Enable PostgreSQL 14 in `configure`.
2. Copy ruleutils13 as ruleutils14 and apply the relevant changes from PostgreSQL 14 on top of ruleutils14 (note that this diverges from the upstream ruleutils; see https://github.com/citusdata/citus/issues/3855).
3. Compile Citus with PostgreSQL 14 and resolve the compilation issues.
4. Once Citus compiles with PostgreSQL 14, run the tests and fix each failing test.
5. After all tests pass, go through each PostgreSQL 14 changelog entry.
6. Add PostgreSQL 14 test images to our CI.
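Steps 1 and 3 above can be sketched roughly as follows. This is a hedged sketch, not a recipe from this thread: the `pg_config` path is an illustrative assumption, and `configure` will only accept PG 14 after the version check from step 1 has actually been extended.

```shell
# Sketch: build Citus against a PostgreSQL 14 installation.
# The pg_config path below is an assumption; adjust it to your system.
# Citus' configure script derives the target server version from pg_config,
# so pointing PG_CONFIG at PG 14 is how the new version gets selected.
PG_CONFIG=/usr/lib/postgresql/14/bin/pg_config ./configure

# Iterate here until all compilation issues against the PG 14 headers are fixed.
make -j"$(nproc)"
sudo make install
```

After that, the regression suite (`make check`) drives steps 4 and 5: each failure points at a behavior or output change that needs a Citus-side fix or an updated expected file.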

onderkalaci commented 3 years ago

Some issues I hit while testing #5209

```
-- connect to worker
-- check shard
\d+ compress1_102113
                          Table "public.compress1_102113"
 Column | Type | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+------+-----------+----------+---------+----------+-------------+--------------+-------------
 c1     | text |           |          |         | extended |             |              |
 c2     | text |           |          |         | extended |             |              |
Access method: heap
```


- [x] We do not propagate ALTER COLUMN .. SET COMPRESSION #5227 
```SQL
ALTER TABLE compress1 ALTER COLUMN c2 SET COMPRESSION pglz ;
ERROR:  alter table command is currently unsupported
CREATE TABLE t3 (key int, c1 numeric);

CREATE INDEX idx1_data1 ON t3 USING brin 
(c1 numeric_bloom_ops (false_positive_rate = 0.05,
n_distinct_per_range = 100)) ;

select create_distributed_table('t3', 'key');

-- fails
CREATE INDEX idx1_data2 ON t3 USING brin 
(c1 numeric_bloom_ops (false_positive_rate = 0.05,
n_distinct_per_range = 100)) ;
ERROR:  citus currently doesn't support operator class parameters in indexes
Time: 1.177 ms
CREATE TABLE users_table_part(user_id bigint, value_1 int, value_2 int) PARTITION BY RANGE (value_1);
CREATE TABLE users_table_part_0 PARTITION OF users_table_part FOR VALUES FROM (0) TO (1);
CREATE TABLE users_table_part_1 PARTITION OF users_table_part FOR VALUES FROM (1) TO (2);
SELECT create_distributed_table('users_table_part', 'user_id');

alter table users_table_part detach partition users_table_part_0 concurrently; 
ERROR:  ALTER TABLE ... DETACH CONCURRENTLY cannot run inside a transaction block
CONTEXT:  while executing command on localhost:9702
Time: 10.850 ms

-- (the CREATE TABLE for part1 is not shown in the original report;
--  presumably something like: CREATE TABLE part1 (c1 int) PARTITION BY RANGE (c1);)
SELECT create_distributed_table('part1', 'c1');

CREATE TABLE part1v1 PARTITION OF part1 FOR VALUES FROM (0) TO (100000) ;
CREATE TABLE part1v2 PARTITION OF part1 FOR VALUES FROM (100000) TO (200000) ;

CREATE INDEX idx1_part1 ON part1(c1) ;
REINDEX TABLE part1 ;

ERROR:  REINDEX TABLE cannot run inside a transaction block
CONTEXT:  while reindexing partitioned table "public.part1_102942"
	while executing command on localhost:9701
Time: 12.581 ms

```
- [x] Recursive CTE deparse problem #5234 
```SQL
 CREATE TABLE graph0(f INT, t INT, label TEXT); 
SELECT create_distributed_table('graph0', 'f');

INSERT INTO graph0 VALUES (1, 2, 'arc 1 -> 2'),
 (1, 3, 'arc 1 -> 3'), (2, 3, 'arc 2 -> 3'),
 (1, 4, 'arc 1 -> 4'), (4, 5, 'arc 4 -> 5') ;

 WITH RECURSIVE search_graph(f, t, label) AS (
SELECT * FROM graph0 g WHERE f = 1
UNION ALL
SELECT g.* FROM graph0 g, search_graph sg WHERE 
g.f = sg.t and  g.f = 1
) SEARCH DEPTH FIRST BY f, t SET seq
SELECT * FROM search_graph ORDER BY seq ;

ERROR:  column reference "seq" is ambiguous
CONTEXT:  while executing command on localhost:9701

WITH RECURSIVE search_graph(f, t, label) AS (
SELECT * FROM graph0 g WHERE f = 1
UNION ALL
SELECT g.* FROM graph0 g, search_graph sg WHERE g.f = sg.t and g.f = 1
) SEARCH BREADTH FIRST BY f, t SET seq
SELECT * FROM search_graph ORDER BY seq ;
ERROR:  column reference "seq" is ambiguous
CONTEXT:  while executing command on localhost:9701
Time: 19.813 ms

CREATE TABLE test (a int);
SELECT create_distributed_table('test', 'a');

create or replace procedure proc_pushdown(dist_key integer, OUT created int4[])
language plpgsql
as $$
DECLARE
  res INT := 0;
BEGIN
  INSERT INTO test VALUES (dist_key);
  SELECT count(*) INTO res FROM test;
  RAISE NOTICE 'Res: %', res;
  created := created || res;
  COMMIT;
END;
$$;

SELECT create_distributed_function('proc_pushdown(integer)', 'dist_key', 'test' );

-- make sure that metadata is synced; it may take a few seconds
SELECT bool_and(hasmetadata) FROM pg_dist_node;

-- pushdown the procedure to the node which does a multi-shard query
CALL proc_pushdown(1, '{}'::int4[]);
...
DEBUG:  pushing down the procedure
DEBUG:  opening 1 new connections to localhost:9701
DEBUG:  established connection to localhost:9701 for session 292 in 6415 microseconds
NOTICE:  issuing CALL public.proc_pushdown(1)
DETAIL:  on server onderkalaci@localhost:9701 connectionId: 188
ERROR:  procedure public.proc_pushdown(integer) does not exist
HINT:  No procedure matches the given name and argument types. You might need to add explicit type casts.
CONTEXT:  while executing command on localhost:9701
...

create or replace procedure proc_regular(dist_key integer, OUT created int4[])
language plpgsql
as $$
DECLARE
  res INT := 0;
BEGIN
  INSERT INTO test VALUES (dist_key);
  SELECT count(*) INTO res FROM test;
  RAISE NOTICE 'Res: %', res;
  created := created || res;
  COMMIT;
END;
$$;

CALL proc_regular(1, '{}'::int4[]);
 created
---------
 {1}
(1 row)

```
- [ ]  Add several regression tests (incomplete list)
    - [x] jsonb subscripting: #5232 
    - [x]  join alias 
```SQL
CREATE TABLE data1(c1 int, c2 int);
CREATE TABLE data2(c1 int, c2 int);
SELECT create_distributed_table('data1', 'c1');
SELECT create_distributed_table('data2', 'c1');

SELECT d1.c1, x.c1, x.c2 FROM data1 d1 JOIN data2 d2 
USING (c1, c2) AS x ;

-- router DISTINCT rollup
select c1, c2
FROM
data2
WHERE c1=1
group by distinct rollup(c1, c2), rollup(c1, c2);

-- multi row INSERTs with DEFAULT used to fail
-- see https://github.com/postgres/postgres/commit/17958972fe3bb03454a4b53756b29d65dc285efa
CREATE TABLE gtest1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
select create_distributed_table('gtest1', 'a');
INSERT INTO gtest1 VALUES (4, DEFAULT), (5, DEFAULT);


-- Allow table-qualified variable names in ON CONFLICT ... WHERE.
-- https://github.com/postgres/postgres/commit/6c0373ab77359c94b279c4e67c91aa623841af65
create table insertconflicttest(key int4 PRIMARY KEY, fruit text);
SELECT create_distributed_table('insertconflicttest', 'key');
insert into insertconflicttest as t values (23, 'Blackberry')
  on conflict (key) where fruit like '%berry' and t.fruit = 'inconsequential'
  do nothing;

-- Add support for infinity and -infinity values to the numeric data type (Tom Lane)
create table t1(a int, b numeric);
insert into t1 values (1, 'infinity');
select create_distributed_table('t1', 'b');
insert into t1 values (1, 'infinity') RETURNING *;
select count(*), b from t1 group by b;
```
hanefi commented 3 years ago

I am going through the differences in the PG 14 grammar, and this comment is a WIP. I want us to go through this exhaustive list and make sure we are OK with all the changes.

Please do not update the checklist, as I will constantly add some more items to the list, and I do not want to overwrite any of your changes.

New nodes

Removed `Stmt`s

New fields

New enum values

Renames

New operators added to existing `Stmt`s

Need to revisit these

ozgune commented 2 years ago

Hi @onderkalaci , @hanefi - Did we complete this work? If so, should we close this issue?

hanefi commented 2 years ago

This was completed in #5209