Closed yazun closed 3 years ago
I run your code on our enviroument which works fine. Can you provide the definition of dr3_ops_cs36_tmp.getmaincatalog_tst and the query explained?
relation 523409537 deleted while still in use
Is dr3_ops_cs36_tmp.getmaincatalog_tst using a table that dropped during execution phase?
To avoid the problem,you can try split create and select , such as
create table a ( a int); insert into a select generate_series(1,1000);
I run your code on our enviroument which works fine. Can you provide the definition of dr3_ops_cs36_tmp.getmaincatalog_tst and the query explained?
Yes, The query also work fine without create table as ..
this is why it's puzzling.
The function is using a table like r
in the above example, but the table is much older - this would be the main difference I guess.
create table mv_r_s_1 as
surveys-# select s.*,r.*, getr_text(r.i), getr_text(s.id) d2 from
surveys-# r right outer join s on (getr_text(r.i) = s.val or dr3_ops_cs36_tmp.getmaincatalog_tst() = i % 4)
surveys-# ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..715488.47 rows=34139 width=136)
Join Filter: ((getr_text(r.i) = s.val) OR ('2'::smallint = (r.i % 4)))
-> Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9) (cost=100.00..276.65 rows=1850 width=36)
-> Seq Scan on s (cost=0.00..104.50 rows=1850 width=36)
-> Materialize (cost=100.00..282.20 rows=1850 width=36)
-> Remote Subquery Scan on all (datanode1) (cost=100.00..276.65 rows=1850 width=36)
-> Seq Scan on r (cost=0.00..104.50 rows=1850 width=36)
(7 rows)
the function causing problems is as simple as
CREATE OR REPLACE FUNCTION dr3_ops_cs36_tmp.getmaincatalog_tst()
RETURNS smallint
LANGUAGE sql
immutable PARALLEL SAFE
AS $function$ select catalogid from catalog where fowningcatalog and size > 10^7; $function$
;
as long as there's reference to catalog
table in this function, it breaks. It happens for other old tables...
explain select catalogid from catalog where fowningcatalog and size > 10^7;
QUERY PLAN
------------------------------------------------------------------------------------------------
Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0)
Node/s: datanode1
-> Seq Scan on catalog (cost=0.00..13.93 rows=1 width=2)
Filter: (fowningcatalog AND ((size)::double precision > '10000000'::double precision))
(4 rows)
The catalog
table is there untouched. It's a static table, replicated. Query without create as ...
works ok.
Unfortunately it is impossible for us to create the table by hand as we do not know the table definition: this technique is used to bootstrap a table by scientists with various queries, doing joins on huge partitioned tables and we use heavy parallel querying to populate it.
There's a function in psql from PG11 to get the definition of query buffer, but not sure if we could just patch it https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=49ca462eb165dea297f1f110e8eac064308e9d51#patch2 what do you think? We could then parse it to create a table as you suggest.
Of course, fixing the core problem would be the best..
Thanks a lot for help!
btw. vacuum freeze of catalog
table did not help.
I run your code on our enviroument which works fine. Can you provide the definition of dr3_ops_cs36_tmp.getmaincatalog_tst and the query explained?
Yes, The query also work fine without
create table as ..
this is why it's puzzling. The function is using a table liker
in the above example, but the table is much older - this would be the main difference I guess.create table mv_r_s_1 as surveys-# select s.*,r.*, getr_text(r.i), getr_text(s.id) d2 from surveys-# r right outer join s on (getr_text(r.i) = s.val or dr3_ops_cs36_tmp.getmaincatalog_tst() = i % 4) surveys-# ; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop Left Join (cost=0.00..715488.47 rows=34139 width=136) Join Filter: ((getr_text(r.i) = s.val) OR ('2'::smallint = (r.i % 4))) -> Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9) (cost=100.00..276.65 rows=1850 width=36) -> Seq Scan on s (cost=0.00..104.50 rows=1850 width=36) -> Materialize (cost=100.00..282.20 rows=1850 width=36) -> Remote Subquery Scan on all (datanode1) (cost=100.00..276.65 rows=1850 width=36) -> Seq Scan on r (cost=0.00..104.50 rows=1850 width=36) (7 rows)
the function causing problems is as simple as
CREATE OR REPLACE FUNCTION dr3_ops_cs36_tmp.getmaincatalog_tst() RETURNS smallint LANGUAGE sql immutable PARALLEL SAFE AS $function$ select catalogid from catalog where fowningcatalog and size > 10^7; $function$ ;
as long as there's reference to
catalog
table in this function, it breaks. It happens for other old tables...explain select catalogid from catalog where fowningcatalog and size > 10^7; QUERY PLAN ------------------------------------------------------------------------------------------------ Remote Fast Query Execution (cost=0.00..0.00 rows=0 width=0) Node/s: datanode1 -> Seq Scan on catalog (cost=0.00..13.93 rows=1 width=2) Filter: (fowningcatalog AND ((size)::double precision > '10000000'::double precision)) (4 rows)
The
catalog
table is there untouched. It's a static table, replicated. Query withoutcreate as ...
works ok.Unfortunately it is impossible for us to create the table by hand as we do not know the table definition: this technique is used to bootstrap a table by scientists with various queries, doing joins on huge partitioned tables and we use heavy parallel querying to populate it.
There's a function in psql from PG11 to get the definition of query buffer, but not sure if we could just patch it https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=49ca462eb165dea297f1f110e8eac064308e9d51#patch2 what do you think? We could then parse it to create a table as you suggest.
Of course, fixing the core problem would be the best..
Thanks a lot for help!
Thanks for your detailed instruction. The issue occured in our database , We will try to solve it as soon as possible.
Btw, When your database will go online ? I wonder we have enough time to solve it.
Good news it is repeatable on your side!! Thanks a lot!
In principle we are ready to switch at any moment, we have two systems running in parallel (XL and TBase). We are to start the final test to make a stress test over this weekend and if successful it would be end of the next week.
For now, for this particular problem we patched psql with the above-mentioned \gdesc switch and finishing amending scripts, so hopefully we have a dirty workaround soon..
Good News,Problem solved,it's on master branch.
Much appreciated! Will test asap and will close the issue once validated.
Confirming it works ok! perfect!
Continuing #82
We must rely on
in an important scenario.
There is regression in comparison to XL which causes some immutable lookup functions to cause the error:
Ie. this works:
The function is very similar to
getr_text
- except is run on an older, replicated table.You can see immutable function query to be shipped to datanode in the log at DEBUG5 level attached here.
bug_xz_create_as.txt