apache / cloudberry

One advanced and mature open-source MPP (Massively Parallel Processing) database. Open source alternative to Greenplum Database.
https://cloudberry.apache.org
Apache License 2.0
417 stars 104 forks source link

[Bug] Plan with parameterized path may always be wrong #671

Open jiaqizho opened 1 month ago

jiaqizho commented 1 month ago

Cloudberry Database version

All versions affected

Also GP7 have the same issue.

What happened

create table t1(v varchar(100));
insert into t1 values('abc~001');

create or replace function sameout 
    (in vc varchar) returns varchar AS $BODY$
begin
    return vc;
END;
$BODY$
LANGUAGE plpgsql;

set optimizer to off;
select (select sameout(v) t) from t1;  -- will coredump 

This case has nothing to do with the table itself and the custom function. Only relative the parameterized path.

ORCA also will got the same issue with other sql.

So the SQL select (select sameout(v) t) from t1; plan is

                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..3928.00 rows=13200 width=32)
   Output: ((SubPlan 1))
   ->  Seq Scan on public.t1  (cost=0.00..3664.00 rows=13200 width=32)
         Output: (SubPlan 1)
         SubPlan 1
           ->  Materialize  (cost=0.00..0.00 rows=0 width=0)
                 Output: (sameout(t1.v))
                 ->  Broadcast Motion 1:1  (slice2)  (cost=0.00..0.26 rows=1 width=32)
                       Output: (sameout(t1.v))
                       ->  Result  (cost=0.00..0.26 rows=1 width=32)
                             Output: sameout(t1.v)
 Optimizer: Postgres query optimizer
(12 rows)

As u can see we do have a Motion node exist in subplan. Then in Result node won't get any tuple in this path.

I guess this case is caused by optimizer. The optimizer should not create the parameterized path with the motion.

What you think should happen instead

this sql can change to the select sameout(v) from t1; or a subquery select sameout(v) from (select v from t1) t;

The corrent plan should flat the parameterized path or become a subquery

                                      QUERY PLAN
---------------------------------------------------------------------------------------
 Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..3928.00 rows=13200 width=32)
   -> Output: (sameout(t1.v))
        ->  Broadcast Motion 1:1  (slice2)  (cost=0.00..0.26 rows=1 width=32)
            Output: (sameout(t1.v))
            ->  Seq Scan on public.t1  (cost=0.00..3664.00 rows=13200 width=32)

 Optimizer: Postgres query optimizer
(12 rows)

or no motion inside

                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Result  (cost=0.00..882732.09 rows=1 width=8)
   Output: (SubPlan 1)
   ->  Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..431.00 rows=1 width=8)
         Output: v
         ->  Seq Scan on public.t1  (cost=0.00..431.00 rows=1 width=8)
               Output: v
   SubPlan 1
           ->  Materialize  (cost=0.00..0.00 rows=0 width=0)
                 Output: (sameout(t1.v))
                 ->  Result  (cost=0.00..0.26 rows=1 width=32)
                       Output: sameout(t1.v)
(20 rows)

How to reproduce

above

Operating System

any

Anything else

No response

Are you willing to submit PR?

Code of Conduct

github-actions[bot] commented 1 month ago

Hey, @jiaqizho welcome!🎊 Thanks for taking the time to point this out.🙌