apache / cloudberry

An advanced and mature open-source MPP (Massively Parallel Processing) database, and an open-source alternative to Greenplum Database.
https://cloudberry.apache.org
Apache License 2.0

Cherry pick Rows out in EXPLAIN ANALYZE #670

Open robozmey opened 1 month ago

robozmey commented 1 month ago

Cherry pick https://github.com/yezzey-gp/ygp/commit/1d4123048fbf5482c636eb193876d517edaaae81

Change logs

Add a "Rows out" line to the per-node output of cdbexplain_showExecStats

set gp_enable_explain_rows_out=on;

drop table if exists tt; create table tt (a int, b int) distributed randomly;

explain (analyze,verbose) insert into tt select * from generate_series(1,1000)a,generate_series(1,1000)b;

 Insert  (cost=0.00..495560.34 rows=333334 width=8) (actual time=3.829..1148.458 rows=333741 loops=1)
   Output: generate_series_1.generate_series, generate_series.generate_series, "outer".ColRef_0002, generate_series_1.generate_series
   Executor Memory: 1kB  Segments: 3  Max: 1kB (segment 0)
   Rows out: 333333.33 rows avg x 3 workers, 333741 rows max (seg1), 333054 rows min (seg2).
   ->  Redistribute Motion 3:3  (slice1; segments: 3)  (cost=0.00..479935.34 rows=333334 width=12) (actual time=3.707..761.241 rows=333741 loops=1)
         Output: generate_series_1.generate_series, generate_series.generate_series, "outer".ColRef_0002
         Rows out: 333333.33 rows avg x 3 workers, 333741 rows max (seg1), 333054 rows min (seg2).
         ->  Result  (cost=0.00..479922.82 rows=333334 width=12) (actual time=0.226..882.580 rows=1000000 loops=1)
               Output: generate_series_1.generate_series, generate_series.generate_series, 1
               Rows out: 333333.33 rows avg x 3 workers, 1000000 rows max (seg1), 0 rows min (seg0).
               ->  Result  (cost=0.00..479918.82 rows=333334 width=8) (actual time=0.225..631.202 rows=1000000 loops=1)
                     Output: generate_series_1.generate_series, generate_series.generate_series
                     One-Time Filter: (gp_execution_segment() = 1)
                     Rows out: 333333.33 rows avg x 3 workers, 1000000 rows max (seg1), 0 rows min (seg0).
                     ->  Nested Loop  (cost=0.00..479898.05 rows=333334 width=8) (actual time=0.220..386.554 rows=1000000 loops=1)
                           Output: generate_series_1.generate_series, generate_series.generate_series
                           Join Filter: true
                           Rows out: 333333.33 rows avg x 3 workers, 1000000 rows max (seg1), 0 rows min (seg0).
                           ->  Function Scan on pg_catalog.generate_series generate_series_1  (cost=0.00..0.00 rows=334 width=4) (actual time=0.102..0.333 rows=1000 loops=1)
                                 Output: generate_series_1.generate_series
                                 Function Call: generate_series(1, 1000)
                                 work_mem: 40kB  Segments: 1  Max: 40kB (segment 1)
                                 Rows out: 333.33 rows avg x 3 workers, 1000 rows max (seg1), 0 rows min (seg0).
                           ->  Function Scan on pg_catalog.generate_series  (cost=0.00..0.00 rows=334 width=4) (actual time=0.000..0.092 rows=999 loops=1001)
                                 Output: generate_series.generate_series
                                 Function Call: generate_series(1, 1000)
                                 work_mem: 40kB  Segments: 1  Max: 40kB (segment 1)
                                 Rows out: 333333.67 rows avg x 3 workers, 1000001 rows max (seg1), 0 rows min (seg0).
 Planning time: 5.981 ms
   (slice0)    Executor memory: 87K bytes avg x 3 workers, 87K bytes max (seg0).
   (slice1)    Executor memory: 97K bytes avg x 3 workers, 172K bytes max (seg1).  Work_mem: 40K bytes max.
 Memory used:  128000kB
 Optimizer: Pivotal Optimizer (GPORCA)
 Execution time: 1180.349 ms
(34 rows)
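The "Rows out" lines above summarize per-segment row counts as an average across workers plus the max and min segments. A minimal sketch of that aggregation in Python (the function name and the list-of-counts input are assumptions for illustration; the real logic is C code in cdbexplain_showExecStats, and seg0's count below is inferred from the printed average, not shown in the plan):

```python
def format_rows_out(rows_per_segment):
    """Format a 'Rows out' summary from per-segment row counts.

    rows_per_segment: list of row counts indexed by segment id.
    Mirrors the avg/max/min line shown in EXPLAIN ANALYZE above;
    names and formatting are illustrative, not the exact C code.
    """
    workers = len(rows_per_segment)
    avg = sum(rows_per_segment) / workers
    max_rows = max(rows_per_segment)
    min_rows = min(rows_per_segment)
    max_seg = rows_per_segment.index(max_rows)
    min_seg = rows_per_segment.index(min_rows)
    return (f"Rows out: {avg:.2f} rows avg x {workers} workers, "
            f"{max_rows} rows max (seg{max_seg}), "
            f"{min_rows} rows min (seg{min_seg}).")

# Counts matching the Insert node above (seg0 inferred from the average):
print(format_rows_out([333205, 333741, 333054]))
```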
explain (analyze,verbose,format text) select * from tt where a > b;

                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..431.00 rows=1 width=8) (actual time=2.534..223.679 rows=499500 loops=1)
   Output: a, b
   ->  Seq Scan on public.tt  (cost=0.00..431.00 rows=1 width=8) (actual time=0.145..127.350 rows=166742 loops=1)
         Output: a, b
         Filter: (tt.a > tt.b)
         Rows out: 166500.00 rows avg x 3 workers, 166742 rows max (seg0), 166340 rows min (seg2).
 Planning time: 2.578 ms
   (slice0)    Executor memory: 183K bytes.
   (slice1)    Executor memory: 43K bytes avg x 3 workers, 43K bytes max (seg0).
 Memory used:  128000kB
 Optimizer: Pivotal Optimizer (GPORCA)
 Execution time: 245.018 ms
(12 rows)

Why are the changes needed?

The "Rows out" line is useful for auto_explain output.

Does this PR introduce any user-facing change?

Yes. When gp_enable_explain_rows_out is set to on, EXPLAIN (ANALYZE, VERBOSE) prints an additional "Rows out" line for each plan node, as shown above; when the GUC is not enabled, the output is unchanged.

How was this patch tested?

This feature is covered by the gp_explain regression test.

Contributor's Checklist

Here are some reminders and checklists to review before/when submitting your pull request:

CLAassistant commented 1 month ago

CLA assistant check
All committers have signed the CLA.

robozmey commented 20 hours ago

Do we have a Parallel Plan test (ex: cbdb_parallel.sql) to ensure the output is as expected?

I'm not sure why we need a Parallel Plan test, but I expanded the gp_explain.sql test.