citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0

Wrong estimates for intermediate results in recursive planning #4826

Open saicitus opened 3 years ago

saicitus commented 3 years ago

Citus 10. Steps to reproduce:

```sql
CREATE TABLE test(id int);
SELECT create_distributed_table('test','id');

EXPLAIN WITH a AS (
  SELECT DATE('2019-10-31'), DATERANGE('[2019-01-01,2019-10-31]')
  UNION
  SELECT DATE('2020-10-31'), DATERANGE('[2020-01-01,2020-10-31]')
)
SELECT * FROM a CROSS JOIN test;
```
**Explain output**

                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Custom Scan (Citus Adaptive)  (cost=0.00..0.00 rows=100000 width=40)
   ->  Distributed Subplan 6_1
         ->  Unique  (cost=0.06..0.08 rows=2 width=36)
               ->  Sort  (cost=0.06..0.07 rows=2 width=36)
                     Sort Key: ('2019-10-31'::date), ('[2019-01-01,2019-11-01)'::daterange)
                     ->  Append  (cost=0.00..0.05 rows=2 width=36)
                           ->  Result  (cost=0.00..0.01 rows=1 width=36)
                           ->  Result  (cost=0.00..0.01 rows=1 width=36)
   Task Count: 32
   Tasks Shown: One of 32
   ->  Task
         Node: host=w0.xxxxx port=5432 dbname=citus
         ->  Nested Loop  (cost=0.00..31926.88 rows=2550000 width=40)
               ->  Function Scan on read_intermediate_result intermediate_result  (cost=0.00..10.00 rows=1000 width=36)
               ->  Materialize  (cost=0.00..48.25 rows=2550 width=4)
                     ->  Seq Scan on test_103741 test  (cost=0.00..35.50 rows=2550 width=4)
(16 rows)

The Distributed Subplan 6_1 node estimates 2 rows, but the Function Scan on read_intermediate_result that consumes it estimates 1000 rows.

This discrepancy can produce estimates that are wrong by more than three orders of magnitude (>1000×). We have observed it in a customer workload, where it led the planner to pick the wrong plan (GroupAggregate instead of HashAggregate), causing an order-of-magnitude performance regression (40 minutes vs. 4 minutes).
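For context, the 1000-row figure matches PostgreSQL's generic default estimate for set-returning functions that do not declare a `ROWS` clause. A minimal sketch on plain PostgreSQL (the function `two_rows` is hypothetical, not part of Citus) illustrates where such an estimate comes from:

```sql
-- Hypothetical illustration: the planner assumes 1000 rows for a
-- set-returning function unless its ROWS clause says otherwise.
CREATE FUNCTION two_rows() RETURNS SETOF int
LANGUAGE sql AS $$ SELECT 1 UNION SELECT 2 $$;

EXPLAIN SELECT * FROM two_rows();
-- Function Scan on two_rows  (... rows=1000 ...)

-- For an ordinary function, the estimate can be corrected like this:
CREATE OR REPLACE FUNCTION two_rows() RETURNS SETOF int
LANGUAGE sql ROWS 2 AS $$ SELECT 1 UNION SELECT 2 $$;
```

This only demonstrates the source of the default; `read_intermediate_result` row counts are not known until the subplan has actually run, so a static `ROWS` clause is not a fix for the Citus case.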

marcocitus commented 3 years ago

Just a note: explain without analyze is likely to be inaccurate w.r.t. the actual plan, since the size of intermediate results is only taken into account at runtime. Nonetheless, we do see cases where function call joins give poor plans even with accurate counts.
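Following that note, the estimates can be compared against actual runtime row counts by rerunning the repro with `EXPLAIN ANALYZE`. A sketch of the same query (the exact output will vary by cluster):

```sql
EXPLAIN ANALYZE
WITH a AS (
  SELECT DATE('2019-10-31'), DATERANGE('[2019-01-01,2019-10-31]')
  UNION
  SELECT DATE('2020-10-31'), DATERANGE('[2020-01-01,2020-10-31]')
)
SELECT * FROM a CROSS JOIN test;
-- With ANALYZE, each node reports "actual ... rows=..." alongside the
-- estimate, so the 2-vs-1000 discrepancy on the intermediate result
-- becomes visible against real execution.
```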