citusdata / cstore_fdw

Columnar storage extension for Postgres built as a foreign data wrapper. Check out https://github.com/citusdata/citus for a modernized columnar storage implementation built as a table access method.
Apache License 2.0
1.76k stars 171 forks source link

Get different query plan of the same query when using cstore_fdw? #106

Open Layamon opened 8 years ago

Layamon commented 8 years ago

I use tpc-ds benchmark to test the cstore_fdw, some Query like Query 32 have something wrong , execution time with cstore_fdw is 1078208.133 ms while normal pg table is 44350.811 ms.
using EXPLAIN ANALYZE , I found that using cstore foreign table the loops of subquery is 2401 while normal pg table is 52. It is because the position of Join filter of catalog_sales is different. How do I optimize it?
there are two query plan:

using cstore :

QUERY PLAN

Limit (cost=15359897632.00..15359897632.01 rows=1 width=14) (actual time=1078202.775..1078202.776 rows=1 loops=1) -> Aggregate (cost=15359897632.00..15359897632.01 rows=1 width=14) (actual time=1078202.771..1078202.771 rows=1 loops=1) -> Hash Join (cost=1556.15..15359896645.44 rows=394623 width=14) (actual time=22829.825..1078202.700 rows=15 loops=1) Hash Cond: (catalog_sales.cs_sold_date_sk = date_dim.d_date_sk) -> Hash Join (cost=338.85..15359881210.89 rows=216232 width=18) (actual time=1702.229..1078198.161 rows=329 loops=1) Hash Cond: (catalog_sales.cs_item_sk = item.i_item_sk) Join Filter: (catalog_sales.cs_ext_discount_amt > (SubPlan 1)) Rows Removed by Join Filter: 2072 -> Foreign Scan on catalog_sales (cost=0.00..18349.98 rows=1441548 width=22) (actual time=4.626..427.811 rows=1441548 loops=1) CStore File: /home/liuyangming/postgresql-9.5.3/build/data/cstore_fdw/24929/25076 CStore File Size: 365288698 -> Hash (cost=337.73..337.73 rows=90 width=4) (actual time=10.845..10.845 rows=36 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 10kB -> Foreign Scan on item (cost=0.00..337.73 rows=90 width=4) (actual time=1.839..10.804 rows=36 loops=1) Filter: (i_manufact_id = 269) Rows Removed by Filter: 17964 CStore File: /home/liuyangming/postgresql-9.5.3/build/data/cstore_fdw/24929/25034 CStore File Size: 10155737 SubPlan 1 -> Aggregate (cost=23677.97..23677.98 rows=1 width=14) (actual time=448.753..448.753 rows=1 loops=2401) -> Hash Join (cost=1217.30..23645.08 rows=13155 width=14) (actual time=233.081..448.723 rows=2 loops=2401) Hash Cond: (catalog_sales_1.cs_sold_date_sk = date_dim_1.d_date_sk) -> Foreign Scan on catalog_sales catalog_sales_1 (cost=0.00..21953.85 rows=7208 width=18) (actual time=125.990..448.174 rows=83 loops=2401) Filter: (cs_item_sk = item.i_item_sk) Rows Removed by Filter: 1441465 CStore File: /home/liuyangming/postgresql-9.5.3/build/data/cstore_fdw/24929/25076 CStore File Size: 365288698 -> Hash (cost=1212.73..1212.73 rows=365 width=4) (actual time=52.065..52.065 rows=91 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 12kB -> Foreign Scan on date_dim date_dim_1 (cost=0.00..1212.73 rows=365 width=4) (actual time=22.227..51.996 rows=91 loops=1) Filter: ((d_date >= '1998-03-18'::date) AND (d_date <= '1998-06-16 00:00:00'::timestamp without time zone)) Rows Removed by Filter: 72958 CStore File: /home/liuyangming/postgresql-9.5.3/build/data/cstore_fdw/24929/25016 CStore File Size: 13412711 -> Hash (cost=1212.73..1212.73 rows=365 width=4) (actual time=3.858..3.858 rows=91 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 12kB -> Foreign Scan on date_dim (cost=0.00..1212.73 rows=365 width=4) (actual time=2.337..3.820 rows=91 loops=1) Filter: ((d_date >= '1998-03-18'::date) AND (d_date <= '1998-06-16 00:00:00'::timestamp without time zone)) Rows Removed by Filter: 9909 CStore File: /home/liuyangming/postgresql-9.5.3/build/data/cstore_fdw/24929/25016 CStore File Size: 13412711 Planning time: 4.003 ms Execution time: 1078208.133 ms

not using cstore:__

QUERY PLAN

Limit (cost=177293.21..177293.22 rows=1 width=6) (actual time=44350.348..44350.350 rows=1 loops=1) -> Aggregate (cost=177293.21..177293.22 rows=1 width=6) (actual time=44350.340..44350.341 rows=1 loops=1) -> Hash Join (cost=3957.06..177293.21 rows=1 width=6) (actual time=2029.455..44350.255 rows=15 loops=1) Hash Cond: (catalog_sales.cs_item_sk = item.i_item_sk) Join Filter: (catalog_sales.cs_ext_discount_amt > (SubPlan 1)) Rows Removed by Join Filter: 37 -> Hash Join (cost=2501.86..59745.81 rows=1767 width=10) (actual time=64.372..772.172 rows=41358 loops=1) Hash Cond: (catalog_sales.cs_sold_date_sk = date_dim.d_date_sk) -> Seq Scan on catalog_sales (cost=0.00..51820.48 rows=1441548 width=14) (actual time=0.009..388.408 rows=1441548 loops=1) -> Hash (cost=2500.73..2500.73 rows=90 width=4) (actual time=54.826..54.826 rows=91 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 12kB -> Seq Scan on date_dim (cost=0.00..2500.73 rows=90 width=4) (actual time=22.129..54.742 rows=91 loops=1) Filter: ((d_date >= '1998-03-18'::date) AND (d_date <= '1998-06-16 00:00:00'::timestamp without time zone)) Rows Removed by Filter: 72958 -> Hash (cost=1455.00..1455.00 rows=16 width=4) (actual time=28.360..28.360 rows=36 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 10kB -> Seq Scan on item (cost=0.00..1455.00 rows=16 width=4) (actual time=0.967..28.285 rows=36 loops=1) Filter: (i_manufact_id = 269) Rows Removed by Filter: 17964 SubPlan 1 -> Aggregate (cost=58042.76..58042.77 rows=1 width=6) (actual time=837.318..837.318 rows=1 loops=52) -> Nested Loop (cost=0.00..58042.75 rows=1 width=6) (actual time=818.417..837.266 rows=6 loops=52) Join Filter: (catalog_sales_1.cs_sold_date_sk = date_dim_1.d_date_sk) Rows Removed by Join Filter: 8876 -> Seq Scan on date_dim date_dim_1 (cost=0.00..2500.73 rows=90 width=4) (actual time=11.627..28.174 rows=91 loops=52) Filter: ((d_date >= '1998-03-18'::date) AND (d_date <= '1998-06-16 00:00:00'::timestamp without time zone)) Rows Removed by Filter: 72958 -> Materialize (cost=0.00..55424.79 rows=87 width=10) (actual time=0.039..8.869 rows=98 loops=4732) -> Seq Scan on catalog_sales catalog_sales_1 (cost=0.00..55424.35 rows=87 width=10) (actual time=3.529..805.973 rows=98 loops=52) Filter: (cs_item_sk = item.i_item_sk) Rows Removed by Filter: 1441450 Planning time: 2.030 ms Execution time: 44350.811 ms

mtuncer commented 8 years ago

can you run analyze on cstore_fdw table and repeat the test ? I see you are using postgresql 9.3, what is the cstore_fdw version ?

Layamon commented 8 years ago

using postgresql-9.5.3, cstore_fdw is 1.4 , I already run analyze before test.