Closed idontgetoutmuch closed 7 months ago
The first thing you should do is run both of these queries with EXPLAIN ANALYZE; that will at least remove the guesswork as to where something is slow. Can you do that and post both results here?
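For anyone following along, a minimal sketch of getting that output for the generated query: prefix whatever SQL text showQuery prints with EXPLAIN ANALYZE and run it in psql. The helper name explainAnalyze and the placeholder SQL string are assumptions for illustration, not the query from this issue.

```haskell
module Main where

-- Prefix a SQL string with EXPLAIN ANALYZE. PostgreSQL then executes the
-- query for real and reports actual timings and row counts per plan node.
-- (explainAnalyze is a hypothetical helper, not part of Rel8.)
explainAnalyze :: String -> String
explainAnalyze sql = "EXPLAIN ANALYZE " ++ sql

main :: IO ()
main = do
  -- Placeholder SQL: in practice this would be the string from showQuery.
  let generatedSql = "SELECT c_year, c_month FROM production_widget"
  putStrLn (explainAnalyze generatedSql)
```

Note that EXPLAIN ANALYZE really runs the statement, so the slow query will still take its full time to report.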
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize GroupAggregate (cost=349100.71..349673.98 rows=780 width=36) (actual time=1547.967..1695.759 rows=822 loops=1)
Group Key: wp.c_year, wp.c_month
-> Gather Merge (cost=349100.71..349648.63 rows=1560 width=36) (actual time=1547.963..1695.119 rows=2406 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=348100.69..348468.55 rows=780 width=36) (actual time=1526.307..1671.224 rows=802 loops=3)
Group Key: wp.c_year, wp.c_month
-> Sort (cost=348100.69..348160.37 rows=23874 width=12) (actual time=1526.291..1565.306 rows=829621 loops=3)
Sort Key: wp.c_year, wp.c_month
Sort Method: external merge Disk: 19128kB
Worker 0: Sort Method: external merge Disk: 18432kB
Worker 1: Sort Method: external merge Disk: 18848kB
-> Merge Join (cost=317412.33..346364.67 rows=23874 width=12) (actual time=1215.947..1423.656 rows=829621 loops=3)
Merge Cond: (((((wp.c_year)::integer)::double precision) = (date_part('year'::text, (dp.d_date)::timestamp without time zone))) AND ((((wp.c_month)::integer)::double precision) = (date_part('month'::text, (dp.d_date)::timestamp without time zone))))
-> Sort (cost=317347.78..319898.42 rows=1020257 width=10) (actual time=1215.691..1306.276 rows=829621 loops=3)
Sort Key: (((wp.c_year)::integer)::double precision), (((wp.c_month)::integer)::double precision)
Sort Method: external merge Disk: 33456kB
Worker 0: Sort Method: external merge Disk: 32280kB
Worker 1: Sort Method: external merge Disk: 33016kB
-> Parallel Seq Scan on production_widget wp (cost=0.00..198086.57 rows=1020257 width=10) (actual time=0.926..931.584 rows=829621 loops=3)
-> Sort (cost=64.55..66.89 rows=936 width=6) (actual time=0.249..16.810 rows=829701 loops=3)
Sort Key: (date_part('year'::text, (dp.d_date)::timestamp without time zone)), (date_part('month'::text, (dp.d_date)::timestamp without time zone))
Sort Method: quicksort Memory: 98kB
Worker 0: Sort Method: quicksort Memory: 98kB
Worker 1: Sort Method: quicksort Memory: 98kB
-> Seq Scan on ref_days_in_month dp (cost=0.00..18.36 rows=936 width=6) (actual time=0.045..0.185 rows=936 loops=3)
Planning Time: 0.672 ms
Execution Time: 1700.888 ms
(28 rows)
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan on t1 (cost=95328223.56..95328521.06 rows=703 width=96) (actual time=191257.543..191459.605 rows=822 loops=1)
-> Finalize GroupAggregate (cost=95328223.56..95328514.03 rows=703 width=72) (actual time=191257.538..191459.545 rows=822 loops=1)
Group Key: t1_1.p_period
-> Gather Merge (cost=95328223.56..95328485.91 rows=1406 width=72) (actual time=191257.520..191458.402 rows=2420 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=95327223.53..95327323.60 rows=703 width=72) (actual time=191249.306..191446.884 rows=807 loops=3)
Group Key: t1_1.p_period
-> Sort (cost=95327223.53..95327238.45 rows=5968 width=16) (actual time=191249.280..191318.064 rows=829621 loops=3)
Sort Key: t1_1.p_period
Sort Method: external merge Disk: 22048kB
Worker 0: Sort Method: external merge Disk: 22072kB
Worker 1: Sort Method: external merge Disk: 22056kB
-> Nested Loop (cost=0.00..95326849.25 rows=5968 width=16) (actual time=0.791..191001.585 rows=829621 loops=3)
Join Filter: ((((t1_1.c_year IS NULL) AND ((trunc(date_part('year'::text, (t1_2.d_date)::timestamp with time zone)))::smallint IS NULL)) OR ((t1_1.c_year = (trunc(date_part('year'::text, (t1_2.d_date)::timestamp with time zone)))::smallint) AND COALESCE((t1_1.c_year = (trunc(date_part('year'::text, (t1_2.d_date)::timestamp with time zone)))::smallint), false))) AND (((t1_1.c_month IS NULL) AND ((trunc(date_part('month'::text, (t1_2.d_date)::timestamp with time zone)))::smallint IS NULL)) OR ((t1_1.c_month = (trunc(date_part('month'::text, (t1_2.d_date)::timestamp with time zone)))::smallint) AND COALESCE((t1_1.c_month = (trunc(date_part('month'::text, (t1_2.d_date)::timestamp with time zone)))::smallint), false))))
Rows Removed by Join Filter: 775695635
-> Parallel Seq Scan on production_widget t1_1 (cost=0.00..198086.57 rows=1020257 width=18) (actual time=0.235..927.992 rows=829621 loops=3)
-> Seq Scan on ref_days_in_month t1_2 (cost=0.00..18.36 rows=936 width=6) (actual time=0.002..0.028 rows=936 loops=2488863)
Planning Time: 0.808 ms
Execution Time: 191461.269 ms
(20 rows)
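As a sanity check on the plan above, the nested loop's numbers can be cross-checked with plain arithmetic. This is only a sketch: it assumes the per-node row counts are averages over the 3 processes (loops=3), which the figures seem to bear out, and the constant names are made up for illustration.

```haskell
module Main where

-- Figures copied from the Rel8-generated plan above.
innerScans, innerRows, processes, joinedPerProcess, removedPerProcess :: Integer
innerScans        = 2488863    -- loops of the Seq Scan on ref_days_in_month
innerRows         = 936        -- rows returned by each of those scans
processes         = 3          -- leader plus 2 workers (loops=3)
joinedPerProcess  = 829621     -- rows emitted by the Nested Loop
removedPerProcess = 775695635  -- Rows Removed by Join Filter

-- Every outer row is paired with every inner row and run through the filter.
pairsEvaluated :: Integer
pairsEvaluated = innerScans * innerRows

-- Pairs that passed the filter plus pairs that were thrown away.
pairsAccounted :: Integer
pairsAccounted = processes * (joinedPerProcess + removedPerProcess)

main :: IO ()
main = do
  putStrLn ("pairs evaluated by the join filter: " ++ show pairsEvaluated)
  putStrLn ("pairs kept + pairs removed:         " ++ show pairsAccounted)
  putStrLn ("totals agree: " ++ show (pairsEvaluated == pairsAccounted))
```

Both totals come to 2329575768, so the join filter is evaluated roughly 2.3 billion times to keep about 2.5 million rows. That is where the 190 seconds go.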
I've just noticed that they are not quite equivalent:
GROUP BY
wp.c_year, wp.c_month;
whereas
Rel8.groupBy (pPeriod widgetsPerDay)
@ocharles what do I say in Rel8 to get the equivalent of GROUP BY wp.c_year, wp.c_month? I can see how to group by one variable, but how about two or more?
You can just pass a tuple of Exprs to groupBy. Perhaps:
groupBy (pPeriodYear widgetsPerDay, pPeriodMonth widgetsPerDay)
or maybe:
groupBy (year (pPeriod widgetsPerDay), month (pPeriod widgetsPerDay))
(but I don't know what pPeriod is)
This gives
Finalize GroupAggregate (cost=95328223.56..95328528.24 rows=780 width=36) (actual time=188518.370..188686.717 rows=822 loops=1)
Group Key: t1.c_year, t1.c_month
-> Gather Merge (cost=95328223.56..95328502.89 rows=1560 width=36) (actual time=188518.363..188686.023 rows=2419 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=95327223.53..95327322.80 rows=780 width=36) (actual time=188511.896..188678.505 rows=806 loops=3)
Group Key: t1.c_year, t1.c_month
-> Sort (cost=95327223.53..95327238.45 rows=5968 width=12) (actual time=188511.880..188572.827 rows=829621 loops=3)
Sort Key: t1.c_year, t1.c_month
Sort Method: external merge Disk: 18816kB
Worker 0: Sort Method: external merge Disk: 18792kB
Worker 1: Sort Method: external merge Disk: 18800kB
-> Nested Loop (cost=0.00..95326849.25 rows=5968 width=12) (actual time=4.141..188355.572 rows=829621 loops=3)
Join Filter: ((((t1.c_year IS NULL) AND ((trunc(date_part('year'::text, (t1_1.d_date)::timestamp with time zone)))::smallint IS NULL)) OR
((t1.c_year = (trunc(date_part('year'::text, (t1_1.d_date)::timestamp with time zone)))::smallint) AND
COALESCE((t1.c_year = (trunc(date_part('year'::text, (t1_1.d_date)::timestamp with time zone)))::smallint), false))) AND
(((t1.c_month IS NULL) AND ((trunc(date_part('month'::text, (t1_1.d_date)::timestamp with time zone)))::smallint IS NULL)) OR
((t1.c_month = (trunc(date_part('month'::text, (t1_1.d_date)::timestamp with time zone)))::smallint) AND
COALESCE((t1.c_month = (trunc(date_part('month'::text, (t1_1.d_date)::timestamp with time zone)))::smallint), false))))
Rows Removed by Join Filter: 775695635
-> Parallel Seq Scan on production_brazil t1 (cost=0.00..198086.57 rows=1020257 width=10) (actual time=0.623..571.236 rows=829621 loops=3)
-> Seq Scan on ref_days_in_month t1_1 (cost=0.00..18.36 rows=936 width=6) (actual time=0.002..0.028 rows=936 loops=2488863)
Planning Time: 0.985 ms
Execution Time: 188687.476 ms
(19 rows)
versus the hand-written
Finalize GroupAggregate (cost=349100.71..349673.98 rows=780 width=36) (actual time=1547.967..1695.759 rows=822 loops=1)
Group Key: wp.c_year, wp.c_month
-> Gather Merge (cost=349100.71..349648.63 rows=1560 width=36) (actual time=1547.963..1695.119 rows=2406 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=348100.69..348468.55 rows=780 width=36) (actual time=1526.307..1671.224 rows=802 loops=3)
Group Key: wp.c_year, wp.c_month
-> Sort (cost=348100.69..348160.37 rows=23874 width=12) (actual time=1526.291..1565.306 rows=829621 loops=3)
Sort Key: wp.c_year, wp.c_month
Sort Method: external merge Disk: 19128kB
Worker 0: Sort Method: external merge Disk: 18432kB
Worker 1: Sort Method: external merge Disk: 18848kB
-> Merge Join (cost=317412.33..346364.67 rows=23874 width=12) (actual time=1215.947..1423.656 rows=829621 loops=3)
Merge Cond: (((((wp.c_year)::integer)::double precision) = (date_part('year'::text, (dp.d_date)::timestamp without time zone))) AND
((((wp.c_month)::integer)::double precision) = (date_part('month'::text, (dp.d_date)::timestamp without time zone))))
-> Sort (cost=317347.78..319898.42 rows=1020257 width=10) (actual time=1215.691..1306.276 rows=829621 loops=3)
Sort Key: (((wp.c_year)::integer)::double precision), (((wp.c_month)::integer)::double precision)
Sort Method: external merge Disk: 33456kB
Worker 0: Sort Method: external merge Disk: 32280kB
Worker 1: Sort Method: external merge Disk: 33016kB
-> Parallel Seq Scan on production_widget wp (cost=0.00..198086.57 rows=1020257 width=10) (actual time=0.926..931.584 rows=829621 loops=3)
-> Sort (cost=64.55..66.89 rows=936 width=6) (actual time=0.249..16.810 rows=829701 loops=3)
Sort Key: (date_part('year'::text, (dp.d_date)::timestamp without time zone)), (date_part('month'::text, (dp.d_date)::timestamp without time zone))
Sort Method: quicksort Memory: 98kB
Worker 0: Sort Method: quicksort Memory: 98kB
Worker 1: Sort Method: quicksort Memory: 98kB
-> Seq Scan on ref_days_in_month dp (cost=0.00..18.36 rows=936 width=6) (actual time=0.045..0.185 rows=936 loops=3)
Planning Time: 0.672 ms
Execution Time: 1700.888 ms
(28 rows)
Let me know if you need anything else. I am assuming that this is generated by Opaleye?
I would imagine it's the join condition that's causing the problem. Your Rel8 one has a lot more going on with null checks. If you can get the join condition to match your hand-written SQL, you may well get the same performance.
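To make the difference between the two join conditions concrete, here is a small model in plain Haskell (no Rel8), with Nothing standing in for SQL NULL. It is a sketch of the semantics only: generatedEq mirrors the shape of the Join Filter in the plans above, plainEq mirrors the = in the hand-written Merge Cond, and all the names are made up for illustration.

```haskell
module Main where

import Data.Maybe (fromMaybe, isNothing)

-- SQL's three-valued booleans: Nothing stands for NULL.
type SqlBool = Maybe Bool

and3 :: SqlBool -> SqlBool -> SqlBool
and3 (Just False) _ = Just False
and3 _ (Just False) = Just False
and3 (Just True) (Just True) = Just True
and3 _ _ = Nothing

or3 :: SqlBool -> SqlBool -> SqlBool
or3 (Just True) _ = Just True
or3 _ (Just True) = Just True
or3 (Just False) (Just False) = Just False
or3 _ _ = Nothing

-- Plain SQL "a = b": NULL as soon as either side is NULL.
plainEq :: Maybe Int -> Maybe Int -> SqlBool
plainEq (Just x) (Just y) = Just (x == y)
plainEq _ _ = Nothing

-- Shape of the generated filter:
-- (a IS NULL AND b IS NULL) OR (a = b AND COALESCE(a = b, false))
generatedEq :: Maybe Int -> Maybe Int -> SqlBool
generatedEq a b =
  or3 (and3 (Just (isNothing a)) (Just (isNothing b)))
      (and3 (plainEq a b) (Just (fromMaybe False (plainEq a b))))

-- A join keeps a pair only when its condition is TRUE (not FALSE, not NULL).
keeps :: SqlBool -> Bool
keeps = (== Just True)

samples :: [Maybe Int]
samples = [Nothing, Just 1, Just 2]

-- Pairs on which the two conditions make a different keep/drop decision.
disagreements :: [(Maybe Int, Maybe Int)]
disagreements =
  [ (a, b) | a <- samples, b <- samples
  , keeps (plainEq a b) /= keeps (generatedEq a b) ]

main :: IO ()
main = print disagreements
```

This prints [(Nothing,Nothing)]: the two conditions differ only when both sides are NULL. So if the columns never hold NULLs, the extra checks change nothing in the result. As far as I understand, though, the OR-of-ANDs shape is no longer a plain equality the planner can use as a Merge Cond, which would explain why it falls back to a Nested Loop with a Join Filter.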
I am using https://hackage.haskell.org/package/rel8-1.4.1.0 and have the following query in Haskell. Sadly, it takes 180 seconds to run.
When I code up the SQL by hand, it takes 3 seconds.
Here is what showQuery gives me:
It looks very slow because there's a query inside a query inside another query: 5 subqueries? But I am an SQL noob. Maybe I misformulated the Haskell?