Closed sebbacon closed 7 years ago
Similary, NewRelic shows us that the slowest running query of late is:
SELECT
pr.practice_id AS row_id,
pc.name AS row_name,
pc.setting AS setting,
pc.ccg_id AS ccg,
pr.processing_date AS date,
SUM(pr.cost) AS actual_cost,
SUM(pr.items) AS items,
SUM(pr.quantity) AS quantity
FROM
vw__chemical_summary_by_practice pr
JOIN
frontend_practice pc
ON
pr.practice_id=pc.code
WHERE
(pr.chemical_id=%s )
AND (pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s
OR pr.practice_id=%s )
GROUP BY
pr.practice_id,
pc.code,
date
ORDER BY
date,
pr.practice_id```
On the index creation example. This is a 68GB table with 28GB index.
When /mnt/database is being read (100 - 180 MB/s) its utlisation in iostat is at 100% and IO wait is therefore high.
DROP TABLE
will block a SELECT
, so I don't think this is relevant, unless there is some parallelism I'm missing.
DROP TABLE
creates ACCESS EXCLUSIVE
locks which will block everything elseCREATE INDEX
(non concurrently) creates SHARE
lock which will block VACUUM
, ANALYZE
and CREATE INDEX
(concurrently)COPY INTO
creates ROW EXCLUSIVE
locks which will block CREATE INDEX
Another data point: during the following operation, IO was at 100% (all read ops) and it took forever (i.e. well over 30 mins) - no index, so was doing a full table scan:
DELETE FROM vw__chemical_summary_by_practice WHERE processing_date = '2011-11-01';
Responding to a specific query from @richiecroker.
Seems that the following query is timing out:
/api/1.0/spending_by_ccg/?format=json&code=0408010AEBE,0408010AEBF,0408010AEBD,0408010AEBC
The pathological behaviour (resorting to a full table scan) appears related to the GROUP BY date
clause.
This results in a sequence scan of the table:
EXPLAIN SELECT
pr.pct_id AS pct_id,
pr.processing_date AS date
FROM
vw__presentation_summary_by_ccg pr
WHERE
(pr.presentation_code LIKE '0408010AEBE%' OR
pr.presentation_code LIKE '0408010AEBF%' OR
pr.presentation_code LIKE '0408010AEBF%'
)
GROUP BY
pr.pct_id,
date
LIMIT 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2555605.18..2555605.19 rows=1 width=8)
-> Group (cost=2555605.18..2555758.32 rows=14741 width=8)
Group Key: pct_id, processing_date
-> Sort (cost=2555605.18..2555656.23 rows=20419 width=8)
Sort Key: pct_id, processing_date
-> Seq Scan on vw__presentation_summary_by_ccg pr (cost=0.00..2554143.42 rows=20419 width=8)
Filter: (((presentation_code)::text ~~ '0408010AEBE%'::text) OR ((presentation_code)::text ~~ '0408010AEBF%'::text) OR ((presentation_code)::text ~~ '0408010AEBF%'::text))
But if we remove the GROUP BY ... date
it uses an index scan:
EXPLAIN SELECT
pr.pct_id AS pct_id,
MAX(pr.processing_date) AS date
FROM
vw__presentation_summary_by_ccg pr
WHERE
(pr.presentation_code LIKE '0408010AEBE%' OR
pr.presentation_code LIKE '0408010AEBF%' OR
pr.presentation_code LIKE '0408010AEBF%'
)
GROUP BY
pr.pct_id
LIMIT 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..6729.87 rows=1 width=8)
-> GroupAggregate (cost=0.57..3256983.04 rows=484 width=8)
Group Key: pct_id
-> Index Scan using idx_vw__presentation_summary_by_ccg_on_code_and_pct on vw__presentation_summary_by_ccg pr (cost=0.57..3256876.11 rows=20419 width=8)
Filter: (((presentation_code)::text ~~ '0408010AEBE%'::text) OR ((presentation_code)::text ~~ '0408010AEBF%'::text) OR ((presentation_code)::text ~~ '0408010AEBF%'::text))
That was the plan; the actual is:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..6729.87 rows=1 width=8) (actual time=73279.001..73279.002 rows=1 loops=1)
-> GroupAggregate (cost=0.57..3256983.04 rows=484 width=8) (actual time=73279.000..73279.000 rows=1 loops=1)
Group Key: pct_id
-> Index Scan using idx_vw__presentation_summary_by_ccg_on_code_and_pct on vw__presentation_summary_by_ccg pr (cost=0.57..3256876.11 rows=20419 width=8) (actual time=68634.368..73278.979 rows=17 loops=1)
Filter: (((presentation_code)::text ~~ '0408010AEBE%'::text) OR ((presentation_code)::text ~~ '0408010AEBF%'::text) OR ((presentation_code)::text ~~ '0408010AEBF%'::text))
Rows Removed by Filter: 1671264
Planning time: 1.436 ms
Execution time: 73279.107 ms
It is also interesting to compare the above two variations but with fewer conditions.
Grouping by date, with a single condition:
EXPLAIN ANALYZE SELECT
pr.pct_id AS pct_id,
pr.processing_date AS date
FROM
vw__presentation_summary_by_ccg pr
WHERE
(pr.presentation_code LIKE '0408010AEBE%'
)
GROUP BY
pr.pct_id,
date
LIMIT 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1056045.23..1056045.24 rows=1 width=8) (actual time=6212.168..6212.169 rows=1 loops=1)
-> Group (cost=1056045.23..1056096.28 rows=6080 width=8) (actual time=6212.166..6212.166 rows=1 loops=1)
Group Key: pct_id, processing_date
-> Sort (cost=1056045.23..1056062.25 rows=6807 width=8) (actual time=6212.163..6212.163 rows=1 loops=1)
Sort Key: pct_id, processing_date
Sort Method: quicksort Memory: 388kB
-> Index Scan using idx_vw__presentation_summary_by_ccg_on_code_and_pct on vw__presentation_summary_by_ccg pr (cost=0.57..1055611.87 rows=6807 width=8) (actual time=210.807..6207.190 rows=4161 loops=1)
Index Cond: (((presentation_code)::text ~>=~ '0408010AEBE'::text) AND ((presentation_code)::text ~<~ '0408010AEBF'::text))
Filter: ((presentation_code)::text ~~ '0408010AEBE%'::text)
Planning time: 0.421 ms
Execution time: 6212.292 ms
(11 rows)
And without the GROUP BY
:
EXPLAIN ANALYZE SELECT
pr.pct_id AS pct_id,
MAX(pr.processing_date) AS date
FROM
vw__presentation_summary_by_ccg pr
WHERE
(pr.presentation_code LIKE '0408010AEBE%'
)
GROUP BY
pr.pct_id
LIMIT 1;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..2181.66 rows=1 width=8) (actual time=145.891..145.892 rows=1 loops=1)
-> GroupAggregate (cost=0.57..1055650.74 rows=484 width=8) (actual time=145.889..145.889 rows=1 loops=1)
Group Key: pct_id
-> Index Scan using idx_vw__presentation_summary_by_ccg_on_code_and_pct on vw__presentation_summary_by_ccg pr (cost=0.57..1055611.87 rows=6807 width=8) (actual time=123.015..145.872 rows=17 loops=1)
Index Cond: (((presentation_code)::text ~>=~ '0408010AEBE'::text) AND ((presentation_code)::text ~<~ '0408010AEBF'::text))
Filter: ((presentation_code)::text ~~ '0408010AEBE%'::text)
Planning time: 0.345 ms
Execution time: 145.967 ms
I presume the reason it uses a sequence scan is because the date is not in the index.
Next steps is to benchmark our read iops. We should also consider changing the index on that table to cover date, too.
The index being scanned is 342MB; the table size is 10GB.
According to iozone tests, the average read throughput is 1429mB/sec and the average write throughput is 509mB/sec, so that seems fine (note that the "local" disk rather than the "mounted" storage gets approx 200% read and 150% write).
Copying a 20GB file from /mnt/database to /tmp/ took 2m 13s and made iowait go up to 25%; the total copy rate was around 162mB/sec.
This means scanning a 10GB table is usually going to take more than a minute, and disk performance is not really an issue
The fix must be to avoid the sequence scan, somehow .
We generally allow search on section, paragraph, chemical, or product, or "product format" (presentation). In this case, the table in question is selected for queries against product or presentation.
We may find an improvement if we don't use LIKE queries, but instead index on product and presentations separately:
CREATE INDEX idx_products ON vw__presentation_summary_by_ccg(substring(presentation_code from 1 for 11));
Then:
SELECT * from vw__presentation_summary_by_ccg WHERE substring(presentation_code from 1 for 11) = '0408010AEBF'
We could also try a GIN index:
create index test_trgm_gin on test using gin (some_text gin_trgm_ops);
However, creating a GIN index can be very slow.
This is the clearest, most pertinent documentation I've seen regarding the problem.
First we have to note that an OR
query has to involve multiple bitmap index scans. From that it follows that the repeated OR
query portion has to be very fast for the overall operation to be performant.
A simple query against the bnf code was very slow, because there was only one compound index defined with the presentation code as the second column. This required an index scan, which is relatively slow.
Presumably the loss of this index has contributed to this problem.
I have added...
CLUSTER vw__presentation_summary_by_ccg USING idx_vw__presentation_summary_by_ccg_on_code_and_pct
...which takes about 19 min to build, and greatly improves performance
This comment illustrates an example of an exceptionally slow-running query.
Another extremely slow operation is
At times this is CPU-bound; when not, it's doing reading/writing at 100Mb/S.