Open punktilious opened 2 years ago
The same query structure is used to obtain the resource count so we do need to perform this filtering as part of the query itself. We therefore need to find a way to have the database perform the filtering as the last step (probably as a hash join).
The QueryUtils would log the query string, but only if com.ibm.fhir.database.utils.query.*=FINE
. It would probably make sense to have the logger passed as an argument.
fhirdb=> explain analyze
SELECT LR0.RESOURCE_TYPE_ID, LR0.LOGICAL_RESOURCE_ID
FROM FHIRDATA.LOGICAL_RESOURCES AS LR0
WHERE LR0.IS_DELETED = 'N'
AND LR0.RESOURCE_TYPE_ID IN (9,12,1,2,149,3,4,5,6,7,8,10,11,13,14,15,16,17,18,150,19,20,21,151,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,45,46,47,48,49,50,51,152,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,153,67,68,69,70,71,72,154,73,74,75,76,77,78,79,80,81,155,92,93,94,95,96,156,97,98,99,100,101,102,157,103,104,105,106,107,108,109,110,111,112,113,114,158,115,116,117,118,119,120,122,124,125,126,127,128,129,130,131,132,159,160,133,161,140,141,142,143,144,145,146,147,148)
AND EXISTS (
SELECT 1
FROM FHIRDATA.LOGICAL_RESOURCES AS LR1
WHERE LR1.LOGICAL_RESOURCE_ID = LR0.LOGICAL_RESOURCE_ID
AND LR0.RESOURCE_TYPE_ID IN (9,12,1,2,149,3,4,5,6,7,8,10,11,13,14,15,16,17,18,150,19,20,21,151,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,45,46,47,48,49,50,51,152,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,153,67,68,69,70,71,72,154,73,74,75,76,77,78,79,80,81,155,92,93,94,95,96,156,97,98,99,100,101,102,157,103,104,105,106,107,108,109,110,111,112,113,114,158,115,116,117,118,119,120,122,124,125,126,127,128,129,130,131,132,159,160,133,161,140,141,142,143,144,145,146,147,148)
AND (LR1.LAST_UPDATED > '2022-08-07'))
ORDER BY LR0.LOGICAL_RESOURCE_ID
LIMIT 2;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.87..3238.30 rows=2 width=12) (actual time=2789.601..2789.605 rows=2 loops=1)
-> Nested Loop (cost=0.87..12446298.40 rows=7689 width=12) (actual time=2789.598..2789.602 rows=2 loops=1)
-> Index Scan using logical_resources_pk on logical_resources lr1 (cost=0.43..12302511.22 rows=17277 width=8) (actual time=2789.582..2789.583 rows=2 loops=1)
Filter: (last_updated > '2022-08-07 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 15312459
-> Index Scan using logical_resources_pk on logical_resources lr0 (cost=0.43..8.32 rows=1 width=12) (actual time=0.007..0.007 rows=1 loops=2)
Index Cond: (logical_resource_id = lr1.logical_resource_id)
Filter: ((is_deleted = 'N'::bpchar) AND (resource_type_id = ANY ('{9,12,1,2,149,3,4,5,6,7,8,10,11,13,14,15,16,17,18,150,19,20,21,151,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,45,46,47,48,49,50,51,152,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,153,67,68,69,70,71,72,154,73,74,75,76,77,78,79,80,81,155,92,93,94,95,96,156,97,98,99,100,101,102,157,103,104,105,106,107,108,109,110,111,112,113,114,158,115,116,117,118,119,120,122,124,125,126,127,128,129,130,131,132,159,160,133,161,140,141,142,143,144,145,146,147,148}'::integer[])) AND (resource_type_id = ANY ('{9,12,1,2,149,3,4,5,6,7,8,10,11,13,14,15,16,17,18,150,19,20,21,151,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,45,46,47,48,49,50,51,152,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,153,67,68,69,70,71,72,154,73,74,75,76,77,78,79,80,81,155,92,93,94,95,96,156,97,98,99,100,101,102,157,103,104,105,106,107,108,109,110,111,112,113,114,158,115,116,117,118,119,120,122,124,125,126,127,128,129,130,131,132,159,160,133,161,140,141,142,143,144,145,146,147,148}'::integer[])))
Planning Time: 2.269 ms
Execution Time: 2789.648 ms
fhirdb=> \d fhirdata.logical_resources
Table "fhirdata.logical_resources"
Column | Type | Collation | Nullable | Default
---------------------+-----------------------------+-----------+----------+-------------------
logical_resource_id | bigint | | not null |
resource_type_id | integer | | not null |
logical_id | character varying(255) | | not null |
reindex_tstamp | timestamp without time zone | | not null | CURRENT_TIMESTAMP
reindex_txid | bigint | | not null | 0
last_updated | timestamp without time zone | | |
is_deleted | character(1) | | not null | 'X'::bpchar
parameter_hash | character varying(44) | | |
Indexes:
"logical_resources_pk" PRIMARY KEY, btree (logical_resource_id)
"unq_logical_resources" UNIQUE, btree (resource_type_id, logical_id)
"idx_logical_resources_lupd" btree (last_updated)
"idx_logical_resources_rits" btree (reindex_tstamp DESC)
There is an index on last_updated, but it being ignored. Interestingly, even without the type filter we see the index being ignored:
explain analyze
SELECT LR0.RESOURCE_TYPE_ID, LR0.LOGICAL_RESOURCE_ID
FROM FHIRDATA.LOGICAL_RESOURCES AS LR0
WHERE LR0.IS_DELETED = 'N'
AND EXISTS (
SELECT 1
FROM FHIRDATA.LOGICAL_RESOURCES AS LR1
WHERE LR1.LOGICAL_RESOURCE_ID = LR0.LOGICAL_RESOURCE_ID
AND (LR1.LAST_UPDATED > '2022-08-07'))
ORDER BY LR0.LOGICAL_RESOURCE_ID
LIMIT 2;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.87..1440.96 rows=2 width=12) (actual time=2730.151..2730.154 rows=2 loops=1)
-> Nested Loop (cost=0.87..12440208.25 rows=17277 width=12) (actual time=2730.148..2730.151 rows=2 loops=1)
-> Index Scan using logical_resources_pk on logical_resources lr1 (cost=0.43..12302511.22 rows=17277 width=8) (actual time=2730.133..2730.134 rows=2 loops=1)
Filter: (last_updated > '2022-08-07 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 15312459
-> Index Scan using logical_resources_pk on logical_resources lr0 (cost=0.43..7.97 rows=1 width=12) (actual time=0.006..0.006 rows=1 loops=2)
Index Cond: (logical_resource_id = lr1.logical_resource_id)
Filter: (is_deleted = 'N'::bpchar)
Planning Time: 0.686 ms
Execution Time: 2730.195 ms
And even the simplest form:
fhirdb=> explain analyze
fhirdb-> SELECT LR1.LOGICAL_RESOURCE_ID
fhirdb-> FROM FHIRDATA.LOGICAL_RESOURCES AS LR1
fhirdb-> WHERE LR1.LAST_UPDATED > '2022-08-07'
fhirdb-> ORDER BY LR1.LOGICAL_RESOURCE_ID
fhirdb-> LIMIT 2;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..177.20 rows=2 width=8) (actual time=2733.647..2733.649 rows=2 loops=1)
-> Index Scan using logical_resources_pk on logical_resources lr1 (cost=0.43..1526972.13 rows=17277 width=8) (actual time=2733.645..2733.646 rows=2 loops=1)
Filter: (last_updated > '2022-08-07 00:00:00'::timestamp without time zone)
Rows Removed by Filter: 15312459
Planning Time: 0.245 ms
Execution Time: 2733.675 ms
Adding an ORDER BY last_updated to the query persuades PostgreSQL to use the index:
fhirdb=> explain analyze
fhirdb-> SELECT LR1.LOGICAL_RESOURCE_ID
fhirdb-> FROM FHIRDATA.LOGICAL_RESOURCES AS LR1
fhirdb-> WHERE LR1.LAST_UPDATED > '2022-08-07'
fhirdb-> ORDER BY LR1.LAST_UPDATED
fhirdb-> LIMIT 2;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..4.74 rows=2 width=16) (actual time=0.007..0.008 rows=2 loops=1)
-> Index Scan using idx_logical_resources_lupd on logical_resources lr1 (cost=0.43..37206.58 rows=17277 width=16) (actual time=0.007..0.007 rows=2 loops=1)
Index Cond: (last_updated > '2022-08-07 00:00:00'::timestamp without time zone)
Planning Time: 0.102 ms
Execution Time: 0.023 ms
which in this case has an execution time over 100000x faster.
Rewriting using a CTE for the main filter query gets us closer:
fhirdb=> explain analyze
WITH LR1 AS (SELECT logical_resource_id
from fhirdata.logical_resources
where resource_type_id IN (9,12,1,2,149,3,4,5,6,7,8,10,11,13,14,15,16,17,18,150,19,20,21,151,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,45,46,47,48,49,50,51,152,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,153,67,68,69,70,71,72,154,73,74,75,76,77,78,79,80,81,155,92,93,94,95,96,156,97,98,99,100,101,102,157,103,104,105,106,107,108,109,110,111,112,113,114,158,115,116,117,118,119,120,122,124,125,126,127,128,129,130,131,132,159,160,133,161,140,141,142,143,144,145,146,147,148)
and last_updated > '2022-08-07' ORDER BY last_updated)
SELECT LR0.RESOURCE_TYPE_ID, LR0.LOGICAL_RESOURCE_ID
FROM FHIRDATA.LOGICAL_RESOURCES AS LR0
WHERE LR0.IS_DELETED = 'N'
AND LR0.RESOURCE_TYPE_ID IN (9,12,1,2,149,3,4,5,6,7,8,10,11,13,14,15,16,17,18,150,19,20,21,151,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,45,46,47,48,49,50,51,152,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,153,67,68,69,70,71,72,154,73,74,75,76,77,78,79,80,81,155,92,93,94,95,96,156,97,98,99,100,101,102,157,103,104,105,106,107,108,109,110,111,112,113,114,158,115,116,117,118,119,120,122,124,125,126,127,128,129,130,131,132,159,160,133,161,140,141,142,143,144,145,146,147,148)
AND EXISTS ( SELECT 1 FROM LR1 WHERE LR1.LOGICAL_RESOURCE_ID = LR0.LOGICAL_RESOURCE_ID)
ORDER BY LR0.LAST_UPDATED
LIMIT 2;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=136165.06..136165.06 rows=2 width=20) (actual time=15.523..15.523 rows=2 loops=1)
-> Sort (cost=136165.06..136184.28 rows=7689 width=20) (actual time=0.148..0.148 rows=2 loops=1)
Sort Key: lr0.last_updated
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=40396.16..136088.17 rows=7689 width=20) (actual time=0.072..0.137 rows=2 loops=1)
-> HashAggregate (cost=40395.73..40510.99 rows=11526 width=8) (actual time=0.057..0.119 rows=2 loops=1)
Group Key: logical_resources.logical_resource_id
-> Index Scan using idx_logical_resources_lupd on logical_resources (cost=0.43..40251.65 rows=11526 width=16) (actual time=0.028..0.031 rows=2 loops=1)
Index Cond: (last_updated > '2022-08-07 00:00:00'::timestamp without time zone)
Filter: (resource_type_id = ANY ('{9,12,1,2,149,3,4,5,6,7,8,10,11,13,14,15,16,17,18,150,19,20,21,151,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,45,46,47,48,49,50,51,152,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,153,67,68,69,70,71,72,154,73,74,75,76,77,78,79,80,81,155,92,93,94,95,96,156,97,98,99,100,101,102,157,103,104,105,106,107,108,109,110,111,112,113,114,158,115,116,117,118,119,120,122,124,125,126,127,128,129,130,131,132,159,160,133,161,140,141,142,143,144,145,146,147,148}'::integer[]))
-> Index Scan using logical_resources_pk on logical_resources lr0 (cost=0.43..8.30 rows=1 width=20) (actual time=0.007..0.007 rows=1 loops=2)
Index Cond: (logical_resource_id = logical_resources.logical_resource_id)
Filter: ((is_deleted = 'N'::bpchar) AND (resource_type_id = ANY ('{9,12,1,2,149,3,4,5,6,7,8,10,11,13,14,15,16,17,18,150,19,20,21,151,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,45,46,47,48,49,50,51,152,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,153,67,68,69,70,71,72,154,73,74,75,76,77,78,79,80,81,155,92,93,94,95,96,156,97,98,99,100,101,102,157,103,104,105,106,107,108,109,110,111,112,113,114,158,115,116,117,118,119,120,122,124,125,126,127,128,129,130,131,132,159,160,133,161,140,141,142,143,144,145,146,147,148}'::integer[])))
Planning Time: 0.537 ms
JIT:
Functions: 19
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 3.503 ms, Inlining 0.000 ms, Optimization 0.897 ms, Emission 14.208 ms, Total 18.608 ms
Execution Time: 19.397 ms
Note the EXISTS structure is just a way to simplify query building when there are multiple parameter filters we need to include. It seems a little redundant in this simple case, but is required semantically for more complex search queries.
The ordering is required for deterministic pagination behavior. The final order by should really by last_updated, logical_resource_id in case there are multiple resources sharing the same last_updated value (not uncommon)
Describe the bug The following query has only a single resource in scope and therefore should be very fast:
However, due to 4.0/4.3 type filtering, the request is transformed to:
Tracing shows the fetch of logical resource ids taking nearly 3 seconds:
We also do not log any of the queries being executed, making it difficult to determine exactly what we're issuing to the database.
Environment Which version of IBM FHIR Server? 5.0.0-SNAPSHOT
To Reproduce Steps to reproduce the behavior:
Expected behavior A search like this should return in a few milliseconds assuming the data is cached at the database level.
Additional context The injection of the _type filter is probably disrupting the execution plan of the query. Need to actually log the SQL before we can progress any investigation.