LinuxForHealth / FHIR

The LinuxForHealth FHIR® Server and related projects
https://linuxforhealth.github.io/FHIR
Apache License 2.0
331 stars 157 forks source link

Slow whole-system search query with _lastUpdated when _type filter is injected #3851

Open punktilious opened 2 years ago

punktilious commented 2 years ago

Describe the bug The following query has only a single resource in scope and therefore should be very fast:

GET [base]/?_count=2&_lastUpdated=gt2022-08-07

However, due to 4.0/4.3 type filtering, the request is transformed to:

GET [base]/?_count=2&_lastUpdated=gt2022-08-07&_type=Binary,Bundle,Account,ActivityDefinition,AdministrableProductDefinition,AdverseEvent,AllergyIntolerance,Appointment,AppointmentResponse,AuditEvent,Basic,BiologicallyDerivedProduct,BodyStructure,CapabilityStatement,CarePlan,CareTeam,CatalogEntry,ChargeItem,ChargeItemDefinition,Citation,Claim,ClaimResponse,ClinicalImpression,ClinicalUseDefinition,CodeSystem,Communication,CommunicationRequest,CompartmentDefinition,Composition,ConceptMap,Condition,Consent,Contract,Coverage,CoverageEligibilityRequest,CoverageEligibilityResponse,DetectedIssue,Device,DeviceDefinition,DeviceMetric,DeviceRequest,DeviceUseStatement,DiagnosticReport,DocumentManifest,DocumentReference,Encounter,Endpoint,EnrollmentRequest,EnrollmentResponse,EpisodeOfCare,EventDefinition,Evidence,EvidenceReport,EvidenceVariable,ExampleScenario,ExplanationOfBenefit,FamilyMemberHistory,Flag,Goal,GraphDefinition,Group,GuidanceResponse,HealthcareService,ImagingStudy,Immunization,ImmunizationEvaluation,ImmunizationRecommendation,ImplementationGuide,Ingredient,InsurancePlan,Invoice,Library,Linkage,List,Location,ManufacturedItemDefinition,Measure,MeasureReport,Media,Medication,MedicationAdministration,MedicationDispense,MedicationKnowledge,MedicationRequest,MedicationStatement,MedicinalProductDefinition,MessageDefinition,MessageHeader,MolecularSequence,NamingSystem,NutritionOrder,NutritionProduct,Observation,ObservationDefinition,OperationDefinition,OperationOutcome,Organization,OrganizationAffiliation,PackagedProductDefinition,Parameters,Patient,PaymentNotice,PaymentReconciliation,Person,PlanDefinition,Practitioner,PractitionerRole,Procedure,Provenance,Questionnaire,QuestionnaireResponse,RegulatedAuthorization,RelatedPerson,RequestGroup,ResearchDefinition,ResearchElementDefinition,ResearchStudy,ResearchSubject,RiskAssessment,Schedule,SearchParameter,ServiceRequest,Slot,Specimen,SpecimenDefinition,StructureDefinition,StructureMap,Subscription,SubscriptionStatus,SubscriptionTopic,Substance,SubstanceDefinition,SupplyDelivery,SupplyRequest,Task,TerminologyCapabilities,TestReport,TestScript,ValueSet,VerificationResult,VisionPrescription

Tracing shows the fetch of logical resource ids taking nearly 3 seconds:

[08/08/2022, 08:50:29:443 UTC] 0000003a ResourceDAOIm 1   Successfully retrieved logical resource Ids [took 2852.176132 ms]

We also do not log any of the queries being executed, making it difficult to determine exactly what we're issuing to the database.

Environment Which version of IBM FHIR Server? 5.0.0-SNAPSHOT

To Reproduce Steps to reproduce the behavior:

  1. Enable FINE for com.ibm.fhir.persistence.jdbc.*
  2. Create a search query using _lastUpdated which should return only one or two resources
  3. Examine the response time

Expected behavior A search like this should return in a few milliseconds assuming the data is cached at the database level.

Additional context The injection of the _type filter is probably disrupting the execution plan of the query. Need to actually log the SQL before we can progress any investigation.

punktilious commented 2 years ago

The same query structure is used to obtain the resource count so we do need to perform this filtering as part of the query itself. We therefore need to find a way to have the database perform the filtering as the last step (probably as a hash join).

punktilious commented 2 years ago

The QueryUtils would log the query string, but only if com.ibm.fhir.database.utils.query.*=FINE. It would probably make sense to have the logger passed as an argument.

punktilious commented 2 years ago
fhirdb=> explain analyze
SELECT LR0.RESOURCE_TYPE_ID, LR0.LOGICAL_RESOURCE_ID
        FROM FHIRDATA.LOGICAL_RESOURCES AS LR0
       WHERE LR0.IS_DELETED = 'N'
         AND LR0.RESOURCE_TYPE_ID IN (9,12,1,2,149,3,4,5,6,7,8,10,11,13,14,15,16,17,18,150,19,20,21,151,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,45,46,47,48,49,50,51,152,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,153,67,68,69,70,71,72,154,73,74,75,76,77,78,79,80,81,155,92,93,94,95,96,156,97,98,99,100,101,102,157,103,104,105,106,107,108,109,110,111,112,113,114,158,115,116,117,118,119,120,122,124,125,126,127,128,129,130,131,132,159,160,133,161,140,141,142,143,144,145,146,147,148)
         AND EXISTS (
      SELECT 1
        FROM FHIRDATA.LOGICAL_RESOURCES AS LR1
       WHERE LR1.LOGICAL_RESOURCE_ID = LR0.LOGICAL_RESOURCE_ID
         AND LR0.RESOURCE_TYPE_ID IN (9,12,1,2,149,3,4,5,6,7,8,10,11,13,14,15,16,17,18,150,19,20,21,151,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,45,46,47,48,49,50,51,152,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,153,67,68,69,70,71,72,154,73,74,75,76,77,78,79,80,81,155,92,93,94,95,96,156,97,98,99,100,101,102,157,103,104,105,106,107,108,109,110,111,112,113,114,158,115,116,117,118,119,120,122,124,125,126,127,128,129,130,131,132,159,160,133,161,140,141,142,143,144,145,146,147,148)
         AND (LR1.LAST_UPDATED > '2022-08-07'))
    ORDER BY LR0.LOGICAL_RESOURCE_ID
 LIMIT 2;
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       QUERY PLAN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.87..3238.30 rows=2 width=12) (actual time=2789.601..2789.605 rows=2 loops=1)
   ->  Nested Loop  (cost=0.87..12446298.40 rows=7689 width=12) (actual time=2789.598..2789.602 rows=2 loops=1)
         ->  Index Scan using logical_resources_pk on logical_resources lr1  (cost=0.43..12302511.22 rows=17277 width=8) (actual time=2789.582..2789.583 rows=2 loops=1)
               Filter: (last_updated > '2022-08-07 00:00:00'::timestamp without time zone)
               Rows Removed by Filter: 15312459
         ->  Index Scan using logical_resources_pk on logical_resources lr0  (cost=0.43..8.32 rows=1 width=12) (actual time=0.007..0.007 rows=1 loops=2)
               Index Cond: (logical_resource_id = lr1.logical_resource_id)
               Filter: ((is_deleted = 'N'::bpchar) AND (resource_type_id = ANY ('{9,12,1,2,149,3,4,5,6,7,8,10,11,13,14,15,16,17,18,150,19,20,21,151,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,45,46,47,48,49,50,51,152,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,153,67,68,69,70,71,72,154,73,74,75,76,77,78,79,80,81,155,92,93,94,95,96,156,97,98,99,100,101,102,157,103,104,105,106,107,108,109,110,111,112,113,114,158,115,116,117,118,119,120,122,124,125,126,127,128,129,130,131,132,159,160,133,161,140,141,142,143,144,145,146,147,148}'::integer[])) AND (resource_type_id = ANY ('{9,12,1,2,149,3,4,5,6,7,8,10,11,13,14,15,16,17,18,150,19,20,21,151,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,45,46,47,48,49,50,51,152,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,153,67,68,69,70,71,72,154,73,74,75,76,77,78,79,80,81,155,92,93,94,95,96,156,97,98,99,100,101,102,157,103,104,105,106,107,108,109,110,111,112,113,114,158,115,116,117,118,119,120,122,124,125,126,127,128,129,130,131,132,159,160,133,161,140,141,142,143,144,145,146,147,148}'::integer[])))
 Planning Time: 2.269 ms
 Execution Time: 2789.648 ms
punktilious commented 2 years ago
fhirdb=> \d fhirdata.logical_resources
                              Table "fhirdata.logical_resources"
       Column        |            Type             | Collation | Nullable |      Default      
---------------------+-----------------------------+-----------+----------+-------------------
 logical_resource_id | bigint                      |           | not null | 
 resource_type_id    | integer                     |           | not null | 
 logical_id          | character varying(255)      |           | not null | 
 reindex_tstamp      | timestamp without time zone |           | not null | CURRENT_TIMESTAMP
 reindex_txid        | bigint                      |           | not null | 0
 last_updated        | timestamp without time zone |           |          | 
 is_deleted          | character(1)                |           | not null | 'X'::bpchar
 parameter_hash      | character varying(44)       |           |          | 
Indexes:
    "logical_resources_pk" PRIMARY KEY, btree (logical_resource_id)
    "unq_logical_resources" UNIQUE, btree (resource_type_id, logical_id)
    "idx_logical_resources_lupd" btree (last_updated)
    "idx_logical_resources_rits" btree (reindex_tstamp DESC)

There is an index on last_updated, but it being ignored. Interestingly, even without the type filter we see the index being ignored:

explain analyze
SELECT LR0.RESOURCE_TYPE_ID, LR0.LOGICAL_RESOURCE_ID
        FROM FHIRDATA.LOGICAL_RESOURCES AS LR0
       WHERE LR0.IS_DELETED = 'N'
         AND EXISTS (
      SELECT 1
        FROM FHIRDATA.LOGICAL_RESOURCES AS LR1
       WHERE LR1.LOGICAL_RESOURCE_ID = LR0.LOGICAL_RESOURCE_ID
         AND (LR1.LAST_UPDATED > '2022-08-07'))
    ORDER BY LR0.LOGICAL_RESOURCE_ID
 LIMIT 2;
                                                                               QUERY PLAN                                                                                
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.87..1440.96 rows=2 width=12) (actual time=2730.151..2730.154 rows=2 loops=1)
   ->  Nested Loop  (cost=0.87..12440208.25 rows=17277 width=12) (actual time=2730.148..2730.151 rows=2 loops=1)
         ->  Index Scan using logical_resources_pk on logical_resources lr1  (cost=0.43..12302511.22 rows=17277 width=8) (actual time=2730.133..2730.134 rows=2 loops=1)
               Filter: (last_updated > '2022-08-07 00:00:00'::timestamp without time zone)
               Rows Removed by Filter: 15312459
         ->  Index Scan using logical_resources_pk on logical_resources lr0  (cost=0.43..7.97 rows=1 width=12) (actual time=0.006..0.006 rows=1 loops=2)
               Index Cond: (logical_resource_id = lr1.logical_resource_id)
               Filter: (is_deleted = 'N'::bpchar)
 Planning Time: 0.686 ms
 Execution Time: 2730.195 ms

And even the simplest form:

fhirdb=> explain analyze
fhirdb->       SELECT LR1.LOGICAL_RESOURCE_ID
fhirdb->         FROM FHIRDATA.LOGICAL_RESOURCES AS LR1
fhirdb->         WHERE LR1.LAST_UPDATED > '2022-08-07'
fhirdb->     ORDER BY LR1.LOGICAL_RESOURCE_ID
fhirdb->  LIMIT 2;
                                                                            QUERY PLAN                                                                            
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..177.20 rows=2 width=8) (actual time=2733.647..2733.649 rows=2 loops=1)
   ->  Index Scan using logical_resources_pk on logical_resources lr1  (cost=0.43..1526972.13 rows=17277 width=8) (actual time=2733.645..2733.646 rows=2 loops=1)
         Filter: (last_updated > '2022-08-07 00:00:00'::timestamp without time zone)
         Rows Removed by Filter: 15312459
 Planning Time: 0.245 ms
 Execution Time: 2733.675 ms
punktilious commented 2 years ago

Adding an ORDER BY last_updated to the query persuades PostgreSQL to use the index:

fhirdb=> explain analyze
fhirdb->       SELECT LR1.LOGICAL_RESOURCE_ID
fhirdb->         FROM FHIRDATA.LOGICAL_RESOURCES AS LR1
fhirdb->         WHERE LR1.LAST_UPDATED > '2022-08-07'
fhirdb->     ORDER BY LR1.LAST_UPDATED
fhirdb->  LIMIT 2;
                                                                           QUERY PLAN                                                                            
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..4.74 rows=2 width=16) (actual time=0.007..0.008 rows=2 loops=1)
   ->  Index Scan using idx_logical_resources_lupd on logical_resources lr1  (cost=0.43..37206.58 rows=17277 width=16) (actual time=0.007..0.007 rows=2 loops=1)
         Index Cond: (last_updated > '2022-08-07 00:00:00'::timestamp without time zone)
 Planning Time: 0.102 ms
 Execution Time: 0.023 ms

which in this case has an execution time over 100000x faster.

punktilious commented 2 years ago

Rewriting using a CTE for the main filter query gets us closer:

fhirdb=> explain analyze
WITH LR1 AS (SELECT logical_resource_id
               from fhirdata.logical_resources
              where resource_type_id IN (9,12,1,2,149,3,4,5,6,7,8,10,11,13,14,15,16,17,18,150,19,20,21,151,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,45,46,47,48,49,50,51,152,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,153,67,68,69,70,71,72,154,73,74,75,76,77,78,79,80,81,155,92,93,94,95,96,156,97,98,99,100,101,102,157,103,104,105,106,107,108,109,110,111,112,113,114,158,115,116,117,118,119,120,122,124,125,126,127,128,129,130,131,132,159,160,133,161,140,141,142,143,144,145,146,147,148)
                and last_updated > '2022-08-07' ORDER BY last_updated)
SELECT LR0.RESOURCE_TYPE_ID, LR0.LOGICAL_RESOURCE_ID
        FROM FHIRDATA.LOGICAL_RESOURCES AS LR0
       WHERE LR0.IS_DELETED = 'N'
         AND LR0.RESOURCE_TYPE_ID IN (9,12,1,2,149,3,4,5,6,7,8,10,11,13,14,15,16,17,18,150,19,20,21,151,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,45,46,47,48,49,50,51,152,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,153,67,68,69,70,71,72,154,73,74,75,76,77,78,79,80,81,155,92,93,94,95,96,156,97,98,99,100,101,102,157,103,104,105,106,107,108,109,110,111,112,113,114,158,115,116,117,118,119,120,122,124,125,126,127,128,129,130,131,132,159,160,133,161,140,141,142,143,144,145,146,147,148)
         AND EXISTS ( SELECT 1 FROM LR1 WHERE LR1.LOGICAL_RESOURCE_ID = LR0.LOGICAL_RESOURCE_ID)
    ORDER BY LR0.LAST_UPDATED
 LIMIT 2;
                                                                                                                                                                                                                                                                                         QUERY PLAN                                                                                                                                                                                                                                                                                         
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=136165.06..136165.06 rows=2 width=20) (actual time=15.523..15.523 rows=2 loops=1)
   ->  Sort  (cost=136165.06..136184.28 rows=7689 width=20) (actual time=0.148..0.148 rows=2 loops=1)
         Sort Key: lr0.last_updated
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=40396.16..136088.17 rows=7689 width=20) (actual time=0.072..0.137 rows=2 loops=1)
               ->  HashAggregate  (cost=40395.73..40510.99 rows=11526 width=8) (actual time=0.057..0.119 rows=2 loops=1)
                     Group Key: logical_resources.logical_resource_id
                     ->  Index Scan using idx_logical_resources_lupd on logical_resources  (cost=0.43..40251.65 rows=11526 width=16) (actual time=0.028..0.031 rows=2 loops=1)
                           Index Cond: (last_updated > '2022-08-07 00:00:00'::timestamp without time zone)
                           Filter: (resource_type_id = ANY ('{9,12,1,2,149,3,4,5,6,7,8,10,11,13,14,15,16,17,18,150,19,20,21,151,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,45,46,47,48,49,50,51,152,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,153,67,68,69,70,71,72,154,73,74,75,76,77,78,79,80,81,155,92,93,94,95,96,156,97,98,99,100,101,102,157,103,104,105,106,107,108,109,110,111,112,113,114,158,115,116,117,118,119,120,122,124,125,126,127,128,129,130,131,132,159,160,133,161,140,141,142,143,144,145,146,147,148}'::integer[]))
               ->  Index Scan using logical_resources_pk on logical_resources lr0  (cost=0.43..8.30 rows=1 width=20) (actual time=0.007..0.007 rows=1 loops=2)
                     Index Cond: (logical_resource_id = logical_resources.logical_resource_id)
                     Filter: ((is_deleted = 'N'::bpchar) AND (resource_type_id = ANY ('{9,12,1,2,149,3,4,5,6,7,8,10,11,13,14,15,16,17,18,150,19,20,21,151,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,45,46,47,48,49,50,51,152,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,153,67,68,69,70,71,72,154,73,74,75,76,77,78,79,80,81,155,92,93,94,95,96,156,97,98,99,100,101,102,157,103,104,105,106,107,108,109,110,111,112,113,114,158,115,116,117,118,119,120,122,124,125,126,127,128,129,130,131,132,159,160,133,161,140,141,142,143,144,145,146,147,148}'::integer[])))
 Planning Time: 0.537 ms
 JIT:
   Functions: 19
   Options: Inlining false, Optimization false, Expressions true, Deforming true
   Timing: Generation 3.503 ms, Inlining 0.000 ms, Optimization 0.897 ms, Emission 14.208 ms, Total 18.608 ms
 Execution Time: 19.397 ms

Note the EXISTS structure is just a way to simplify query building when there are multiple parameter filters we need to include. It seems a little redundant in this simple case, but is required semantically for more complex search queries.

The ordering is required for deterministic pagination behavior. The final order by should really by last_updated, logical_resource_id in case there are multiple resources sharing the same last_updated value (not uncommon)