Closed Andrzej-Pietrzak closed 7 years ago
What's the status on this issue ?
The date indexes are not working properly.
The queries like:
/Appointment?_type=Appointment,Encounter&_count=10&date=ge2016-10-09T22:00:00.000Z&date=le2016-10-10T21:59:59.999Z&_sort:asc=date&location=278751fd-0a35-4ddf-a64b-c68fbe1ac750
end up with very poor performance as indexes are not being picked up.
Could the date issues be addressed first?
The date index testing shows that there are some issues with some of the ‘period’ and ‘datetime’ queries that will require some more work. For 2 out of 4 test parameters there were queries that should return 0 results and finish immediately that took a lot of time for an unknown reason.
On encounter example, given encounters between 1970 to 2016, searching with open date ranges (only lt, le, gt, ge) when expecting no results, would take a few minutes to complete:
SELECT fhir_search('{"resourceType": "Encounter", "queryString": "date=lt1910-01-01&_totalMethod=no"}'); SELECT fhir_search('{"resourceType": "Encounter", "queryString": "date=le1911-03-21&_totalMethod=no"}'); SELECT fhir_search('{"resourceType": "Encounter", "queryString": "date=gt2020-01-01&_totalMethod=no"}'); SELECT fhir_search('{"resourceType": "Encounter", "queryString": "date=ge2021-04-05&_totalMethod=no"}');
Adding additional query params would not help. A similar long running query:
SELECT fhir_search('{"resourceType": "Encounter", "queryString": "date=le1911-03-21&organization=a1bfe328-a775-4a82-8f88-b1608bc589f5&_totalMethod=no"}');
On the other hand adding the upper or lower bound partially helps :
SELECT fhir_search('{"resourceType": "Encounter", "queryString": "date=gt1910-12-01&date=lt1920-01-01"}'); -- Time: 321.489 ms, 0 res SELECT fhir_search('{"resourceType": "Encounter", "queryString": "date=gt2019-12-01&date=lt2020-01-01"}'); -- Time: 5.817 ms, 0 res
I’ve tested this on a date (patient.birthdate) an instant (appointment.date) a period (encounter.date) and a dateTime (observation.date). There were no problems with date nor instant, this can be observed only on the period and the date time.
SET plv8.start_proc = 'plv8_init'; SELECT fhir_index_parameter('{"resourceType": "Encounter", "name": "date"}'); VACUUM ANALYZE encounter; SELECT fhir_search('{"resourceType": "Encounter", "queryString": "date=lt1910-01-01&_totalMethod=no"}');
SELECT fhir_search('{"resourceType": "Encounter", "queryString": "date=lt1910-01-01&_totalMethod=no"}'); SELECT fhir_search('{"resourceType": "Encounter", "queryString": "date=le1911-03-21&_totalMethod=no"}'); SELECT fhir_search('{"resourceType": "Encounter", "queryString": "date=gt2020-01-01&_totalMethod=no"}'); SELECT fhir_search('{"resourceType": "Encounter", "queryString": "date=ge2021-04-05&_totalMethod=no"}'); SELECT fhir_search('{"resourceType": "Encounter", "queryString": "date=gt1910-12-01&date=lt1920-01-01"}'); SELECT fhir_search('{"resourceType": "Encounter", "queryString": "date=gt2019-12-01&date=lt2020-01-01"}');
EXPLAIN ANALYZE SELECT tbl1.* FROM encounter tbl1 WHERE fhir_extract_as_epoch_lower( ( resource )::json , ( '[{"path":["Encounter","period"],"elementType":"Period"}]' )::json ) < extract(epoch from ('1910-01-01')::timestamp with time zone)::double precision LIMIT 10;
EXPLAIN ANALYZE SELECT tbl1.* FROM encounter tbl1 WHERE fhir_extract_as_epoch_upper( ( resource )::json , ( '[{"path":["Encounter","period"],"elementType":"Period"}]' )::json ) > extract(epoch from ('2020-01-01')::timestamp with time zone)::double precision LIMIT 10;
I retested these date searches again with the same data and it seems to work fine this time.
I think there's a chance I forgot to force garbage collection after dropping one data set and ingesting a new one. So the problem is reproducible when you terminate a storage, create a new one, ingest data and then apply indexes again, but it disappears after calling VACUUM ANALYZE. So I'm guessing thats just user error on my side.
Tested with dataset:
Search times: boolean: Patient?active=false -> 0 res, Time: 81654.428 ms - empty set so count is not an issue Patient?active=true -> 3M res, Time: 77672.724 ms
code:
string: Patient?given=Tim -> 2977 res, Time: 116.572 ms Patient?family=Smith -> 12685 res, Time: 614.471 ms
Identifier: Patient?identifier=155996 -> 3 res, Time: 7.730 ms Patient?identifier=155997 -> 5 res, Time: 7.004 ms Patient?identifier=155998 -> 1 res, Time: 8.804 ms Patient?identifier=155998 -> 1 res, Time: 8.804 ms Patient?identifier=155999 -> 2 res, Time: 6.755 ms Patient?identifier=Hospital Number|155999 -> 2 res, 8.103 ms Patient?identifier=nonexisting|155999 -> 0 res, 13.298 ms Patient?identifier=NHS Number|155999 -> 0 res, 5.728 ms Patient?identifier=NHS Number|4560123456 -> 0 res, 5.868 ms
ContactPoint: Patient?telecom=1-118-942-6952 -> 1 res, Time: 28.317 ms Patient?phone=1-118-942-6952 -> 1 res, 9.165 ms
CodableConcept: Observation?code=kgcb19994-3 -> 46502 res, Time: 23117.381 ms -- maybe count is an issue ? Observation?code=55284-4 -> 3000000 res, Time: 631105.296 ms Observation?code=0asdter -> 0 res, Time: 8.657 ms
Coding (as a single coding): DiagnosticReport?code=10671-6 -> 61086 res, Time: 24070.661 ms - maybe count is an issue ? DiagnosticReport?code=10744-1 -> 60565 res, Time: 22976.348 ms DiagnosticReport?code=10777-1 -> 60608 res, Time: 21693.530 ms DiagnosticReport?code=nothing -> 0 res, Time: 8.648 ms
Reference: Observation?patient=062fac5a-f185-4735-8b6a-08b8f5c7ed83 -> 5 res, Time: 26.207 ms Observation?patient=something -> 0 res, Time: 6.421 ms Patient?_id=062fac5a-f185-4735-8b6a-08b8f5c7ed83 -> 1 res, Time: 18.706 ms Patient?_id=062fac5a-f185-4735-8b6a-08b8f5c7ed83&_include=Observation:patient -> 1 res (and 5 includes), Time: 10.482 ms Obsevation?patient=062fac5a-f185-4735-8b6a-08b8f5c7ed83&_include=patient -> 5 res (and 1 include), Time: 7.778 ms DiagnosticReport?_id=54d30aeb-0ebe-482c-aa6f-8b3c55d581dc&_include=result -> 1 res (and 2 observations), Time: 41.008 ms
Quantity Given 15M observations, 300k are weight measurements in ks's in range of 20-150 Observation?value-quantity=gt150 -> 2001804 res, Time: 455556.565 ms Observation?value-quantity=gt198 -> 80168 res, Time: 30602.466 ms Observation?value-quantity=eq100 -> 156648 res, Time: 61523.574 ms Observation?value-quantity=gt150||kg -> 0 res, Time: 642673.473 ms Observation?value-quantity=gt100||kg -> 509624 res, Time: 380629.512 ms Observation?value-quantity=ge150||kg -> 10120 res, Time: 294714.285 ms Observation?value-quantity=ge150|someNonexistingSys|kg -> 0 res, Time: 377733.673 ms
DateTime: DiagnosticReport?issued=eq2007-09-18T10:59:36.352Z -> 1 res, Time: 80.448 ms DiagnosticReport?issued=gt2017-09-18T10:59:36.352Z -> 0 res, Time: 12.172 ms DiagnosticReport?issued=gt2016-04-01T00:00:00Z -> 46903 res, Time: 9175.543 ms
DateTime or Period: Observation?date=gt2017-05-05T10:00:00 -> 0 res, 615413.026 ms - empty set so count is not an issue Observation?date=eq1993-06-16T20:29:19.188Z -> 1 res, 790026.877 ms
Date: birthdate=eq2010-01-01 -> 190 res, Time: 55.705 ms birthdate=eq1924-01-01 -> 0 res, Time: 8.881 ms birthdate=eq1990-01-01 -> 125 res, Time: 12.198 ms birthdate=gt2012-01-01 -> 185549 res, Time: 21494.050 ms birthdate=gt2016-01-01 -> 13938 res, Time: 68.909 ms birthdate=ge2016-01-01 -> 14064 res, Time: 33.016 ms birthdate=lt1930-01-01 -> 0 res, 6.926 ms birthdate=lt1947-01-01 -> 26812 res, Time: 109.724 ms birthdate=le1947-01-01 -> 26930 res, Time: 113.753 ms birthdate=lt1950-01-01 -> 155457 res, Time: 18612.779 ms birthdate=lt1950-01-01&birthdate=gt1949-01-01 -> 42872 res, Time: 158.717 ms