Open kevdoan opened 1 year ago
Your workaround matches what we mention in our performance guide https://linuxforhealth.github.io/FHIR/guides/FHIRPerformanceGuide#65-search-examples
We think the postgres optimizer is chosing a poor plan in this case. Disabling merge join on your postgresql instance might help, but could have impacts on other queries.
Describe the bug We have a workflow set up where we first search for a resource based on an identifier: If it exists, we update the resource. If it doesn't, we create it.
However, doing a GET to resources with an identifier like this:
/fhir-server/api/v4/Coverage?identifier=29754927
, we see slow performance and eventually get 500 server errors.failed to execute HTTP Get. Status Code: 500. Reason: Internal Server Error. Response: {"resourceType":"OperationOutcome","id":"a-a-2c-bc-d7b1421c-8328-41f5-a052-5056f5fcdb54","issue":[{"severity":"fatal","code":"exception","details":{"text":"FHIRPersistenceDBConnectException: Failure acquiring connection for datasource"}}]}
Upon investigation, and looking at pg_stat_activity, we identified this query as the bottleneck:
Environment 4.10.2
To Reproduce Steps to reproduce the behavior:
Expected behavior We're currently using a workaround, which is to supply a system to the identifier search to be more specific, and is much faster (2-3ms instead of 2second response for a single GET) :
/fhir-server/api/v4/Coverage?identifier=https://www.abchealthplan.com/fhir/coverageid|29754927
"https://www.abchealthplan.com/fhir/coverageid" is a static value and works for us for now, but in the future it will be dynamic and we won't always know what it is.
So, we expect the workaround and the original query to have similar performance. We noticed that the SQL query is doing a subselect which grabs all ~5million of our records from the table, since nothing is deleted: