LinuxForHealth / FHIR

The LinuxForHealth FHIR® Server and related projects
https://linuxforhealth.github.io/FHIR
Apache License 2.0
330 stars 157 forks source link

Improve reindex performance by skipping resources with no search parameter changes #2155

Closed punktilious closed 3 years ago

punktilious commented 3 years ago

The $reindex operation reads every resource, extracts the search parameters, deletes the current search parameters from the database and inserts the new set.

This is expensive and unnecessary if the new search parameter set is identical to the search parameters currently stored. However, reading the existing search parameters for comparison would also be expensive and non-trivial. A possible solution is to store a fingerprint (hash) of the parameter set and use this as a comparison. If the stored hash matches the hash computed from the newly extracted parameter set then the delete/insert step can be skipped.

This optimization is also applicable to new versions of a resource.

Care is needed when computing the hash to make sure it is deterministic - meaning that the parameters must be sorted first before the hash is computed.

A hash such as SHA-256 is statistically reliable for such a feature.

A new column is required on the xx_logical_resources table. No data migration is needed because a default value of null is suitable. The next time a reindex is performed, the hash will be computed and stored. Future reindex operations will then benefit. New resources will be stored with the hash and therefore immediately benefit.

lmsurpre commented 3 years ago

Could be combined with a similar schema-change needed for #2417

lmsurpre commented 3 years ago

Need to make sure we consider the impact of global search parameters as well. Just using the code + value may not be enough in the hash (if we move the target of where they are stored).

Also need to consider search parameter disambiguation...two parameters can have the same code. In cases like this we recommend use of search parameter filtering in fhir-server-config. Thus, we probably want to use the search parameter url (not just the code) in the hash.

lmsurpre commented 3 years ago

should be done in conjunction with https://github.com/IBM/FHIR/issues/1751 to ensure good e2e test coverage.

tbieste commented 3 years ago

After looking over the code involved, here's an overall design:

DB changes:

Class changes:

Logic changes after calling extractSearchParameters(...):

lmsurpre commented 3 years ago

When I first ran this I tried with 20 reindex-max-requests (lookup param name) and I hit this:

INSERT INTO common_token_values (token_value, code_system_id)       SELECT v.token_value, v.code_system_id         FROM (VALUES (CAST(? AS VARCHAR(1024)),CAST(? AS INT)), (CAST(? AS VARCHAR(1024)),CAST(? AS INT)) ) AS v(token_value, code_system_id)     ORDER BY v.token_value, v.code_system_id  ON CONFLICT DO NOTHING 
org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 125628 waits for ShareLock on transaction 469547549; blocked by process 125650.
Process 125650 waits for ShareLock on transaction 469547553; blocked by process 125653.
Process 125653 waits for ShareLock on transaction 469547548; blocked by process 125649.
Process 125649 waits for ShareLock on transaction 469547542; blocked by process 125641.
Process 125641 waits for ShareLock on transaction 469547565; blocked by process 125628.
  Hint: See server log for query details.
  Where: while inserting index tuple (521996,89) in relation "common_token_values"
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2553)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2285)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:323)
    at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:473)
    at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:393)
    at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:164)
    at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:130)
    at jdk.internal.reflect.GeneratedMethodAccessor56.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.postgresql.ds.PGPooledConnection$StatementHandler.invoke(PGPooledConnection.java:441)
    at com.sun.proxy.$Proxy117.executeUpdate(Unknown Source)
    at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.executeUpdate(WSJdbcPreparedStatement.java:520)
    at com.ibm.fhir.persistence.jdbc.postgres.PostgresResourceReferenceDAO.doCommonTokenValuesUpsert(PostgresResourceReferenceDAO.java:140)
    at com.ibm.fhir.persistence.jdbc.dao.impl.ResourceReferenceDAO.upsertCommonTokenValues(ResourceReferenceDAO.java:710)
    at com.ibm.fhir.persistence.jdbc.dao.impl.ResourceReferenceDAO.persist(ResourceReferenceDAO.java:786)
    at com.ibm.fhir.persistence.jdbc.dao.impl.ResourceReferenceDAO.addNormalizedValues(ResourceReferenceDAO.java:182)
    at com.ibm.fhir.persistence.jdbc.dao.impl.ParameterVisitorBatchDAO.close(ParameterVisitorBatchDAO.java:626)
    at com.ibm.fhir.persistence.jdbc.dao.ReindexResourceDAO.updateParameters(ReindexResourceDAO.java:354)
    at com.ibm.fhir.persistence.jdbc.impl.FHIRPersistenceJDBCImpl.updateParameters(FHIRPersistenceJDBCImpl.java:2637)
    at com.ibm.fhir.persistence.jdbc.impl.FHIRPersistenceJDBCImpl.reindex(FHIRPersistenceJDBCImpl.java:2568)
    at com.ibm.fhir.server.util.FHIRRestHelper.doReindex(FHIRRestHelper.java:3083)
    at com.ibm.fhir.operation.reindex.ReindexOperation.doInvoke(ReindexOperation.java:169)
    at com.ibm.fhir.server.operation.spi.AbstractOperation.invoke(AbstractOperation.java:66)
    at com.ibm.fhir.server.util.FHIRRestHelper.doInvoke(FHIRRestHelper.java:1163)
    at com.ibm.fhir.server.resources.Operation.invoke(Operation.java:126)
    at com.ibm.fhir.server.resources.Operation$Proxy$_$$_WeldClientProxy.invoke(Unknown Source)
    at jdk.internal.reflect.GeneratedMethodAccessor75.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at com.ibm.ws.jaxrs20.cdi.component.JaxRsFactoryImplicitBeanCDICustomizer.serviceInvoke(JaxRsFactoryImplicitBeanCDICustomizer.java:342)
    at com.ibm.ws.jaxrs20.server.LibertyJaxRsServerFactoryBean.performInvocation(LibertyJaxRsServerFactoryBean.java:641)
    at com.ibm.ws.jaxrs20.server.LibertyJaxRsInvoker.performInvocation(LibertyJaxRsInvoker.java:160)
    at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:101)
    at com.ibm.ws.jaxrs20.server.LibertyJaxRsInvoker.invoke(LibertyJaxRsInvoker.java:273)
    at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:213)
    at com.ibm.ws.jaxrs20.server.LibertyJaxRsInvoker.invoke(LibertyJaxRsInvoker.java:444)
    at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:112)
    at org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)
    at org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)
    at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
    at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:123)
    at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:274)
    at com.ibm.ws.jaxrs20.endpoint.AbstractJaxRsWebEndpoint.invoke(AbstractJaxRsWebEndpoint.java:137)
    at com.ibm.websphere.jaxrs.server.IBMRestServlet.handleRequest(IBMRestServlet.java:146)
    at com.ibm.websphere.jaxrs.server.IBMRestServlet.doPost(IBMRestServlet.java:104)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:706)
    at com.ibm.websphere.jaxrs.server.IBMRestServlet.service(IBMRestServlet.java:96)
    at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1253)
    at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:746)
    at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:443)
    at com.ibm.ws.webcontainer.filter.WebAppFilterChain.invokeTarget(WebAppFilterChain.java:183)
    at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:94)
    at com.ibm.fhir.server.filter.rest.FHIRRestServletFilter.doFilter(FHIRRestServletFilter.java:155)
    at javax.servlet.http.HttpFilter.doFilter(HttpFilter.java:127)
    at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:201)
    at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:91)
    at com.ibm.ws.security.jaspi.JaspiServletFilter.doFilter(JaspiServletFilter.java:56)
    at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:201)
    at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:91)
    at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:1002)
    at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1140)
    at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1011)
    at com.ibm.ws.webcontainer.servlet.CacheServletWrapper.handleRequest(CacheServletWrapper.java:75)
    at com.ibm.ws.webcontainer40.servlet.CacheServletWrapper40.handleRequest(CacheServletWrapper40.java:85)
    at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:938)
    at com.ibm.ws.webcontainer.osgi.DynamicVirtualHost$2.run(DynamicVirtualHost.java:279)
    at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink$TaskWrapper.run(HttpDispatcherLink.java:1159)
    at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink.wrapHandlerAndExecute(HttpDispatcherLink.java:428)
    at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink.ready(HttpDispatcherLink.java:387)
    at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:566)
    at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.handleNewRequest(HttpInboundLink.java:500)
    at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.processRequest(HttpInboundLink.java:360)
    at com.ibm.ws.http.channel.internal.inbound.HttpICLReadCallback.complete(HttpICLReadCallback.java:70)
    at com.ibm.ws.channel.ssl.internal.SSLReadServiceContext$SSLReadCompletedCallback.complete(SSLReadServiceContext.java:1824)
    at com.ibm.ws.tcpchannel.internal.WorkQueueManager.requestComplete(WorkQueueManager.java:504)
    at com.ibm.ws.tcpchannel.internal.WorkQueueManager.attemptIO(WorkQueueManager.java:574)
    at com.ibm.ws.tcpchannel.internal.WorkQueueManager.workerRun(WorkQueueManager.java:958)
    at com.ibm.ws.tcpchannel.internal.WorkQueueManager$Worker.run(WorkQueueManager.java:1047)
    at com.ibm.ws.threading.internal.ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:238)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:836)

I then turned it down to 10 and it worked just fine.

d0roppe commented 3 years ago

verified that if the search parameters did not change that the update of the search parameters is skipped for the update of the tables. Closing issue.