LinuxForHealth / FHIR

The LinuxForHealth FHIR® Server and related projects
https://linuxforhealth.github.io/FHIR
Apache License 2.0
333 stars 156 forks source link

fhir schema lease not always given up after completed schema update #3135

Open lmsurpre opened 2 years ago

lmsurpre commented 2 years ago

Describe the bug Seen once during CI (https://github.com/IBM/FHIR/runs/4581232255?check_suite_focus=true):

2021-12-20 10:57:33.398 00000001    INFO   com.ibm.fhir.schema.app.Main Refreshing tenant partitions
2021-12-20 10:57:33.427 00000001    INFO m.fhir.schema.app.LeaseManager Canceling lease for schema 'FHIRDATA'
2021-12-20 10:57:33.429 00000001    INFO m.fhir.schema.app.LeaseManager Lease canceled for schema 'FHIRDATA'
2021-12-20 10:57:33.431 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIR_OAUTH' [attempt 1]
2021-12-20 10:57:33.558 00000001    INFO   com.ibm.fhir.schema.app.Main Collecting model update tasks
2021-12-20 10:57:33.559 00000001    INFO   com.ibm.fhir.schema.app.Main Starting model updates
2021-12-20 10:57:33.636 0000000f    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIR_OAUTH.OAUTH20CACHE:1
2021-12-20 10:57:33.685 0000000f    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIR_OAUTH.OAUTH20CLIENTCONFIG:1
2021-12-20 10:57:33.718 0000000f    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIR_OAUTH.OAUTH20CONSENTCACHE:1
2021-12-20 10:57:33.723 00000001    INFO m.fhir.schema.app.LeaseManager Canceling lease for schema 'FHIR_OAUTH'
2021-12-20 10:57:33.724 00000001    INFO m.fhir.schema.app.LeaseManager Lease canceled for schema 'FHIR_OAUTH'
2021-12-20 10:57:33.725 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIR_JBATCH' [attempt 1]
2021-12-20 10:57:33.780 00000001    INFO   com.ibm.fhir.schema.app.Main Collecting model update tasks
2021-12-20 10:57:33.780 00000001    INFO   com.ibm.fhir.schema.app.Main Starting model updates
2021-12-20 10:57:33.841 00000011    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIR_JBATCH.JOBINSTANCE:1
2021-12-20 10:57:33.905 00000011    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIR_JBATCH.JOBEXECUTION:1
2021-12-20 10:57:33.950 00000011    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIR_JBATCH.GROUPASSOCIATION:1
2021-12-20 10:57:33.993 00000011    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIR_JBATCH.JOBPARAMETER:1
2021-12-20 10:57:34.075 0000000c    INFO m.fhir.schema.app.LeaseManager Refreshing lease for schema 'FHIRDATA'
2021-12-20 10:57:34.076 0000000c    INFO ls.pool.PoolConnectionProvider Max connections allocated, waiting for connection to be freed
2021-12-20 10:57:34.085 00000011    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIR_JBATCH.STEPTHREADEXECUTION:1
2021-12-20 10:57:34.086 00000011    INFO ls.pool.PoolConnectionProvider Max connections allocated, waiting for connection to be freed
2021-12-20 10:57:34.173 00000011    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIR_JBATCH.STEPTHREADINSTANCE:1
2021-12-20 10:57:34.230 00000011    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIR_JBATCH.REMOTABLEPARTITION:1
2021-12-20 10:57:34.238 00000001    INFO m.fhir.schema.app.LeaseManager Canceling lease for schema 'FHIR_JBATCH'
2021-12-20 10:57:34.240 00000001    INFO m.fhir.schema.app.LeaseManager Lease canceled for schema 'FHIR_JBATCH'
2021-12-20 10:57:34.243 00000001    INFO   com.ibm.fhir.schema.app.Main Processing took: 151.354 s
2021-12-20 10:57:34.244 00000001    INFO   com.ibm.fhir.schema.app.Main SCHEMA CHANGE: OK
2021-12-20 10:57:34.590 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 1]
2021-12-20 10:57:34.661 00000001    INFO .common.JdbcConnectionProvider Opening connection to database: jdbc:db2://localhost:50000/fhirdb
2021-12-20 10:57:36.155 00000001 WARNING ls.pool.PoolConnectionProvider Get connection took 1.491 seconds
2021-12-20 10:57:37.206 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 2]
2021-12-20 10:57:38.213 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 3]
2021-12-20 10:57:39.219 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 4]
2021-12-20 10:57:40.226 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 5]
2021-12-20 10:57:41.234 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 6]
2021-12-20 10:57:42.241 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 7]
2021-12-20 10:57:43.249 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 8]
2021-12-20 10:57:44.256 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 9]
2021-12-20 10:57:45.264 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 10]
2021-12-20 10:57:45.271 00000001    INFO m.fhir.schema.app.LeaseManager Canceling lease for schema 'FHIRDATA'
2021-12-20 10:57:45.276 00000001 WARNING m.fhir.schema.app.LeaseManager Cancel lease ignored: Lease for schema 'FHIRDATA' is not owned by this instance
2021-12-20 10:57:45.278 00000001 WARNING   com.ibm.fhir.schema.app.Main Please try again later: update is already running - Concurrent update (allocate-tenant) for FHIR data schema: 'FHIRDATA'
2021-12-20 10:57:45.279 00000001 WARNING   com.ibm.fhir.schema.app.Main SCHEMA CHANGE: CONCURRENT UPDATE

Environment main

To Reproduce ?

Expected behavior The FHIRDATA lease should either be cancelled or re-used.

Additional context

punktilious commented 2 years ago

The background schema maintenance thread isn't stopped when the lease is cancelled which can result in a race-condition. Cancelling the lease should stop the thread.

punktilious commented 2 years ago

LeaseManager#cancelLease needs to wait for the lease maintenance thread to stop before executing the CancelLease DAO command.

prb112 commented 2 years ago

Ran a bunch of schema creations across multiple terminal sessions. The locking / lease management looks good. (Note, I only created the Patient table group)

while true
do
/Library/Java/JavaVirtualMachines/temurin-11.jdk/Contents/Home/bin/java -Dfile.encoding=UTF-8 -classpath fhir-persistence-schema/target/test-classes:fhir-persistence-schema/target/classes:/Users//.m2/repository/jakarta/transaction/jakarta.transaction-api/1.3.3/jakarta.transaction-api-1.3.3.jar:fhir-database-utils/target/classes:fhir-model/target/classes:/Users//.m2/repository/org/antlr/antlr4-runtime/4.9.3/antlr4-runtime-4.9.3.jar:/Users//.m2/repository/com/github/stephenc/jcip/jcip-annotations/1.0-1/jcip-annotations-1.0-1.jar:/Users//.m2/repository/org/owasp/encoder/encoder/1.2.3/encoder-1.2.3.jar:fhir-core/target/classes:/Users//.m2/repository/org/eclipse/parsson/jakarta.json/1.0.0/jakarta.json-1.0.0.jar:/Users//.m2/repository/jakarta/annotation/jakarta.annotation-api/1.3.5/jakarta.annotation-api-1.3.5.jar:/Users//.m2/repository/org/apache/derby/derby/10.14.2.0/derby-10.14.2.0.jar:/Users//.m2/repository/org/apache/derby/derbytools/10.14.2.0/derbytools-10.14.2.0.jar:/Users//.m2/repository/com/ibm/db2/jcc/11.5.6.0/jcc-11.5.6.0.jar:/Users//.m2/repository/org/postgresql/postgresql/42.2.23/postgresql-42.2.23.jar:/Users//.m2/repository/org/checkerframework/checker-qual/3.10.0/checker-qual-3.10.0.jar:fhir-registry/target/classes:/Users//.m2/repository/org/testng/testng/7.4.0/testng-7.4.0.jar:/Users//.m2/repository/com/beust/jcommander/1.78/jcommander-1.78.jar:/Users//.m2/repository/org/webjars/jquery/3.5.1/jquery-3.5.1.jar com.ibm.fhir.schema.app.Main     --prop db.host=dbhost --prop db.port=5432  --prop db.database=fhirdb     --prop user=fhiradmin     --prop password=passw0rd     --db-type postgresql     --update-schema     --prop resourceTypes=Patient     --schema-name abcdefg 
done
lmsurpre commented 2 years ago

I hit this one again during a recent CI run. Seems pretty rare so I still think its low priority, but I want to make sure its tracked somewhere and so I'm reopening this:

2022-02-11 21:05:00.645 00000001    INFO   com.ibm.fhir.schema.app.Main Opening connection to: jdbc:db2://localhost:50000/fhirdb
2022-02-11 21:05:01.516 00000001    INFO .database.utils.db2.Db2Adapter The schema 'FHIR_ADMIN' is created
2022-02-11 21:05:01.518 00000001    INFO .database.utils.db2.Db2Adapter The schema 'FHIRDATA' is created
2022-02-11 21:05:01.526 00000001    INFO .database.utils.db2.Db2Adapter The schema 'FHIR_OAUTH' is created
2022-02-11 21:05:01.529 00000001    INFO .database.utils.db2.Db2Adapter The schema 'FHIR_JBATCH' is created
2022-02-11 21:05:01.538 00000001    INFO   com.ibm.fhir.schema.app.Main Processing took:   0.921 s
2022-02-11 21:05:01.539 00000001    INFO   com.ibm.fhir.schema.app.Main SCHEMA CHANGE: OK
2022-02-11 21:05:01.842 00000001    INFO .common.JdbcConnectionProvider Opening connection to database: jdbc:db2://localhost:50000/fhirdb
2022-02-11 21:05:02.912 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 1]
2022-02-11 21:05:05.229 00000001    INFO   com.ibm.fhir.schema.app.Main Collecting model update tasks
2022-02-11 21:05:05.461 00000001    INFO   com.ibm.fhir.schema.app.Main Starting model updates
2022-02-11 21:05:05.470 0000000d    INFO ase.utils.model.DatabaseObject Applying change [v1]: TABLESPACE:FHIR_TS:1
2022-02-11 21:05:05.532 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLESPACE:FHIR_TS:1
2022-02-11 21:05:05.551 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: SEQUENCE:FHIR_ADMIN.TENANT_SEQUENCE:1
2022-02-11 21:05:05.566 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: VARIABLE:FHIR_ADMIN.SV_TENANT_ID:1
2022-02-11 21:05:05.573 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: SEQUENCE:FHIRDATA.FHIR_SEQUENCE:1
2022-02-11 21:05:05.579 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: SEQUENCE:FHIRDATA.FHIR_REF_SEQUENCE:1
2022-02-11 21:05:05.586 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: SEQUENCE:FHIRDATA.REFERENCES_SEQUENCE:1
2022-02-11 21:05:05.659 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIR_ADMIN.TENANTS:1
2022-02-11 21:05:05.786 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIRDATA.RESOURCE_CHANGE_LOG:19
2022-02-11 21:05:05.860 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIRDATA.PARAMETER_NAMES:1
2022-02-11 21:05:05.923 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIRDATA.COMMON_CANONICAL_VALUES:14
2022-02-11 21:05:05.994 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIRDATA.RESOURCE_TYPES:1
2022-02-11 21:05:06.048 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIRDATA.CODE_SYSTEMS:19
2022-02-11 21:05:06.054 0000000d    INFO s.common.CommonDatabaseAdapter ALTER SEQUENCE FHIRDATA.FHIR_REF_SEQUENCE RESTART WITH 20000 INCREMENT BY 1 CACHE 1000
2022-02-11 21:05:06.059 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: SEQUENCE:FHIRDATA.FHIR_REF_SEQUENCE:3
2022-02-11 21:05:06.126 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIR_ADMIN.TENANT_KEYS:1
2022-02-11 21:05:06.215 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIRDATA.LOGICAL_RESOURCES:20
2022-02-11 21:05:06.272 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIRDATA.COMMON_TOKEN_VALUES:6
2022-02-11 21:05:06.274 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: NOP:FHIR_ADMIN.allAdminTablesComplete:0
2022-02-11 21:05:06.378 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIRDATA.STR_VALUES:20
2022-02-11 21:05:06.448 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIRDATA.LOGICAL_RESOURCE_PROFILES:20
2022-02-11 21:05:06.528 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIRDATA.LOGICAL_RESOURCE_COMPARTMENTS:20
2022-02-11 21:05:06.608 0000000d    INFO hir.task.core.impl.TaskManager Task complete callback for taskId: TABLE:FHIRDATA.DATE_VALUES:20
2022-02-11 21:07:34.097 00000001    INFO .common.JdbcConnectionProvider Opening connection to database: jdbc:db2://localhost:50000/fhirdb
2022-02-11 21:07:36.067 00000001 WARNING ls.pool.PoolConnectionProvider Get connection took 1.968 seconds
2022-02-11 21:07:37.128 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 2]
2022-02-11 21:07:38.135 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 3]
2022-02-11 21:07:39.141 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 4]
2022-02-11 21:07:40.148 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 5]
2022-02-11 21:07:41.155 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 6]
2022-02-11 21:07:42.161 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 7]
2022-02-11 21:07:43.167 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 8]
2022-02-11 21:07:44.173 00000001    INFO m.fhir.schema.app.LeaseManager Requesting update lease for schema 'FHIRDATA' [attempt 9]
2022-02-11 21:07:44.179 00000001    INFO m.fhir.schema.app.LeaseManager Canceling lease for schema 'FHIRDATA'
2022-02-11 21:07:44.183 00000001 WARNING m.fhir.schema.app.LeaseManager Cancel lease ignored: Lease for schema 'FHIRDATA' is not owned by this instance
2022-02-11 21:07:44.184 00000001 WARNING   com.ibm.fhir.schema.app.Main Please try again later: update is already running - Concurrent update (allocate-tenant) for FHIR data schema: 'FHIRDATA'
2022-02-11 21:07:44.184 00000001 WARNING   com.ibm.fhir.schema.app.Main SCHEMA CHANGE: CONCURRENT UPDATE