CatalogueOfLife / backend

Complete backend of COL ChecklistBank
Apache License 2.0
15 stars 11 forks source link

Sync deadlock #387

Closed mdoering closed 4 years ago

mdoering commented 5 years ago

Syncing sectors interfers with dataset imports quite often and results in postgres deadlocks that bring down the syncs. It seems to be related to attaching a partition at the end of imports which must acquire a AccessExclusiveLock on tables, see comments history below for details.


Syncing Orthotera SF on prod gives an error related to a postgres deadlock:

org.apache.ibatis.exceptions.PersistenceException: 
### Error querying database.  Cause: org.apache.ibatis.exceptions.PersistenceException: 
### Error querying database.  Cause: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 9902 waits for AccessShareLock on relation 23056880 of database 23056339; blocked by process 10500.
Process 10500 waits for AccessExclusiveLock on relation 23056872 of database 23056339; blocked by process 9902.
  Hint: See server log for query details.
### The error may exist in org/col/db/mapper/DistributionMapper.xml
### The error may involve defaultParameterMap
### The error occurred while setting parameters
### SQL: SELECT       d.key,     d.taxon_id,     d.verbatim_key,     d.area,     d.gazetteer,     d.status,     d.reference_id,     d.created,     d.created_by,     d.modified,     d.modified_by         FROM       distribution d         WHERE d.taxon_id = ? AND d.dataset_key = ?     ORDER BY d.key
### Cause: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 9902 waits for AccessShareLock on relation 23056880 of database 23056339; blocked by process 10500.
Process 10500 waits for AccessExclusiveLock on relation 23056872 of database 23056339; blocked by process 9902.
  Hint: See server log for query details.
### Cause: org.apache.ibatis.exceptions.PersistenceException: 
### Error querying database.  Cause: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 9902 waits for AccessShareLock on relation 23056880 of database 23056339; blocked by process 10500.
Process 10500 waits for AccessExclusiveLock on relation 23056872 of database 23056339; blocked by process 9902.
  Hint: See server log for query details.
### The error may exist in org/col/db/mapper/DistributionMapper.xml
### The error may involve defaultParameterMap
### The error occurred while setting parameters
### SQL: SELECT       d.key,     d.taxon_id,     d.verbatim_key,     d.area,     d.gazetteer,     d.status,     d.reference_id,     d.created,     d.created_by,     d.modified,     d.modified_by         FROM       distribution d         WHERE d.taxon_id = ? AND d.dataset_key = ?     ORDER BY d.key
### Cause: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 9902 waits for AccessShareLock on relation 23056880 of database 23056339; blocked by process 10500.
Process 10500 waits for AccessExclusiveLock on relation 23056872 of database 23056339; blocked by process 9902.
  Hint: See server log for query details.
    at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.select(DefaultSqlSession.java:172)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.select(DefaultSqlSession.java:158)
    at org.apache.ibatis.binding.MapperMethod.executeWithResultHandler(MapperMethod.java:128)
    at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:73)
    at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:59)
    at com.sun.proxy.$Proxy123.processTree(Unknown Source)
    at org.col.assembly.SectorSync.processTree(SectorSync.java:188)
    at org.col.assembly.SectorSync.sync(SectorSync.java:90)
    at org.col.assembly.SectorSync.doWork(SectorSync.java:43)
    at org.col.assembly.SectorRunnable.run(SectorRunnable.java:96)
    at org.col.assembly.SectorSync.run(SectorSync.java:29)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.ibatis.exceptions.PersistenceException: 
### Error querying database.  Cause: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 9902 waits for AccessShareLock on relation 23056880 of database 23056339; blocked by process 10500.
Process 10500 waits for AccessExclusiveLock on relation 23056872 of database 23056339; blocked by process 9902.
  Hint: See server log for query details.
### The error may exist in org/col/db/mapper/DistributionMapper.xml
### The error may involve defaultParameterMap
### The error occurred while setting parameters
### SQL: SELECT       d.key,     d.taxon_id,     d.verbatim_key,     d.area,     d.gazetteer,     d.status,     d.reference_id,     d.created,     d.created_by,     d.modified,     d.modified_by         FROM       distribution d         WHERE d.taxon_id = ? AND d.dataset_key = ?     ORDER BY d.key
### Cause: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 9902 waits for AccessShareLock on relation 23056880 of database 23056339; blocked by process 10500.
Process 10500 waits for AccessExclusiveLock on relation 23056872 of database 23056339; blocked by process 9902.
  Hint: See server log for query details.
    at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:150)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:141)
    at org.apache.ibatis.binding.MapperMethod.executeForMany(MapperMethod.java:139)
    at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:76)
    at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:59)
    at com.sun.proxy.$Proxy118.listByTaxon(Unknown Source)
    at org.col.dao.TaxonDao.copyTaxon(TaxonDao.java:77)
    at org.col.assembly.TreeCopyHandler.handleResult(TreeCopyHandler.java:219)
    at org.apache.ibatis.executor.resultset.DefaultResultSetHandler.callResultHandler(DefaultResultSetHandler.java:371)
    at org.apache.ibatis.executor.resultset.DefaultResultSetHandler.storeObject(DefaultResultSetHandler.java:364)
    at org.apache.ibatis.executor.resultset.DefaultResultSetHandler.handleRowValuesForNestedResultMap(DefaultResultSetHandler.java:877)
    at org.apache.ibatis.executor.resultset.DefaultResultSetHandler.handleRowValues(DefaultResultSetHandler.java:328)
    at org.apache.ibatis.executor.resultset.DefaultResultSetHandler.handleResultSet(DefaultResultSetHandler.java:306)
    at org.apache.ibatis.executor.resultset.DefaultResultSetHandler.handleResultSets(DefaultResultSetHandler.java:196)
    at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:64)
    at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:79)
    at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:63)
    at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:326)
    at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156)
    at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:136)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.select(DefaultSqlSession.java:170)
    ... 15 common frames omitted
Caused by: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 9902 waits for AccessShareLock on relation 23056880 of database 23056339; blocked by process 10500.
Process 10500 waits for AccessExclusiveLock on relation 23056872 of database 23056339; blocked by process 9902.
  Hint: See server log for query details.
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:308)
    at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441)
    at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365)
    at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:143)
    at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:132)
    at com.zaxxer.hikari.pool.ProxyPreparedStatement.execute(ProxyPreparedStatement.java:44)
    at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.execute(HikariProxyPreparedStatement.java)
    at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:63)
    at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:79)
    at org.apache.ibatis.executor.BatchExecutor.doQuery(BatchExecutor.java:93)
    at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:326)
    at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156)
    at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:136)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:148)
    ... 35 common frames omitted
mdoering commented 5 years ago

When syncing sector of Fishbase we also often get a idle_in_transaction_session_timeout exception which might be related:

org.postgresql.util.PSQLException: FATAL: terminating connection due to idle-in-transaction timeout
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2440)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2183)
    at org.postgresql.core.v3.QueryExecutorImpl.fetch(QueryExecutorImpl.java:2372)
    at org.postgresql.jdbc.PgResultSet.next(PgResultSet.java:1856)
    at com.zaxxer.hikari.pool.HikariProxyResultSet.next(HikariProxyResultSet.java)
    at org.apache.ibatis.executor.resultset.DefaultResultSetHandler.handleRowValuesForNestedResultMap(DefaultResultSetHandler.java:869)
    at org.apache.ibatis.executor.resultset.DefaultResultSetHandler.handleRowValues(DefaultResultSetHandler.java:328)
    at org.apache.ibatis.executor.resultset.DefaultResultSetHandler.handleResultSet(DefaultResultSetHandler.java:306)
    at org.apache.ibatis.executor.resultset.DefaultResultSetHandler.handleResultSets(DefaultResultSetHandler.java:196)
    at org.apache.ibatis.executor.statement.PreparedStatementHandler.query(PreparedStatementHandler.java:64)
    at org.apache.ibatis.executor.statement.RoutingStatementHandler.query(RoutingStatementHandler.java:79)
    at org.apache.ibatis.executor.SimpleExecutor.doQuery(SimpleExecutor.java:63)
    at org.apache.ibatis.executor.BaseExecutor.queryFromDatabase(BaseExecutor.java:326)
    at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:156)
    at org.apache.ibatis.executor.BaseExecutor.query(BaseExecutor.java:136)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.select(DefaultSqlSession.java:170)
    at org.apache.ibatis.session.defaults.DefaultSqlSession.select(DefaultSqlSession.java:158)
    at org.apache.ibatis.binding.MapperMethod.executeWithResultHandler(MapperMethod.java:128)
    at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:73)
    at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:59)
    at com.sun.proxy.$Proxy141.processTree(Unknown Source)
    at org.col.assembly.SectorSync.processTree(SectorSync.java:188)
    at org.col.assembly.SectorSync.sync(SectorSync.java:90)
mdoering commented 5 years ago

From postgres logs today:

12:39:26 UTC coldb-vh /var/lib/pgsql/11/data/log # grep -A 75 -n "deadlock detected" postgresql-Thu.log 
2019-06-20 14:27:09.878 CEST [31229] ERROR:  deadlock detected
2019-06-20 14:27:09.878 CEST [31229] DETAIL:  Process 31229 waits for AccessShareLock on relation 30939489 of database 30938994; blocked by process 30510.
    Process 30510 waits for AccessExclusiveLock on relation 30939520 of database 30938994; blocked by process 31229.
    Process 31229: INSERT INTO reference_3 (  
            id,
            dataset_key,
            sector_key,
            verbatim_key,
            csl,
            citation,
            year,
            created_by,
            modified_by
         , created, modified)
            VALUES (  
            $1,
            $2,
            $3,
            $4,
            $5::jsonb,
            $6,
            $7,
            $8,
            $9
         , now(), now())
    RETURNING *
    Process 30510: 
            ALTER TABLE usage_reference ATTACH PARTITION usage_reference_1140 FOR VALUES IN ( 1140 )
2019-06-20 14:27:09.878 CEST [31229] HINT:  See server log for query details.
2019-06-20 14:27:09.878 CEST [31229] STATEMENT:  INSERT INTO reference_3 (  
            id,
            dataset_key,
            sector_key,
            verbatim_key,
            csl,
            citation,
            year,
            created_by,
            modified_by
         , created, modified)
            VALUES (  
            $1,
            $2,
            $3,
            $4,
            $5::jsonb,
            $6,
            $7,
            $8,
            $9
         , now(), now())
    RETURNING *
2019-06-20 14:32:10.252 CEST [31229] ERROR:  deadlock detected
2019-06-20 14:32:10.252 CEST [31229] DETAIL:  Process 31229 waits for AccessShareLock on relation 30939489 of database 30938994; blocked by process 31267.
    Process 31267 waits for AccessExclusiveLock on relation 30939520 of database 30938994; blocked by process 31229.
    Process 31229: INSERT INTO reference_3 (  
            id,
            dataset_key,
            sector_key,
            verbatim_key,
            csl,
            citation,
            year,
            created_by,
            modified_by
         , created, modified)
            VALUES (  
            $1,
            $2,
            $3,
            $4,
            $5::jsonb,
            $6,
            $7,
            $8,
            $9
         , now(), now())
    RETURNING *
    Process 31267: 
            ALTER TABLE usage_reference ATTACH PARTITION usage_reference_1132 FOR VALUES IN ( 1132 )
2019-06-20 14:32:10.252 CEST [31229] HINT:  See server log for query details.
2019-06-20 14:32:10.252 CEST [31229] STATEMENT:  INSERT INTO reference_3 (  
            id,
            dataset_key,
            sector_key,
            verbatim_key,
            csl,
            citation,
            year,
            created_by,
            modified_by
         , created, modified)
            VALUES (  
            $1,
            $2,
            $3,
            $4,
            $5::jsonb,
            $6,
            $7,
            $8,
            $9
         , now(), now())
    RETURNING *
mdoering commented 5 years ago

still more of that. This time we SELECT from the partitioned table when we could have access the dataset specific table directly:

### Error querying database.  Cause: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 23906 waits for AccessShareLock on relation 30939525 of database 30938994; blocked by process 24135.
Process 24135 waits for AccessExclusiveLock on relation 30939520 of database 30938994; blocked by process 23906.
  Hint: See server log for query details.
### The error may exist in org/col/db/mapper/VernacularNameMapper.xml
### The error may involve defaultParameterMap
### The error occurred while setting parameters
### SQL: SELECT       vn.key,     vn.verbatim_key,     vn.name,     vn.latin,     vn.language,     vn.country,     vn.area,     vn.reference_id,     vn.created,     vn.created_by,     vn.modified,     vn.modified_by         FROM       vernacular_name vn         WHERE vn.dataset_key = ?       AND vn.taxon_id = ?     ORDER BY name
### Cause: org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 23906 waits for AccessShareLock on relation 30939525 of database 30938994; blocked by process 24135.
Process 24135 waits for AccessExclusiveLock on relation 30939520 of database 30938994; blocked by process 23906.
  Hint: See server log for query details.
mdoering commented 4 years ago

I have not seen this happening for month now. Closing this, please reopen if encountered again!