atomikos / transactions-essentials

Development repository for next major release of
https://www.atomikos.com/Main/TransactionsEssentials
Other
461 stars 139 forks source link

Atomikos transaction management issue - Errors during recovery #174

Open mandarlaad opened 1 year ago

mandarlaad commented 1 year ago

Bug description Below is one sample from the several instances of sequences of error/warnings (shown below in an ascending timestamp order) getting printed in the Azure app service (hosting my spring boot app) logs, reported by Azure app insights.

1> Atomikos:1 WARN c.a.r.xa.XaResourceRecoveryManager : Error while retrieving xids from resource - will retry later... org.postgresql.xa.PGXAException: Error during recover at org.postgresql.xa.PGXAConnection.recover(PGXAConnection.java:382) at com.atomikos.datasource.xa.RecoveryScan.recoverXids(RecoveryScan.java:32) at com.atomikos.recovery.xa.XaResourceRecoveryManager.retrievePreparedXidsFromXaResource(XaResourceRecoveryManager.java:158) at com.atomikos.recovery.xa.XaResourceRecoveryManager.recover(XaResourceRecoveryManager.java:67) at com.atomikos.datasource.xa.XATransactionalResource.recover(XATransactionalResource.java:449) at com.atomikos.icatch.imp.TransactionServiceImp.performRecovery(TransactionServiceImp.java:490) at com.atomikos.icatch.imp.TransactionServiceImp.access$000(TransactionServiceImp.java:56) at com.atomikos.icatch.imp.TransactionServiceImp$1.alarm(TransactionServiceImp.java:471) at com.atomikos.timing.PooledAlarmTimer.notifyListeners(PooledAlarmTimer.java:95) at com.atomikos.timing.PooledAlarmTimer.run(PooledAlarmTimer.java:82) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) Caused by: org.postgresql.util.PSQLException: This connection has been closed. at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:803) at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:1615) at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:410) at org.postgresql.xa.PGXAConnection.recover(PGXAConnection.java:359) ... 12 common frames omitted

2> Atomikos:1 INFO c.a.d.xa.XATransactionalResource : admin-service-db-conn-pool: refreshed XAResource

3> Atomikos:3426 WARN c.a.datasource.xa.XAResourceTransaction : XA resource 'admin-service-db-conn-pool': rollback for XID '3136392E3235342E3132392E342E746D313636373734303230333830323033343234:3136392E3235342E3132392E342E746D33343234' raised -3: the XA resource detected an internal error org.postgresql.xa.PGXAException: Error rolling back prepared transaction at org.postgresql.xa.PGXAConnection.rollback(PGXAConnection.java:424) at com.atomikos.datasource.xa.XAResourceTransaction.rollback(XAResourceTransaction.java:460) at com.atomikos.icatch.imp.RollbackMessage.send(RollbackMessage.java:47) at com.atomikos.icatch.imp.RollbackMessage.send(RollbackMessage.java:20) at com.atomikos.icatch.imp.PropagationMessage.submit(PropagationMessage.java:67) at com.atomikos.icatch.imp.Propagator$PropagatorThread.run(Propagator.java:63) at com.atomikos.icatch.imp.Propagator.submitPropagationMessage(Propagator.java:42) at com.atomikos.icatch.imp.HeurHazardStateHandler.onTimeout(HeurHazardStateHandler.java:75) at com.atomikos.icatch.imp.CoordinatorImp.alarm(CoordinatorImp.java:650) at com.atomikos.timing.PooledAlarmTimer.notifyListeners(PooledAlarmTimer.java:95) at com.atomikos.timing.PooledAlarmTimer.run(PooledAlarmTimer.java:82) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) Caused by: org.postgresql.util.PSQLException: ERROR: cannot execute ROLLBACK PREPARED during recovery at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2455) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2155) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:288) at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:430) at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:356) at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:303) at org.postgresql.jdbc.PgStatement.executeCachedSql(PgStatement.java:289) at org.postgresql.jdbc.PgStatement.executeWithFlags(PgStatement.java:266) at org.postgresql.jdbc.PgStatement.executeUpdate(PgStatement.java:246) at org.postgresql.xa.PGXAConnection.rollback(PGXAConnection.java:414) ... 13 common frames omitted

4> Atomikos:3426 ERROR c.a.datasource.xa.XAResourceTransaction : Heuristic termination of XAResourceTransaction: 3136392E3235342E3132392E342E746D313636373734303230333830323033343234:3136392E3235342E3132392E342E746D33343234 with state HEUR_HAZARD

5> Atomikos:3426 ERROR com.atomikos.icatch.imp.CoordinatorImp : Local heuristic termination of coordinator 169.254.129.4.tm166774020380203424 with state HEUR_HAZARD

6> Atomikos:3426 ERROR com.atomikos.publish.EventPublisher : Transaction 169.254.129.4.tm166774020380203424 corrupted - check https://www.atomikos.com/Documentation/HowToHandleHeuristics to learn more

Then the rollback and heuristic termination steps (step3 & 4) are repeated once or twice.

Steps to reproduce the behavior: There is no specific action which triggers this behavior. The app in concern makes connections to a Postgres DB (Azure service for Postgres relational DB) and the app uses atomikos (via spring boot JTA library) for transaction management. These errors/warnings are being logged continuously in the log of my spring boot app which runs on an azure app service.

Additional context Following are the versions of libraries (relevant to this discussion) being used -

Spring Boot - 2.1.2.RELEASE com.atomikos:atomikos-util:4.0.6 com.atomikos:transactions:4.0.6 com.atomikos:transactions-api:4.0.6 com.atomikos:transactions-hibernate4:4.0.6 com.atomikos:transactions-jdbc:4.0.6 com.atomikos:transactions-jms:4.0.6 com.atomikos:transactions-jta:4.0.6

Also sharing below the connection properties, just in case if those help understand the issue better -

Connection Pool minPoolSize=1 maxPoolSize=3 borrowConnectionTimeout=PT10S idleConnectionTimeout=PT30S maxConnectionLifetime=PT0S poolMaintenanceInterval=PT10S testQuery=SELECT 1 collectHqlMetrics=true

JPA enableJta=true showSql=false dialect=org.hibernate.dialect.PostgreSQL9Dialect