ebean-orm / ebean

Ebean ORM
https://ebean.io
Apache License 2.0
1.46k stars 260 forks source link

Race condition EBean 14 #3412

Closed tbee closed 3 months ago

tbee commented 4 months ago

We've just reverted back to EBean 12 from Ebean 14.3.0. We observe the following behavior:

A request comes in and EBean claims a connection from the pool. A few milliseconds later a second connection is requested. Normally both are released afterwards. Peachy.

However, if the load is high, the whole pool is drained by requests claiming their first connection, the remaining requests are waiting on the pool's semaphore. Then the requests that got their first connection attempt to get a second, and then also wait on the pool's semaphore. All threads are now blocked, with the only escape being the timeout.

I've got a cleaned log for one thread below:

08:38:40,704 TRACE (default task-14) [org.jboss.jca.core.connectionmanager.pool.strategy.OnePool] getConnection
08:38:40,718 TRACE (default task-14) [org.jboss.jca.core.connectionmanager.pool.strategy.OnePool] Got connectionn 29fa10a2
08:38:40,725 TRACE (default task-14) [org.jboss.jca.core.connectionmanager.pool.strategy.OnePool] returnConnection(29fa10a2, false)
08:38:40,800 TRACE (default task-14) [org.jboss.jca.core.connectionmanager.pool.strategy.OnePool] getConnection
08:38:40,800 TRACE (default task-14) [org.jboss.jca.core.connectionmanager.pool.strategy.OnePool] Got connection 31be82a7
08:38:40,830 TRACE (default task-14) [org.jboss.jca.core.connectionmanager.pool.strategy.OnePool] getConnection
08:39:10,839 TRACE (default task-14) [org.jboss.jca.core.connectionmanager.pool.strategy.OnePool] returnConnection(31be82a7, false)
08:39:10,842 ERROR (default task-14) [io.undertow.request] UT005023: Exception handling request to /api/users/by_uuid/883b5486-43d8-4a23-8411-512b22082d75: org.jboss.resteasy.spi.UnhandledException: javax.persistence.PersistenceException: java.sql.SQLException: javax.resource.ResourceException: IJ000453: Unable to get managed connection for java:/HealthcareDS

You see a first correct get/return connection. The second get is also successful, but before returning that connection a third get (30 ms later) is attempted. The pool is exhausted at that time.

What has changed in EBean 14 that it requires two connections?

I'l continue my investigation.

tbee commented 4 months ago

It appears the second connection is for a ImplicitReadOnlyTransaction which has its own connection.

First connection is used:

logSql:405, CQueryEngine (io.ebeaninternal.server.query)
findMany:348, CQueryEngine (io.ebeaninternal.server.query)
findMany:126, DefaultOrmQueryEngine (io.ebeaninternal.server.query)
findList:403, OrmQueryRequest (io.ebeaninternal.server.core)
findList:1430, DefaultServer (io.ebeaninternal.server.core)
findOne:1060, DefaultServer (io.ebeaninternal.server.core)
findOne:1569, DefaultOrmQuery (io.ebeaninternal.server.querydefn)
findOne:429, DefaultExpressionList (io.ebeaninternal.server.expression)
byUUID:54, UserFinder (com.nedap.healthcare.administration.authorization.finder)
...

Second connection is used:

sql:40, DTxnLogger (io.ebeaninternal.server.logger)
logSql:443, ImplicitReadOnlyTransaction (io.ebeaninternal.server.transaction)
logSql:405, CQueryEngine (io.ebeaninternal.server.query)
findMany:348, CQueryEngine (io.ebeaninternal.server.query)
findMany:126, DefaultOrmQueryEngine (io.ebeaninternal.server.query)
findList:403, OrmQueryRequest (io.ebeaninternal.server.core)
findList:1430, DefaultServer (io.ebeaninternal.server.core)
findList:1409, DefaultServer (io.ebeaninternal.server.core)
executeQuery:149, DefaultBeanLoader (io.ebeaninternal.server.core)
loadMany:37, DefaultBeanLoader (io.ebeaninternal.server.core)
loadMany:462, DefaultServer (io.ebeaninternal.server.core)
loadSecondaryQuery:98, DLoadManyContext (io.ebeaninternal.server.loadcontext)
executeSecondaryQueries:192, DLoadContext (io.ebeaninternal.server.loadcontext)
executeSecondaryQueries:121, OrmQueryRequest (io.ebeaninternal.server.core)
findMany:357, CQueryEngine (io.ebeaninternal.server.query)
findMany:126, DefaultOrmQueryEngine (io.ebeaninternal.server.query)
findList:403, OrmQueryRequest (io.ebeaninternal.server.core)
findList:1430, DefaultServer (io.ebeaninternal.server.core)
findOne:1060, DefaultServer (io.ebeaninternal.server.core)
findOne:1569, DefaultOrmQuery (io.ebeaninternal.server.querydefn)
findOne:429, DefaultExpressionList (io.ebeaninternal.server.expression)
byUUID:54, UserFinder (com.nedap.healthcare.administration.authorization.finder)
...
tbee commented 4 months ago

It seems this setup maybe intended for use with a ReadonlyDatasource. But setting that up shows that the two nested getConnections both claim from the readonly datasource, thus does not solve the problem. (The first working get/return claims from the normal pool.)

tbee commented 4 months ago

Below are the two stacktraces of the two claims for a connection without returning in between.

getConnection:276, SemaphoreConcurrentLinkedDequeManagedConnectionPool (org.jboss.jca.core.connectionmanager.pool.mcp)
getSimpleConnection:640, AbstractPool (org.jboss.jca.core.connectionmanager.pool)
getConnection:605, AbstractPool (org.jboss.jca.core.connectionmanager.pool)
getManagedConnection:624, AbstractConnectionManager (org.jboss.jca.core.connectionmanager)
getManagedConnection:440, TxConnectionManagerImpl (org.jboss.jca.core.connectionmanager.tx)
allocateConnection:789, AbstractConnectionManager (org.jboss.jca.core.connectionmanager)
getConnection:151, WrapperDataSource (org.jboss.jca.adapters.jdbc)
getConnection:64, WildFlyDataSource (org.jboss.as.connector.subsystems.datasources)
createReadOnlyTransaction:32, TransactionFactoryBasicWithRead (io.ebeaninternal.server.transaction)
createReadOnlyTransaction:294, TransactionManager (io.ebeaninternal.server.transaction)
createReadOnlyTransaction:2153, DefaultServer (io.ebeaninternal.server.core)
initTransIfRequired:219, OrmQueryRequest (io.ebeaninternal.server.core)
findList:1429, DefaultServer (io.ebeaninternal.server.core)
findOne:1060, DefaultServer (io.ebeaninternal.server.core)
findOne:1569, DefaultOrmQuery (io.ebeaninternal.server.querydefn)
findOne:429, DefaultExpressionList (io.ebeaninternal.server.expression)
byUUID:54, UserFinder (com.nedap.healthcare.administration.authorization.finder)
byUuid:607, UserResource (com.nedap.healthcare.administration.rest)
userByUuid:155, UserResource (com.nedap.healthcare.administration.rest)
...

getConnection:276, SemaphoreConcurrentLinkedDequeManagedConnectionPool (org.jboss.jca.core.connectionmanager.pool.mcp)
getSimpleConnection:640, AbstractPool (org.jboss.jca.core.connectionmanager.pool)
getConnection:605, AbstractPool (org.jboss.jca.core.connectionmanager.pool)
getManagedConnection:624, AbstractConnectionManager (org.jboss.jca.core.connectionmanager)
getManagedConnection:440, TxConnectionManagerImpl (org.jboss.jca.core.connectionmanager.tx)
allocateConnection:789, AbstractConnectionManager (org.jboss.jca.core.connectionmanager)
getConnection:151, WrapperDataSource (org.jboss.jca.adapters.jdbc)
getConnection:64, WildFlyDataSource (org.jboss.as.connector.subsystems.datasources)
createReadOnlyTransaction:32, TransactionFactoryBasicWithRead (io.ebeaninternal.server.transaction)
createReadOnlyTransaction:294, TransactionManager (io.ebeaninternal.server.transaction)
createReadOnlyTransaction:2153, DefaultServer (io.ebeaninternal.server.core)
initTransIfRequired:219, OrmQueryRequest (io.ebeaninternal.server.core)
findList:1429, DefaultServer (io.ebeaninternal.server.core)
findList:1409, DefaultServer (io.ebeaninternal.server.core)
executeQuery:149, DefaultBeanLoader (io.ebeaninternal.server.core)
loadMany:37, DefaultBeanLoader (io.ebeaninternal.server.core)
loadMany:462, DefaultServer (io.ebeaninternal.server.core)
loadSecondaryQuery:98, DLoadManyContext (io.ebeaninternal.server.loadcontext)
executeSecondaryQueries:192, DLoadContext (io.ebeaninternal.server.loadcontext)
executeSecondaryQueries:121, OrmQueryRequest (io.ebeaninternal.server.core)
findMany:357, CQueryEngine (io.ebeaninternal.server.query)
findMany:126, DefaultOrmQueryEngine (io.ebeaninternal.server.query)
findList:403, OrmQueryRequest (io.ebeaninternal.server.core)
findList:1430, DefaultServer (io.ebeaninternal.server.core)
findOne:1060, DefaultServer (io.ebeaninternal.server.core)
findOne:1569, DefaultOrmQuery (io.ebeaninternal.server.querydefn)
findOne:429, DefaultExpressionList (io.ebeaninternal.server.expression)
byUUID:54, UserFinder (com.nedap.healthcare.administration.authorization.finder)
byUuid:607, UserResource (com.nedap.healthcare.administration.rest)
userByUuid:155, UserResource (com.nedap.healthcare.administration.rest)
...

The second stacktrace has two different requests objects in DefaultServer line 1429. They are constructed just above in line 1414. It would make sense that the second request reuses the readonly connection obtained by the first request.

Under EBean 12 this nesting indeed does not happen. And if an explicit JTA transaction is started, both request use that connection.

tbee commented 3 months ago

I was able to patch the behaviour by introducing a ThreadLocal variable in OrmQueryRequest.initTransIfRequired and solely for createReadOnlyTransaction store the first transaction within a thread, and return that on subsequent calls. Does not seem like the correct way to fix it, but it appears to work.

tbee commented 3 months ago

Alternatively one can add to e.g. LoadManyRequest.createQuery code to use the existing transaction:

    if (this.transaction() != null) {
      query.usingTransaction(transaction);
    }

Or in DefaultBeanLoader

  void loadMany(LoadManyRequest loadRequest) {
    SpiQuery<?> query = loadRequest.createQuery(server);
    query.usingTransaction(loadRequest.transaction());
    executeQuery(loadRequest, query);
    loadRequest.postLoad();
  }

This feels like a better approach. Where the last one seems like it could be the fix, because it mirrors loadBean in the same class.

tbee commented 3 months ago

Any idea when a bugfix release will come out?

rbygrave commented 3 months ago

Should be on the way to central, so should be available in about 20 minutes.