OpenLiberty / open-liberty

Open Liberty is a highly composable, fast to start, dynamic application server runtime environment
https://openliberty.io
Eclipse Public License 2.0
1.15k stars 591 forks source link

CacheMap throws ArrayOutOfBoundsException for datasource statement cache. #18824

Open dannemano opened 3 years ago

dannemano commented 3 years ago

Describe the bug
In some cases we get an java.lang.ArrayIndexOutOfBoundsException for the datasource statement cache. The datasource is configured as:

<dataSource id="db2"
                jndiName="jdbc/db2"
                isolationLevel="TRANSACTION_READ_COMMITTED"
                statementCacheSize="130"
                queryTimeout="70s"
    >

We cannot reproduce the issue but it sometimes occurs. There is not a clear reason for us when it occurs either.

Caused by: java.lang.ArrayIndexOutOfBoundsException: null at com.ibm.ws.rsadapter.impl.CacheMap.removeLRU(CacheMap.java:350) at com.ibm.ws.rsadapter.impl.CacheMap.add(CacheMap.java:162) at com.ibm.ws.rsadapter.impl.WSRdbManagedConnectionImpl.cacheStatement(WSRdbManagedConnectionImpl.java:2228) at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.closeWrapper(WSJdbcPreparedStatement.java:332) at com.ibm.ws.rsadapter.jdbc.WSJdbcObject.close(WSJdbcObject.java:152) at com.ibm.ws.rsadapter.jdbc.WSJdbcObject.close(WSJdbcObject.java:110) at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.closeStatement(DatabaseAccessor.java:417) at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.releaseStatement(DatabaseAccessor.java:1710) at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.basicExecuteCall(DatabaseAccessor.java:712) at org.eclipse.persistence.internal.databaseaccess.DatabaseAccessor.executeCall(DatabaseAccessor.java:567) at org.eclipse.persistence.internal.sessions.AbstractSession.basicExecuteCall(AbstractSession.java:2105) at org.eclipse.persistence.sessions.server.ServerSession.executeCall(ServerSession.java:603) at org.eclipse.persistence.internal.queries.DatasourceCallQueryMechanism.executeCall(DatasourceCallQueryMechanism.java:277) at org.eclipse.persistence.internal.queries.DatasourceCallQueryMechanism.executeCall(DatasourceCallQueryMechanism.java:263) at org.eclipse.persistence.internal.queries.DatasourceCallQueryMechanism.executeSelectCall(DatasourceCallQueryMechanism.java:334) at org.eclipse.persistence.internal.queries.DatasourceCallQueryMechanism.selectAllRows(DatasourceCallQueryMechanism.java:746) at org.eclipse.persistence.internal.queries.ExpressionQueryMechanism.selectAllRowsFromTable(ExpressionQueryMechanism.java:2776) at org.eclipse.persistence.internal.queries.ExpressionQueryMechanism.selectAllRows(ExpressionQueryMechanism.java:2729) at org.eclipse.persistence.queries.ReadAllQuery.executeObjectLevelReadQuery(ReadAllQuery.java:584) at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeDatabaseQuery(ObjectLevelReadQuery.java:1232) at org.eclipse.persistence.queries.DatabaseQuery.execute(DatabaseQuery.java:911) at org.eclipse.persistence.queries.ObjectLevelReadQuery.execute(ObjectLevelReadQuery.java:1191) at org.eclipse.persistence.queries.ReadAllQuery.execute(ReadAllQuery.java:485) at org.eclipse.persistence.queries.ObjectLevelReadQuery.executeInUnitOfWork(ObjectLevelReadQuery.java:1279) at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.internalExecuteQuery(UnitOfWorkImpl.java:3004) at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1898) at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1880) at org.eclipse.persistence.internal.sessions.AbstractSession.executeQuery(AbstractSession.java:1845) at org.eclipse.persistence.internal.jpa.QueryImpl.executeReadQuery(QueryImpl.java:262) at org.eclipse.persistence.internal.jpa.QueryImpl.getResultList(QueryImpl.java:482)

Steps to Reproduce
Cannot reproduce.

Expected behavior
The datasource statement cache should not throw array out of bounds when it removes last recently used statement.

Diagnostic information:

openjdk version "11.0.12" 2021-07-20 IBM Semeru Runtime Open Edition 11.0.12.0 (build 11.0.12+7) Eclipse OpenJ9 VM 11.0.12.0 (build openj9-0.27.0, JRE 11 Linux amd64-64-Bit Compressed References 20210729_228 (JIT enabled, AOT enabled) OpenJ9 - 1851b0074 OMR - 9db1c870d JCL - 21849e2ca0 based on jdk-11.0.12+7)

Additional context
I have tried to reproduce the issue by generating a lot of different prepared statements to overload the CacheMap but not succeeded.

njr-11 commented 3 years ago

Whenever I have seen this error in the past, it's from applications that are trying to use a JDBC Connection from 2 threads at once, which is neither supported by the JDBC programming model nor by Liberty, where the JDBC wrapper classes, including those that perform statement caching per connection are accordingly not written to be thread-safe for concurrent access from multiple threads.

A concern here is that it is EclipseLink JPA code, not application code, that is accessing the JDBC artifacts here, and it seems unlikely that JPA code would use JDBC interfaces contrary to the programming model.

dannemano commented 3 years ago

Hi, many thanks for the quick reply. I saw in the source of CacheMap (removeAll()) the similar hints:

Tr.debug(this, tc, "ArrayIndexOutOfBoundsException is caught during removeAll() of the cachMap ", this);
                Tr.debug(this, tc, "Possible causes:");
                Tr.debug(this, tc, "multithreaded access of JDBC objects by the Application");
                Tr.debug(this, tc, "Application is closing JDBC objects in a finalize()");
                Tr.debug(this, tc, "Exception is: ", ioobX);

I guess the same it applies to removeLRU() that fails for us in this case.

I am a little confused about the multithreaded connection risk as you mention. We are using JPA (injecting entityManager) and sometimes dataSource.getConnection(). Do you mean that there is a risk of this happening if the connection object is leaked to more than one thread? The application is pretty tight on how queries are executed but a guess I have to search through some legacy code to see if this can happen.

njr-11 commented 3 years ago

I am a little confused about the multithreaded connection risk as you mention. We are using JPA (injecting entityManager) and sometimes dataSource.getConnection(). Do you mean that there is a risk of this happening if the connection object is leaked to more than one thread? The application is pretty tight on how queries are executed but a guess I have to search through some legacy code to see if this can happen.

The usual way to get this error to occur is that an application would be doing dataSource.getConnection() and then supplying a reference to that same connection, or a statement created from it, to another thread, after which both threads end up using the JDBC resources at the same time. With JPA, you might be able to indirectly get into the same situation if you share the reference to the JPA entity with another thread, so I'd check for that as well.

But it's also possible that it could be something outside of your control causing multithreaded access to the JDBC resources such as a transaction timeout causing an active connection to be aborted from another thread. Are there any warning/error indicating a transaction timeout in your server logs around the point when this happened?

dannemano commented 3 years ago

I would say we have transaction timeouts. Not obvious at the time of this error, but yes there are occurrences of timeouts.

dannemano commented 3 years ago

It is likely we started noticing this error when we removed agedTimeout on the connectionManager. Not sure if it is related but longer living connections increase the risk of the failure situation.