FirebirdSQL / jaybird

JDBC driver for Firebird
https://www.firebirdsql.org/en/jdbc-driver/
GNU Lesser General Public License v2.1
91 stars 23 forks source link

PooledConnectionQueue not theadsafe? [JDBC341] #386

Closed firebird-automations closed 10 years ago

firebird-automations commented 10 years ago

Submitted by: Steffen Heil (firebirdsql.org_partner.sh-solutions.de)

(First off, I know the environment is quite outdated, but I am quite sure, this has nothing to do with my problem.)

I have a massively multithreaded application that uses jaybird (currently 2.2.0) with a connection pool and a maxPoolSize of 15. For years it worked perfectly (I am not sure, at what point we updated to 2.2.0 though).

Yesterday for the very first time, we noticed a situation where a thread tried to obtain a connection but failed to do so. Interestingly all connections opened by our application were closed. I am sure about that, because very connection we acquire gets wrapped and the wrappers are tracked. (A background task closes connections, when the wrappers get GCed and logs a stacktrace of the thread that acquired the connection.)

Out own code always acquires database connections using one method in our framework that is synchronized, so getConnection is never called by more than one thread. So the only thread in FB classes at the time of the problem was:

"trustedChannel-worker-1" prio=10 tid=0x00007f0a2d877800 nid=0x19ca in Object.wait() [0x0000000041aec000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.firebirdsql.pool.BlockingStack.pop(BlockingStack.java:147) - locked <0x0000000571cdc700> (a java.lang.Object) at org.firebirdsql.pool.PooledConnectionQueue.take(PooledConnectionQueue.java:413) - locked <0x0000000571cdb980> (a java.lang.Object) at org.firebirdsql.pool.AbstractConnectionPool.getPooledConnection(AbstractConnectionPool.java:231) - locked <0x0000000571d1c450> (a org.firebirdsql.pool.FBConnectionPoolDataSource) at org.firebirdsql.pool.AbstractFBConnectionPoolDataSource.getPooledConnection(AbstractFBConnectionPoolDataSource.java:352) - locked <0x0000000571d1c450> (a org.firebirdsql.pool.FBConnectionPoolDataSource) at org.firebirdsql.pool.AbstractFBConnectionPoolDataSource.getPooledConnection(AbstractFBConnectionPoolDataSource.java:372) - locked <0x0000000571d1c450> (a org.firebirdsql.pool.FBConnectionPoolDataSource) at org.firebirdsql.pool.FBWrappingDataSource.getConnection(FBWrappingDataSource.java:219) at com.osiris4.core.persistence.servers.FirebirdDatabase.getSafeConnection(FirebirdDatabase.java:377) - locked <0x0000000571ef4ec0> (a com.osiris4.core.persistence.servers.FirebirdDatabase)

Using eclipse MAT I inspected a memory dump (that I still have, if questions arise) and found the (only) instance of PooledConnectionQueue that used the lock 0x0000000571cdb980. Inspecting that PooledConnectionQueue instance showed the following:

Type |Name |Value ------------------------------------------------------------------------------------------------------------ ref |connectionIdleTime |java.util.HashMap @ 0x571ed19f8 ref |workingConnectionsToClose|java.util.HashSet @ 0x571cdb9b0 ref |workingConnections |java.util.HashSet @ 0x571cdb998 int |totalConnections |15 ref |idleRemover |org.firebirdsql.pool.PooledConnectionQueue$IdleRemover @ 0x571e33ca8 ref |takeMutex |java.lang.Object @ 0x571cdb980 boolean|blocked |true ref |stack |org.firebirdsql.pool.BlockingStack @ 0x571cdb968 int |size |-37 int |blockingTimeout |600000 ref |queueName |//127.0.0.1//opt/database/osiris4.fdb ref |key |org.firebirdsql.pool.AbstractConnectionPool$UserPasswordPair @ 0x571e33ea0 ref |configuration |org.firebirdsql.pool.FBConnectionPoolDataSource @ 0x571d1c450 ref |logger |null ref |connectionManager |org.firebirdsql.pool.FBConnectionPoolDataSource @ 0x571d1c450 ------------------------------------------------------------------------------------------------------------

I suspect that "size" should NOT have a value of *-37*. Looking into the code seems to suggest that access to "size" is only synchronized in one of four locations.

Additionally there *are* 15 instances of "FBPooledConnection", all of which have "inPool" set to *true*....

I am willing to inspect this further but so far I have no idea about all the internal classes used in jaybird, so I would strongly appreciate if someone else could have a look as well.

Commits: FirebirdSQL/jaybird@c3fb11578803184dacfdda6f7020e1a9d2b37394 FirebirdSQL/fbt-repository@8fd7349d095ce93e166955d5732ebec48de2f90c

firebird-automations commented 10 years ago
Modified by: Steffen Heil (firebirdsql.org_partner.sh-solutions.de) description: \(First off, I know the environment is quite outdated, but I am quite sure, this has nothing to do with my problem\.\) I have a massively multithreaded application that uses jaybird \(currently 2\.2\.0\) with a connection pool and a maxPoolSize of 15\. For years I worked perfectly \(I am not sure, at what point we updated to 2\.2\.0 though\)\. Yesterday for the very first time, we noticed a situation where a thread tried to obtain a connection but failed to do so\. Interestingly all connections opened by our application were closed\. I am sure about that, because very connection we acquire gets wrapped and the wrappers are tracked\. \(A background task closes connections, when the wrappers get GCed and logs a stacktrace of the thread that acquired the connection\.\) Out own code always acquires database connections using one method in our framework that is synchronized, so getConnection is never called by more than one thread\. So the only thread in FB classes at the time of the problem was: "trustedChannel\-worker\-1" prio=10 tid=0x00007f0a2d877800 nid=0x19ca in Object\.wait\(\) \[0x0000000041aec000\] java\.lang\.Thread\.State: TIMED\_WAITING \(on object monitor\) at java\.lang\.Object\.wait\(Native Method\) at org\.firebirdsql\.pool\.BlockingStack\.pop\(BlockingStack\.java:147\) \- locked <0x0000000571cdc700\> \(a java\.lang\.Object\) at org\.firebirdsql\.pool\.PooledConnectionQueue\.take\(PooledConnectionQueue\.java:413\) \- locked <0x0000000571cdb980\> \(a java\.lang\.Object\) at org\.firebirdsql\.pool\.AbstractConnectionPool\.getPooledConnection\(AbstractConnectionPool\.java:231\) \- locked <0x0000000571d1c450\> \(a org\.firebirdsql\.pool\.FBConnectionPoolDataSource\) at org\.firebirdsql\.pool\.AbstractFBConnectionPoolDataSource\.getPooledConnection\(AbstractFBConnectionPoolDataSource\.java:352\) \- locked <0x0000000571d1c450\> \(a org\.firebirdsql\.pool\.FBConnectionPoolDataSource\) at org\.firebirdsql\.pool\.AbstractFBConnectionPoolDataSource\.getPooledConnection\(AbstractFBConnectionPoolDataSource\.java:372\) \- locked <0x0000000571d1c450\> \(a org\.firebirdsql\.pool\.FBConnectionPoolDataSource\) at org\.firebirdsql\.pool\.FBWrappingDataSource\.getConnection\(FBWrappingDataSource\.java:219\) at com\.osiris4\.core\.persistence\.servers\.FirebirdDatabase\.getSafeConnection\(FirebirdDatabase\.java:377\) \- locked <0x0000000571ef4ec0\> \(a com\.osiris4\.core\.persistence\.servers\.FirebirdDatabase\) Using eclipse MAT I inspected a memory dump \(that I still have, if questions arise\) and found the \(only\) instance of PooledConnectionQueue that used the lock 0x0000000571cdb980\. Inspecting that PooledConnectionQueue instance showed the following: Type \|Name \|Value \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- ref \|connectionIdleTime \|java\.util\.HashMap @ 0x571ed19f8 ref \|workingConnectionsToClose\|java\.util\.HashSet @ 0x571cdb9b0 ref \|workingConnections \|java\.util\.HashSet @ 0x571cdb998 int \|totalConnections \|15 ref \|idleRemover \|org\.firebirdsql\.pool\.PooledConnectionQueue$IdleRemover @ 0x571e33ca8 ref \|takeMutex \|java\.lang\.Object @ 0x571cdb980 boolean\|blocked \|true ref \|stack \|org\.firebirdsql\.pool\.BlockingStack @ 0x571cdb968 int \|size \|\-37 int \|blockingTimeout \|600000 ref \|queueName \|//127\.0\.0\.1//opt/database/osiris4\.fdb ref \|key \|org\.firebirdsql\.pool\.AbstractConnectionPool$UserPasswordPair @ 0x571e33ea0 ref \|configuration \|org\.firebirdsql\.pool\.FBConnectionPoolDataSource @ 0x571d1c450 ref \|logger \|null ref \|connectionManager \|org\.firebirdsql\.pool\.FBConnectionPoolDataSource @ 0x571d1c450 \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- I suspect that "size" should NOT have a value of \*\-37\*\. Looking into the code seems to suggest that access to "size" is only synchronized in one of four locations\. Additionally there \*are\* 15 instances of "FBPooledConnection", all of which have "inPool" set to \*true\*\.\.\.\. I am willing to inspect this further but so far I have no idea about all the internal classes used in jaybird, so I would strongly appreciate if someone else could have a look as well\. =\> \(First off, I know the environment is quite outdated, but I am quite sure, this has nothing to do with my problem\.\) I have a massively multithreaded application that uses jaybird \(currently 2\.2\.0\) with a connection pool and a maxPoolSize of 15\. For years it worked perfectly \(I am not sure, at what point we updated to 2\.2\.0 though\)\. Yesterday for the very first time, we noticed a situation where a thread tried to obtain a connection but failed to do so\. Interestingly all connections opened by our application were closed\. I am sure about that, because very connection we acquire gets wrapped and the wrappers are tracked\. \(A background task closes connections, when the wrappers get GCed and logs a stacktrace of the thread that acquired the connection\.\) Out own code always acquires database connections using one method in our framework that is synchronized, so getConnection is never called by more than one thread\. So the only thread in FB classes at the time of the problem was: "trustedChannel\-worker\-1" prio=10 tid=0x00007f0a2d877800 nid=0x19ca in Object\.wait\(\) \[0x0000000041aec000\] java\.lang\.Thread\.State: TIMED\_WAITING \(on object monitor\) at java\.lang\.Object\.wait\(Native Method\) at org\.firebirdsql\.pool\.BlockingStack\.pop\(BlockingStack\.java:147\) \- locked <0x0000000571cdc700\> \(a java\.lang\.Object\) at org\.firebirdsql\.pool\.PooledConnectionQueue\.take\(PooledConnectionQueue\.java:413\) \- locked <0x0000000571cdb980\> \(a java\.lang\.Object\) at org\.firebirdsql\.pool\.AbstractConnectionPool\.getPooledConnection\(AbstractConnectionPool\.java:231\) \- locked <0x0000000571d1c450\> \(a org\.firebirdsql\.pool\.FBConnectionPoolDataSource\) at org\.firebirdsql\.pool\.AbstractFBConnectionPoolDataSource\.getPooledConnection\(AbstractFBConnectionPoolDataSource\.java:352\) \- locked <0x0000000571d1c450\> \(a org\.firebirdsql\.pool\.FBConnectionPoolDataSource\) at org\.firebirdsql\.pool\.AbstractFBConnectionPoolDataSource\.getPooledConnection\(AbstractFBConnectionPoolDataSource\.java:372\) \- locked <0x0000000571d1c450\> \(a org\.firebirdsql\.pool\.FBConnectionPoolDataSource\) at org\.firebirdsql\.pool\.FBWrappingDataSource\.getConnection\(FBWrappingDataSource\.java:219\) at com\.osiris4\.core\.persistence\.servers\.FirebirdDatabase\.getSafeConnection\(FirebirdDatabase\.java:377\) \- locked <0x0000000571ef4ec0\> \(a com\.osiris4\.core\.persistence\.servers\.FirebirdDatabase\) Using eclipse MAT I inspected a memory dump \(that I still have, if questions arise\) and found the \(only\) instance of PooledConnectionQueue that used the lock 0x0000000571cdb980\. Inspecting that PooledConnectionQueue instance showed the following: Type \|Name \|Value \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- ref \|connectionIdleTime \|java\.util\.HashMap @ 0x571ed19f8 ref \|workingConnectionsToClose\|java\.util\.HashSet @ 0x571cdb9b0 ref \|workingConnections \|java\.util\.HashSet @ 0x571cdb998 int \|totalConnections \|15 ref \|idleRemover \|org\.firebirdsql\.pool\.PooledConnectionQueue$IdleRemover @ 0x571e33ca8 ref \|takeMutex \|java\.lang\.Object @ 0x571cdb980 boolean\|blocked \|true ref \|stack \|org\.firebirdsql\.pool\.BlockingStack @ 0x571cdb968 int \|size \|\-37 int \|blockingTimeout \|600000 ref \|queueName \|//127\.0\.0\.1//opt/database/osiris4\.fdb ref \|key \|org\.firebirdsql\.pool\.AbstractConnectionPool$UserPasswordPair @ 0x571e33ea0 ref \|configuration \|org\.firebirdsql\.pool\.FBConnectionPoolDataSource @ 0x571d1c450 ref \|logger \|null ref \|connectionManager \|org\.firebirdsql\.pool\.FBConnectionPoolDataSource @ 0x571d1c450 \-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\-\- I suspect that "size" should NOT have a value of \*\-37\*\. Looking into the code seems to suggest that access to "size" is only synchronized in one of four locations\. Additionally there \*are\* 15 instances of "FBPooledConnection", all of which have "inPool" set to \*true\*\.\.\.\. I am willing to inspect this further but so far I have no idea about all the internal classes used in jaybird, so I would strongly appreciate if someone else could have a look as well\.
firebird-automations commented 10 years ago

Commented by: @mrotteveel

As also discussed on Firebird-Java (https://groups.yahoo.com/neo/groups/Firebird-Java/conversations/topics/10992 (archive)), all classes in org.firebirdsql.pool should be considered deprecated and will be removed in Jaybird 3.0. I suggest to move to either use the connection pool of your application server (if applicable), or to switch to c3p0, BoneCP or another third party connection pool.

That said, I would have expected this to work correctly with FBWrappingDataSource, as such I will see if I can find the cause and I plan to fix it in Jaybird 2.2.5 or maybe 2.2.6.

firebird-automations commented 10 years ago
Modified by: @mrotteveel Fix Version: Jaybird 2\.2\.5 \[ 10582 \]
firebird-automations commented 10 years ago

Commented by: @mrotteveel

Testing also revealed a deadlock issue.

firebird-automations commented 10 years ago

Commented by: @mrotteveel

Testing with a fixed version of PooledConnectionQueue reveals a deadlock between PooledConnectionQueue and AbstractFBConnectionPoolDataSource

firebird-automations commented 10 years ago

Commented by: @mrotteveel

Reworked the locking and synchronization of PooledConnectionQueue and related classes. As far as I have been able to reason and test this should fix the deadlocks I discovered earlier. However this needs additional testing.

I will release a snapshot version of Jaybird 2.2.5 before the end of the month (so: somewhere next week) for testing.

firebird-automations commented 10 years ago

Commented by: @mrotteveel

The SNAPSHOT of 2.2.5 is available at https://sourceforge.net/projects/firebird/files/firebird-jca-jdbc-driver/2.2.5-SNAPSHOT/ and also in the Sonatype maven snapshot repository https://oss.sonatype.org/content/repositories/snapshots/

If you could test this it would be really helpful.

firebird-automations commented 10 years ago

Commented by: @mrotteveel

Testing shows the issue to have been fixed, no other thread safety issues uncovered.

firebird-automations commented 10 years ago
Modified by: @mrotteveel status: Open \[ 1 \] =\> Resolved \[ 5 \] resolution: Fixed \[ 1 \]
firebird-automations commented 10 years ago
Modified by: @mrotteveel status: Resolved \[ 5 \] =\> Closed \[ 6 \]