eclipse-ee4j / glassfish

Eclipse GlassFish
https://eclipse-ee4j.github.io/glassfish/
386 stars 144 forks source link

It didn't reach the limit of max-pool-size, but there was an maxPoolSize exhausted exception #23483

Closed micRo-wlY closed 7 months ago

micRo-wlY commented 3 years ago

Hi!

We hava an application that uses jdbc connection pooling.And uses the following settings: max-pool-size:200 pool-resize-quantity:2 max-wait-time-in-millis:5000

It has been working fine for a long time, but recently we get the following exception while trying to concurrently access.

[java.sql.SQLException:Error in allocating a connection. Cause: In-use connections equal max-pool-size and expired max-wait-time. Cannot allocate more connections.]

The number of concurrent requests is 100. Actually,It didn't exceed the limit of max-pool-size 200.

By analyzing the ConnectionPool code,we find a problem in ConnectionPool#resizePoolAndGetNewResource.

In order to understand the problem simply,suppose uses the following settings: max-pool-size:10 pool-resize-quantity:1 max-wait-time-in-millis:5000

and,the problem flow is as follows:

  1. now there are five connections in the jdbc connection pool and all these connections are in-used

  2. one request comes to get connection(by Thread A)

  3. because there's still space to create a connection(current connections < max-pool-size), Thread A starts to execute the following resizePoolAndGetNewResource. In **1 method create 1 new connection, and add connection to connection pool.

  4. thread A before executing **2 method (take the connection off from connection pool) thread B just comes to get connection,and just takes the connection off from connection pool.

  5. when thread A executes 2 method,the connection was tooken by thread B, returns null. and max-wait-time-in-millis is timeout,in 3 method,the error will happen.

[java.sql.SQLException:Error in allocating a connection. Cause: In-use connections equal max-pool-size and expired max-wait-time. Cannot allocate more connections.]

Environment Details

Questions about this case

Now,we hava one question:

When a thread execute the resizePoolAndGetNewResource, should this thread get the connection first?
If YES,the code of resizePoolAndGetNewResource has a bug. If NO,the error message [Cause: In-use connections equal max-pool-size and expired max-wait-time.] is not right.

Related source

com.sun.enterprise.resource.pool.ConnectionPool

    private ResourceHandle resizePoolAndGetNewResource(ResourceAllocator alloc) throws PoolingException {
        //Must be called from the thread holding the lock to this pool.
        ResourceHandle result = null;
        int numOfConnsToCreate = 0;
        if (ds.getResourcesSize() < steadyPoolSize) {
            // May be all invalid resources are destroyed as
            // a result no free resource found and no. of resources is less than steady-pool-size
            numOfConnsToCreate = steadyPoolSize - ds.getResourcesSize();
        } else if (ds.getResourcesSize() + resizeQuantity <= maxPoolSize) {
            //Create and add resources of quantity "resizeQuantity"
            numOfConnsToCreate = resizeQuantity;
        } else if (ds.getResourcesSize() < maxPoolSize) {
            // This else if "test condition" is not needed. Just to be safe.
            // still few more connections (less than "resizeQuantity" and to reach the count of maxPoolSize)
            // can be added
            numOfConnsToCreate = maxPoolSize - ds.getResourcesSize();
        }
        if (numOfConnsToCreate > 0) {
             createResources(alloc, numOfConnsToCreate); **1
             result = getMatchedResourceFromPool(alloc); **2
        } else if (ds.getFreeListSize() > 0) {
            //pool cannot create more connections as it is at max-pool-size.
            //If there are free resources at max-pool-size, then none of the free resources
            //has matched this allocator's request (credential). Hence purge free resources
            //of size <=resizeQuantity
            if (purgeResources(resizeQuantity) > 0) {
                result = resizePoolAndGetNewResource(alloc);
            }
        }
        return result;
    }

    public ResourceHandle getResource(ResourceSpec spec, ResourceAllocator alloc, Transaction txn)
            throws PoolingException, RetryableUnavailableException  {
        //Note: this method should not be synchronized or the
        //      startTime would be incorrect for threads waiting to enter

       /*
        * Here are all the comments for the method put together for
        * easy reference.
        *  1.
           // - Try to get a free resource. Note: internalGetResource()
           // will create a new resource if none is free and the max has
           // not been reached.
           // - If can't get one, get on the wait queue.
           // - Repeat this until maxWaitTime expires.
           // - If maxWaitTime == 0, repeat indefinitely.

           2.
           //the doFailAllConnectionsProcessing method would already
           //have been invoked by now.
           //We simply go ahead and create a new resource here
           //from the allocator that we have and adjust the resources
           //list accordingly so as to not exceed the maxPoolSize ever
           //(i.e if steadyPoolSize == maxPoolSize )
           ///Also since we are creating the resource out of the allocator
           //that we came into this method with, we need not worry about
           //matching
        */
        ResourceHandle result = null;

        long startTime = System.currentTimeMillis();
        long elapsedWaitTime;
        long remainingWaitTime = 0;

        while (true) {
            if (gateway.allowed()) {
                //See comment #1 above
                JavaEETransaction jtx = ((JavaEETransaction) txn);
                Set resourcesSet = null;
                if(jtx != null){
                    resourcesSet = jtx.getResources(poolInfo);
                }
                //allow when the pool is not blocked or at-least one resource is
                //already obtained in the current transaction.
                if (!blocked || (resourcesSet != null && resourcesSet.size() > 0)) {
                    try {
                        result = internalGetResource(spec, alloc, txn);
                    } finally {
                        gateway.acquiredResource();
                    }
                }
            }
            if (result != null) {
                // got one, return it
                if (poolLifeCycleListener != null) {
                    poolLifeCycleListener.connectionAcquired(result.getId());
                    elapsedWaitTime = System.currentTimeMillis() - startTime;
                    poolLifeCycleListener.connectionRequestServed(elapsedWaitTime);
                    if (_logger.isLoggable( Level.FINE) ) {
                        _logger.log(Level.FINE, "Resource Pool: elapsed time " +
                                "(ms) to get connection for [" + spec + "] : " +
                                elapsedWaitTime);
                    }
                }
                //got one - seems we are not doing validation or matching
                //return it
                break;
            } else {
                // did not get a resource.
                if (maxWaitTime > 0) {
                    elapsedWaitTime = System.currentTimeMillis() - startTime;
                    if (elapsedWaitTime < maxWaitTime) {
                        // time has not expired, determine remaining wait time.
                        remainingWaitTime = maxWaitTime - elapsedWaitTime;
                    } else {
                        if (!blocked) {
                            // wait time has expired
                            if (poolLifeCycleListener != null) {
                                poolLifeCycleListener.connectionTimedOut();
                            }
                            String msg = localStrings.getStringWithDefault(
                                    "poolmgr.no.available.resource",
                                    "No available resource. Wait-time expired.");
                            ** throw new PoolingException(msg); **3
                        }
                    }
                }
caim2015 commented 3 years ago

Hi,can anyone give some thoughts on this issue?

zhanglx1987 commented 3 years ago

The jdbc connection pool still has space to create new connections, but the request failed because of this problem(failed to get connection). Do somebody have any ideas? Suggestions and comments would be greatly appreciated.

github-actions[bot] commented 2 years ago

This issue has been marked as inactive and old and will be closed in 7 days if there is no further activity. If you want the issue to remain open please add a comment

github-actions[bot] commented 1 year ago

This issue has been marked as inactive and old and will be closed in 7 days if there is no further activity. If you want the issue to remain open please add a comment

escay commented 7 months ago

I have written a unit test to try and reproduce this behavior.

About reported item 1 and 2 In my test the exact failure does not happen in 1 versus 2, but the lines right before it:

See attached test case: com.sun.enterprise.resource.pool.ConnectionPoolTest.basicConnectionPoolMultiThreadedTest() line 272 randomly fails if ConnectionPool code is not altered. I can reproduce this about 70% of the time with a maximum pool size of 5, a bit less when max pool size is 30, and quite often when maximum pool size is set to 1. Code: assertTrue(resources.size() <= maxConnectionPoolSize, "failed, size=" + resources.size());

Line 272 does not fail when I make sure that freeUnenlistedResource and getResourceFromPool are never called at the same moment. See hack in line 1106 of the attached altered ConnectionPool class and the console output. Undo the lock and the console output shows the random behaviour of adding resources to the pool, while threads should wait for one to become available in the pool because pool is maxed out. Code:

    // TODO: Replace this dirty fix, this fix shows the relation between 'freeResource' calls and 'getResourceFromPool' calls.
    // If these two are called at the same time, the connection pool can grow larger as the maximum connection pool size.
    private Object freeResourceObject = new Object();

Note: I also tested with the old /connectors-runtime/src/main/java/com/sun/enterprise/resource/pool/datastructure/ListDataStructure.java instead of the RWLockDataStructure. Both show the same behaviour.

Example log of succesful test with max pool size 5 and resize with 1:

resizePoolAndGetNewResource dataStructureSize: 0, numOfConnsToCreate: 1, reason: 2
resizePoolAndGetNewResource dataStructureSize: 1, numOfConnsToCreate: 1, reason: 2
resizePoolAndGetNewResource dataStructureSize: 2, numOfConnsToCreate: 1, reason: 2
resizePoolAndGetNewResource dataStructureSize: 3, numOfConnsToCreate: 1, reason: 2
resizePoolAndGetNewResource dataStructureSize: 4, numOfConnsToCreate: 1, reason: 2
Adding resource: 2
Adding resource: 1
Adding resource: 5
Adding resource: 3
Adding resource: 4
Reusing resource: 5
Reusing resource: 1
Reusing resource: 2
Reusing resource: 3
Reusing resource: 4
Reusing resource: 5
Reusing resource: 1
etc.

Failing output (without a fix / using current ConnectionPool code):

resizePoolAndGetNewResource dataStructureSize: 0, numOfConnsToCreate: 1, reason: 2
resizePoolAndGetNewResource dataStructureSize: 1, numOfConnsToCreate: 1, reason: 2
resizePoolAndGetNewResource dataStructureSize: 2, numOfConnsToCreate: 1, reason: 2
resizePoolAndGetNewResource dataStructureSize: 3, numOfConnsToCreate: 1, reason: 2
resizePoolAndGetNewResource dataStructureSize: 4, numOfConnsToCreate: 1, reason: 2
Adding resource: 2
Adding resource: 1
Adding resource: 5
Adding resource: 3
Adding resource: 4
Reusing resource: 5
Reusing resource: 1
Reusing resource: 2
Reusing resource: 3
Reusing resource: 4
Reusing resource: 5
Adding resource: 6
Reusing resource: 5
resizePoolAndGetNewResource dataStructureSize: 4, numOfConnsToCreate: 1, reason: 2
Error -> pool should not have been resized. Resource 6 should not have been added.

About reported item 3 Another thing I found in case of failing connections is: connections are freed when calling resourceErrorOccurred, BUT the waiting threads are not informed that a resource is removed from the pool. So these connections time out. See attached test case: com.sun.enterprise.resource.pool.ConnectionPoolTest.resourceErrorOccurredMultiThreadedTest() which now has to manually call connectionPool.notifyWaitingThreads(); This does lead to item **3 as reported in the original issue.

Branch with unit tests: https://github.com/eclipse-ee4j/glassfish/compare/master...escay:glassfish:issue_23483_unittest