Azure / azure-cosmosdb-java

Java Async SDK for SQL API of Azure Cosmos DB
MIT License

Cosmos AsyncDocumentClient close method does not properly free resources #88

Closed John20001 closed 5 years ago

John20001 commented 5 years ago

Describe the bug: When an AsyncDocumentClient is repeatedly created and closed, the closed client does not free its resources, particularly threads. Our use case is constructing an AsyncDocumentClient with permissions, which expire, then closing that client and creating a new one in its place.

To Reproduce: Our use case builds the AsyncDocumentClient with a permission feed, but the problem is just as easy to reproduce with a master key.

@Test
public void testThreadPoolLeakForCosmos() throws Exception {
  AsyncDocumentClient ref;
  // Create and close 100 clients in sequence; each close() should release the client's threads.
  for (int i = 0; i < 100; i++) {
    System.out.println("Created client at: " + Instant.now().toString());
    ref = getMasterDocumentClient();
    Thread.sleep(2000); // keep the client open briefly
    ref.close();
    Thread.sleep(2000); // give the closed client time to release its threads
  }
}

// cosmosDBAccountTestController is our own test fixture (not shown); it supplies the host and primary key.
private static AsyncDocumentClient getMasterDocumentClient() {
  return new AsyncDocumentClient.Builder()
      .withServiceEndpoint(cosmosDBAccountTestController.getHost())
      .withConsistencyLevel(ConsistencyLevel.Strong)
      .withConnectionPolicy(ConnectionPolicy.GetDefault())
      .withMasterKeyOrResourceToken(cosmosDBAccountTestController.getPrimaryKey())
      .build();
}

Expected behavior: I expect the thread count to increase when the first AsyncDocumentClient is created. After close(), I expect those threads to be released within a few seconds, so that the thread count oscillates up and down by 2-3 threads as I create and close clients.
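One way to check this expectation from inside the test, without attaching a profiler, is to count live threads by name prefix after each close(). The following is only a minimal sketch: the class name EventLoopThreadCounter and the method countLiveThreads are hypothetical helpers, not part of the SDK, and the prefix is simply the one seen in the thread dump in this report.

```java
// Hypothetical helper (not part of the Cosmos DB SDK): counts live JVM threads by name prefix.
class EventLoopThreadCounter {

    // Prefix assumed from the thread names observed in this issue's thread dump.
    static final String RXNETTY_PREFIX = "rxnetty-nio-eventloop-";

    /** Returns the number of live threads whose name starts with the given prefix. */
    static long countLiveThreads(String prefix) {
        // Thread.getAllStackTraces() returns a map keyed by every live thread in the JVM.
        return Thread.getAllStackTraces().keySet().stream()
                .filter(Thread::isAlive)
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        System.out.println("rxnetty event-loop threads: " + countLiveThreads(RXNETTY_PREFIX));
    }
}
```

Calling countLiveThreads(RXNETTY_PREFIX) at the end of each loop iteration in the repro would show whether the count stays flat or grows by one per closed client.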

Actual behavior: If you run the equivalent of the above code and monitor the process with VisualVM, you will see the thread count climb steadily. A thread dump shows Netty event-loop threads sitting in a selector wait; one of many is shown below. Note that the names follow the pattern rxnetty-nio-eventloop-{threadPoolId}-{threadId}. The threadId is only ever 1, but the threadPoolId keeps incrementing, which indicates that each new AsyncDocumentClient creates a new thread pool with one thread (as expected) but never cleans it up on close().

"rxnetty-nio-eventloop-184-1" #221 daemon prio=5 os_prio=0 cpu=43.33ms elapsed=27.88s tid=0x00007f3dd811b800 nid=0x6786 runnable  [0x00007f3d9ee93000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPoll.wait(java.base@11.0.1/Native Method)
        at sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.1/EPollSelectorImpl.java:120)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.1/SelectorImpl.java:124)
        - locked <0x0000000719ae8c38> (a io.netty.channel.nio.SelectedSelectionKeySet)
        - locked <0x0000000719ae8a10> (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(java.base@11.0.1/SelectorImpl.java:136)
        at io.netty.channel.nio.SelectedSelectionKeySetSelector.select(SelectedSelectionKeySetSelector.java:62)
        at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:765)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:413)
        at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(java.base@11.0.1/Thread.java:834)

   Locked ownable synchronizers:
        - None
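To put a number on the leak, the thread names can be parsed according to the rxnetty-nio-eventloop-{threadPoolId}-{threadId} pattern noted above. This is a sketch only: the class RxNettyThreadName and its methods are hypothetical, and the naming pattern is inferred from the dump rather than from any documented contract.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: parses thread names of the form observed in this issue.
class RxNettyThreadName {

    // Pattern assumed from the dump: rxnetty-nio-eventloop-{threadPoolId}-{threadId}
    private static final Pattern NAME =
            Pattern.compile("rxnetty-nio-eventloop-(\\d+)-(\\d+)");

    /** Returns {threadPoolId, threadId}, or null if the name does not match the pattern. */
    static int[] parse(String threadName) {
        Matcher m = NAME.matcher(threadName);
        if (!m.matches()) {
            return null;
        }
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    /** Counts the distinct threadPoolId values among currently live threads. */
    static long countLivePools() {
        return Thread.getAllStackTraces().keySet().stream()
                .map(t -> parse(t.getName()))
                .filter(ids -> ids != null)
                .map(ids -> ids[0])
                .distinct()
                .count();
    }

    public static void main(String[] args) {
        int[] ids = parse("rxnetty-nio-eventloop-184-1");
        System.out.println("pool=" + ids[0] + " thread=" + ids[1]); // prints "pool=184 thread=1"
        System.out.println("live pools: " + countLivePools());
    }
}
```

If close() released its event loop, countLivePools() would be expected to stay small across iterations of the repro; a value that tracks the number of clients created so far would match the behavior described here.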


Environment summary
SDK Version: 2.4.1
Java JDK version: 11
OS Version (e.g. Windows, Linux, MacOSX): Linux

Additional context: This is a time-critical issue, as we are in the middle of a major deployment that uses Cosmos DB as its primary storage.

moderakh commented 5 years ago

We have a repro and we are working on a fix. @mbhaskar to follow up.

moderakh commented 5 years ago

This is fixed in 2.4.3