Azure / azure-cosmosdb-java

Java Async SDK for SQL API of Azure Cosmos DB
MIT License

Direct TCP: Ensure that SslHandler is properly closed on connection failure #133

Closed: David-Noble-at-work closed this issue 5 years ago

David-Noble-at-work commented 5 years ago

Also: Ensure that RntbdClientChannelPool does not throw a NullPointerException when a channel closes while the pool is acquiring or releasing it. The code recovers from the exception, but throwing in this code path has a significant performance cost. Example: some of our long-running cross-partition query back pressure tests complete in 7-8 minutes without the exception versus 15-16 minutes with it.

Specifics

Two issues are addressed with this change:

  1. When a connection drops due to a read/write timeout, there is no guarantee that the Netty SslHandler has released its SslEngine and ByteBuf resources. Specifically, we have seen that the SslHandler ByteToMessageDecoder implementation does not release ByteBuf memory before it is garbage collected. By implication, it might also fail to release SslEngine resources:
 LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records: 
Created at
io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:331)
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185)
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176)
io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:113)
io.netty.handler.ssl.SslHandler.allocate(SslHandler.java:1914)
io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1296)
io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203)
io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247)
io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502)
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441)
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656)
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:591)
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:508)
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470)
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.lang.Thread.run(Thread.java:748)

Fix: Call SslHandler.closeOutbound from RntbdRequestManager.exceptionCaught when a connection fails. Expectation: SslHandler releases all resources when a channel closes normally.

  2. When a channel closes while RntbdClientConnectionPool is retrieving the channel's pending request count, a NullPointerException may be thrown.

    Fix: Ensure that RntbdClientConnectionPool.getPendingRequestCount completes without throwing an exception, even if the channel closes during the call.
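
The shape of the second fix can be sketched without Netty. This is a minimal illustration, not the SDK's code: PendingCountSketch, FakeChannel, and RequestManager are hypothetical stand-ins, and it assumes the NullPointerException comes from a per-channel lookup that returns null once the channel has closed. The sketch reads the lookup once and treats a missing manager as zero pending requests, so no exception is thrown on the hot path.

```java
import java.util.concurrent.atomic.AtomicReference;

public class PendingCountSketch {

    /** Hypothetical stand-in for a per-channel request manager. */
    static final class RequestManager {
        private final int pending;

        RequestManager(int pending) {
            this.pending = pending;
        }

        int pendingRequestCount() {
            return pending;
        }
    }

    /** Hypothetical stand-in for a channel whose manager disappears when it closes. */
    static final class FakeChannel {
        private final AtomicReference<RequestManager> manager;

        FakeChannel(RequestManager initial) {
            this.manager = new AtomicReference<>(initial);
        }

        /** Returns null after close(), which is the assumed source of the NPE. */
        RequestManager requestManager() {
            return manager.get();
        }

        void close() {
            manager.set(null);
        }
    }

    /**
     * Reads the manager reference exactly once, so a concurrent close cannot
     * slip in between a null check and the dereference. A closed channel is
     * reported as having zero pending requests instead of throwing.
     */
    static int getPendingRequestCount(FakeChannel channel) {
        RequestManager m = channel.requestManager();
        return m == null ? 0 : m.pendingRequestCount();
    }

    public static void main(String[] args) {
        FakeChannel channel = new FakeChannel(new RequestManager(3));
        System.out.println(getPendingRequestCount(channel)); // prints 3
        channel.close();
        System.out.println(getPendingRequestCount(channel)); // prints 0
    }
}
```

Returning a count rather than catching the exception keeps the cost of a closed channel to a single null check, which matters in a path that runs on every acquire and release.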