Also: Ensure that RntbdClientChannelPool does not throw a NullPointerException when a channel closes while it is acquiring or releasing a channel in the pool. The code recovers from this, but the performance impact of throwing an exception in this piece of code is significant. Example: 7-8 vs 15-16 minutes to complete some of our long-running cross-partition query back pressure tests.
Specifics
Two issues are addressed with this change:
When a connection drops due to a read/write timeout, there is no guarantee that the Netty SslHandler has released its SSLEngine and ByteBuf resources. Specifically, we have seen that the SslHandler ByteToMessageDecoder implementation does not release ByteBuf memory before it is garbage collected. By implication, it might also fail to release SSLEngine resources:
LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records:
Created at
io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:331)
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185)
io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176)
io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:113)
io.netty.handler.ssl.SslHandler.allocate(SslHandler.java:1914)
io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1296)
io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1203)
io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1247)
io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:502)
io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:441)
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:656)
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:591)
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:508)
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:470)
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
java.lang.Thread.run(Thread.java:748)
Fix: Call SslHandler.closeOutbound from RntbdRequestManager.exceptionCaught when a connection fails. Expectation: SslHandler releases all resources when a channel closes normally.
When a channel closes while RntbdClientConnectionPool is retrieving the channel's pending request count, a NullPointerException may be thrown.
Fix: Ensure that RntbdClientConnectionPool.getPendingRequestCount completes without throwing an exception.
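To illustrate the kind of race described for the pending request count, here is a minimal, self-contained sketch. It is not the code in this change: PooledChannel, RequestManager, and the choice to report 0 for a closed channel are all assumptions made for illustration. The idea shown is to read the reference that a concurrent close can null out exactly once into a local variable, then branch on that local instead of dereferencing the field a second time.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class PendingRequestCountSketch {

    /** Hypothetical stand-in for the per-channel request manager. */
    static final class RequestManager {
        private final AtomicInteger pending = new AtomicInteger();

        void requestStarted() { pending.incrementAndGet(); }

        int pendingRequestCount() { return pending.get(); }
    }

    /** Hypothetical stand-in for a pooled channel whose manager is detached on close. */
    static final class PooledChannel {
        private final AtomicReference<RequestManager> manager =
            new AtomicReference<>(new RequestManager());

        RequestManager manager() { return manager.get(); }

        // Simulates the channel closing, possibly on another thread.
        void close() { manager.set(null); }
    }

    /**
     * Reads the manager reference once into a local and treats a closed
     * channel as having no pending requests (an assumption of this sketch),
     * so the call completes without throwing.
     */
    static int getPendingRequestCount(PooledChannel channel) {
        RequestManager m = channel.manager();
        return m == null ? 0 : m.pendingRequestCount();
    }

    public static void main(String[] args) {
        PooledChannel channel = new PooledChannel();
        channel.manager().requestStarted();
        System.out.println(getPendingRequestCount(channel)); // open channel: prints 1
        channel.close();
        System.out.println(getPendingRequestCount(channel)); // closed channel: prints 0
    }
}
```

Returning a sentinel rather than catching a NullPointerException keeps the exception off the hot path, which is the cost this change is concerned with.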