Closed mielientiev closed 8 years ago
I get OOM:
play.api.Application$$anon$1: Execution exception[[OutOfMemoryError: Direct buffer memory]]
	at play.api.Application$class.handleError(Application.scala:296) [play_2.10-2.3.9.jar:2.3.9]
	at play.api.DefaultApplication.handleError(Application.scala:402) [play_2.10-2.3.9.jar:2.3.9]
	at play.api.Application$class.handleError(Application.scala:291) [play_2.10-2.3.9.jar:2.3.9]
	at play.api.DefaultApplication.handleError(Application.scala:402) [play_2.10-2.3.9.jar:2.3.9]
	at play.core.server.netty.PlayDefaultUpstreamHandler$$anonfun$14$$anonfun$apply$1.applyOrElse(PlayDefaultUpstreamHandler.scala:205) [play_2.10-2.3.9.jar:2.3.9]
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.8.0_60]
at java.nio.DirectByteBuffer.
Please provide a pure AHC reproducer. No Play.
Sure. I've written a simple main application: https://gist.github.com/mielientiev/319b2c0d9794ff7cb854
I also have the following FeedableBodyGenerator: https://gist.github.com/mielientiev/04e672692b0d230f7724 This is just a copy of org.asynchttpclient.request.body.generator.SimpleFeedableBodyGenerator (ConcurrentLinkedQueue was replaced by ArrayBlockingQueue with an upper bound to prevent OOM)
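The swap described above (unbounded ConcurrentLinkedQueue replaced by a bounded ArrayBlockingQueue) can be illustrated with a minimal JDK-only sketch. The `BoundedFeeder` class and its `feed`/`poll` methods are hypothetical stand-ins, not AHC's actual FeedableBodyGenerator API:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of the bounded-queue idea: feed(...) blocks the producer once the
// queue holds `capacity` chunks, instead of growing without limit the way an
// unbounded ConcurrentLinkedQueue would.
public class BoundedFeeder {
    private final BlockingQueue<byte[]> queue;

    public BoundedFeeder(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Producer side: blocks when the queue is full (back-pressure on the producer).
    public void feed(byte[] chunk) throws InterruptedException {
        queue.put(chunk);
    }

    // Consumer side (what a writer thread would call): null if nothing is buffered.
    public byte[] poll() {
        return queue.poll();
    }

    public int buffered() {
        return queue.size();
    }

    public static void main(String[] args) throws InterruptedException {
        BoundedFeeder feeder = new BoundedFeeder(2);
        feeder.feed(new byte[8192]);
        feeder.feed(new byte[8192]);
        // A third feed() would now block until poll() drains a slot.
        System.out.println(feeder.buffered()); // prints 2
        feeder.poll();
        System.out.println(feeder.buffered()); // prints 1
    }
}
```

The bound only caps what sits in *this* queue; as discussed later in the thread, it cannot cap what the consumer buffers downstream.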
Here is the profiling info:
Full stacktrace:
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658)
at java.nio.DirectByteBuffer.
Which version of AHC did you use? Did you check latest 2.0.0-alpha23?
I checked the 2.0.0-alpha22 version. I believe that 2.0.0-alpha23 has the same problem.
I tried the 2.0.0-alpha23. And now I get:
java.util.concurrent.TimeoutException: Read timeout to localhost/127.0.0.1:9000 of 60000 ms
	at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:48)
	at org.asynchttpclient.netty.timeout.ReadTimeoutTimerTask.run(ReadTimeoutTimerTask.java:57)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:581)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:655)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:367)
	at java.lang.Thread.run(Thread.java:745)
Here is the profiler info:
Could you share your server too, please? It looks to me like you no longer have the ByteBuf leak, and that you're just writing too fast for your server.
Here is a simple Play endpoint: https://gist.github.com/mielientiev/38fec15428e8cd0b6713 I use a blocking queue (BlockingSimpleFeedableBodyGenerator) to prevent uploading too fast
I suspect your timeout issue comes from your blocking queue. You absolutely must never block Netty's IO threads!
But in this case I only block the main thread (the producer of the data), by blocking on the "feed" call
As I expected, the leak issue is already fixed in alpha23.
Your timeout is that you simply can't flush 10,000,000 * 8kB = 80GB in 2 minutes.
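The arithmetic behind that claim can be checked directly. A back-of-the-envelope sketch, assuming decimal 8 kB chunks and the 2-minute window mentioned above:

```java
// Back-of-the-envelope check of the flush volume and the sustained
// throughput it would require.
public class ThroughputCheck {
    public static void main(String[] args) {
        long chunks = 10_000_000L;
        long chunkBytes = 8_000L;              // 8 kB per chunk (decimal kB)
        long totalBytes = chunks * chunkBytes; // 80,000,000,000 bytes = 80 GB
        double windowSeconds = 120.0;          // the 2-minute window
        double requiredMBps = totalBytes / windowSeconds / 1_000_000.0;
        System.out.println(totalBytes);        // 80000000000
        System.out.println(requiredMBps);      // ~666.7 MB/s sustained
    }
}
```

Roughly 667 MB/s sustained end-to-end, which the test server clearly isn't keeping up with, so the read timeout fires.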
But after the timeout, the buffers aren't released
Here is the profiler info after sending 100,000 * 8 kB of data with a 200 OK response
And I get the OOM error on alpha23 too. I just set the logging level to WARN and it blows up in my console. (added logback.xml)
12:47:24.208 [AsyncHttpClient-2-1] WARN i.n.channel.DefaultChannelPipeline - An exception was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.8.0_60]
at java.nio.DirectByteBuffer.
Here is why it didn't print anything before: https://github.com/netty/netty/blob/4.0/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L258
I wrote a standalone pure Java sample and I really can't reproduce: https://gist.github.com/slandelle/b63f459d51b710fc8d91
Strange, I took your code and I have the same problem... here is the project: https://www.sendspace.com/file/6wxbmz
What's your OS and JDK version?
XUbuntu 15.04 and java 1.8.0_60-b27
Sorry, I really can't reproduce.
I took a screen video. I don't know how it can help you, but still: https://youtu.be/27-5uz2xf9Y I added the log4j dependency + log4j.xml, and set 500,000 iterations
Here is the debug info (maybe it can help you reproduce):
2015-11-14 00:06:58 DEBUG InternalLoggerFactory:71 - Using SLF4J as the default logging framework
2015-11-14 00:06:58 DEBUG ResourceLeakDetector:81 - -Dio.netty.leakDetection.level: simple
2015-11-14 00:06:58 DEBUG ResourceLeakDetector:81 - -Dio.netty.leakDetection.maxRecords: 4
2015-11-14 00:06:58 DEBUG MultithreadEventLoopGroup:76 - -Dio.netty.eventLoopThreads: 8
2015-11-14 00:06:58 DEBUG PlatformDependent0:76 - java.nio.Buffer.address: available
2015-11-14 00:06:58 DEBUG PlatformDependent0:76 - sun.misc.Unsafe.theUnsafe: available
2015-11-14 00:06:58 DEBUG PlatformDependent0:71 - sun.misc.Unsafe.copyMemory: available
2015-11-14 00:06:58 DEBUG PlatformDependent0:76 - java.nio.Bits.unaligned: true
2015-11-14 00:06:58 DEBUG PlatformDependent:76 - Java version: 8
2015-11-14 00:06:58 DEBUG PlatformDependent:76 - -Dio.netty.noUnsafe: false
2015-11-14 00:06:58 DEBUG PlatformDependent:76 - sun.misc.Unsafe: available
2015-11-14 00:06:58 DEBUG PlatformDependent:76 - -Dio.netty.noJavassist: false
2015-11-14 00:06:58 DEBUG PlatformDependent:71 - Javassist: available
2015-11-14 00:06:58 DEBUG PlatformDependent:76 - -Dio.netty.tmpdir: /tmp (java.io.tmpdir)
2015-11-14 00:06:58 DEBUG PlatformDependent:76 - -Dio.netty.bitMode: 64 (sun.arch.data.model)
2015-11-14 00:06:58 DEBUG PlatformDependent:76 - -Dio.netty.noPreferDirect: false
2015-11-14 00:06:58 DEBUG NioEventLoop:76 - -Dio.netty.noKeySetOptimization: false
2015-11-14 00:06:58 DEBUG NioEventLoop:76 - -Dio.netty.selectorAutoRebuildThreshold: 512
2015-11-14 00:06:58 DEBUG ThreadLocalRandom:71 - -Dio.netty.initialSeedUniquifier: 0xc0728fcb1386aeab (took 24 ms)
2015-11-14 00:06:58 DEBUG ByteBufUtil:76 - -Dio.netty.allocator.type: unpooled
2015-11-14 00:06:58 DEBUG ByteBufUtil:76 - -Dio.netty.threadLocalDirectBufferSize: 65536
2015-11-14 00:06:58 DEBUG ByteBufUtil:76 - -Dio.netty.maxThreadLocalCharBufferSize: 16384
2015-11-14 00:06:58 DEBUG NetUtil:86 - Loopback interface: lo (lo, 0:0:0:0:0:0:0:1%lo)
2015-11-14 00:06:58 DEBUG NetUtil:81 - /proc/sys/net/core/somaxconn: 128
2015-11-14 00:06:59 DEBUG JdkSslContext:76 - Default protocols (JDK): [TLSv1.2, TLSv1.1, TLSv1]
2015-11-14 00:06:59 DEBUG JdkSslContext:76 - Default cipher suites (JDK): [TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256, TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_128_GCM_SHA256, TLS_RSA_WITH_AES_128_CBC_SHA, SSL_RSA_WITH_3DES_EDE_CBC_SHA]
2015-11-14 00:06:59 DEBUG PooledByteBufAllocator:76 - -Dio.netty.allocator.numHeapArenas: 8
2015-11-14 00:06:59 DEBUG PooledByteBufAllocator:76 - -Dio.netty.allocator.numDirectArenas: 8
2015-11-14 00:06:59 DEBUG PooledByteBufAllocator:76 - -Dio.netty.allocator.pageSize: 8192
2015-11-14 00:06:59 DEBUG PooledByteBufAllocator:76 - -Dio.netty.allocator.maxOrder: 11
2015-11-14 00:06:59 DEBUG PooledByteBufAllocator:76 - -Dio.netty.allocator.chunkSize: 16777216
2015-11-14 00:06:59 DEBUG PooledByteBufAllocator:76 - -Dio.netty.allocator.tinyCacheSize: 512
2015-11-14 00:06:59 DEBUG PooledByteBufAllocator:76 - -Dio.netty.allocator.smallCacheSize: 256
2015-11-14 00:06:59 DEBUG PooledByteBufAllocator:76 - -Dio.netty.allocator.normalCacheSize: 64
2015-11-14 00:06:59 DEBUG PooledByteBufAllocator:76 - -Dio.netty.allocator.maxCachedBufferCapacity: 32768
2015-11-14 00:06:59 DEBUG PooledByteBufAllocator:76 - -Dio.netty.allocator.cacheTrimInterval: 8192
2015-11-14 00:06:59 DEBUG AbstractByteBuf:81 - -Dio.netty.buffer.bytebuf.checkAccessible: true
2015-11-14 00:06:59 DEBUG JavassistTypeParameterMatcherGenerator:76 - Generated: io.netty.util.internal.matchers.io.netty.handler.codec.http.HttpObjectMatcher
2015-11-14 00:06:59 DEBUG NettyConnectListener:66 - Using non-cached Channel [id: 0xc3ae5be3, /127.0.0.1:38385 => localhost/127.0.0.1:9000] for POST '/'
2015-11-14 00:06:59 DEBUG Recycler:76 - -Dio.netty.recycler.maxCapacity.default: 262144
2015-11-14 00:06:59 DEBUG Cleaner0:76 - java.nio.ByteBuffer.cleaner(): available
Still no issue on my side. And my logs are identical. The only difference is that I run on OSX. I might be able to check on Ubuntu on Monday.
Are you sure you didn't change anything in the test, like trying to buffer the chunks? I ask because in your video, you get a LastHttpContent (meaning that the client has finished sending the body) when the count reaches 269,000, which looks weird to me since you say you ran 500,000 iterations.
here is the latest version: https://www.sendspace.com/file/w7odpu
Do you have any updates about this? I have the same problem on Windows too.
No, I still can't reproduce. Does this happen also when running from the command line?
Try adding the maven exec plugin to your pom and running mvn compile exec:java:
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <source>1.8</source>
        <target>1.8</target>
      </configuration>
    </plugin>
    <plugin>
      <groupId>org.codehaus.mojo</groupId>
      <artifactId>exec-maven-plugin</artifactId>
      <version>1.4.0</version>
      <configuration>
        <mainClass>test.Main</mainClass>
      </configuration>
    </plugin>
  </plugins>
</build>
Please also provide what java -version gives you (I want to make sure IntelliJ is not picking another JDK).
Wait, I think I finally get it. I don't think that's a memory leak. I think that's a lack of back-pressure.
Basically, I think your BlockingQueue is useless, as Netty will keep popping from the queue regardless of whether previous chunks could actually be flushed to the TCP stack.
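The point about the bounded queue being useless can be demonstrated with a JDK-only analogy: when a fast consumer drains the bounded queue straight into its own unbounded buffer (standing in here for Netty's outbound buffer), the bound never exerts any back-pressure on the producer. This is an illustration of the failure mode, not AHC or Netty code:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Analogy for the missing back-pressure: the producer's bounded queue stays
// nearly empty because the "writer" immediately moves every chunk into an
// unbounded downstream buffer, so memory still grows without limit.
public class NoBackPressureDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<byte[]> bounded = new ArrayBlockingQueue<>(16); // the producer's queue
        Queue<byte[]> outbound = new ArrayDeque<>();                  // stands in for Netty's outbound buffer

        for (int i = 0; i < 1_000; i++) {
            bounded.put(new byte[64]);    // producer never blocks...
            outbound.add(bounded.take()); // ...because the writer drains instantly
        }
        System.out.println(bounded.size());  // 0    -> the bound never kicked in
        System.out.println(outbound.size()); // 1000 -> everything piled up downstream
    }
}
```

Real back-pressure has to come from the writer side (for example, pausing the feed while the channel's outbound buffer is full), not from a bound on the producer's own queue.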
So, what can I do with this problem?
Mmmm, actually, I'm starting to think this is a Netty bug. If I force heap buffers instead of native ones, the issue goes away (meaning I see heap memory being properly GC'ed).
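For context on the heap-vs-native distinction above, here is a minimal JDK-only sketch of the two buffer kinds. Heap buffers are ordinary GC-managed arrays; direct buffers live in native memory, capped by -XX:MaxDirectMemorySize, which is why leaking them produces "OutOfMemoryError: Direct buffer memory" long before the heap fills:

```java
import java.nio.ByteBuffer;

// Heap buffers are ordinary GC-managed objects; direct buffers hold native
// memory that is only reclaimed when the wrapping object is collected (or
// explicitly freed), so a leak of direct buffers exhausts the native cap
// while the heap still looks healthy.
public class BufferKinds {
    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(8192);
        ByteBuffer direct = ByteBuffer.allocateDirect(8192);
        System.out.println(heap.isDirect());   // false
        System.out.println(direct.isDirect()); // true
    }
}
```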
I'm going to switch to Netty's default allocator, instead of using the channel's one, which is the direct one.
Then, I'll open an issue against Netty.
@normanmaurer FYI
Note: I'm comparing the number of chunks written on the client side and the number of chunks read on the server side, and they match, so the issue is not chunks being buffered on the client side.
@mielientiev Could you please try latest master?
Note on the previous commit (2a8e14709152e0a2de47f651a19d1ede82aaeefa): AHC still uses direct memory, but unpooled instead of pooled. The memory leak goes away.
I've tested, and it seems to work fine. Direct memory is being properly GC'ed. I also tested it with the Play application and it works too.
Here is the profiler info for the test project I sent (AsyncTestProject):
Profiler info for Play App:
P.S. Tested on XUbuntu 15.04
Great, thanks for your feedback.
It will take me some time to come up with a pure Netty sample so I can open an issue against Netty. There's either an issue with Netty 4+ pooled buffers recycling, or with my way of using them.
I'll close this issue once I manage to have pooled buffers properly working. It would be great to have almost no GC there :)
@mielientiev I haven't exactly figured out what happens, but it has to do with 2 things:
I also noticed that plugging a monitoring tool such as JMC when the test is running greatly messes up the results.
Closing for now, as the original ByteBuffer leak issue is solved.
When I use a custom FeedableBodyGenerator or SimpleFeedableBodyGenerator I see the following error message:
[error] i.n.u.ResourceLeakDetector - LEAK: ByteBuf.release() was not called before it's garbage-collected. Enable advanced leak reporting to find out where the leak occurred. To enable advanced leak reporting, specify the JVM option '-Dio.netty.leakDetection.level=advanced' or call ResourceLeakDetector.setLevel() See http://netty.io/wiki/reference-counted-objects.html for more information.
I guess the main problem is here: https://github.com/netty/netty/blob/4.0/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L990-L992 they just assign null to the message, but should also call release (ReferenceCountUtil.release(msg)). Am I right?
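For background on why a missing release() matters: Netty's ByteBufs are explicitly reference-counted, and the leak detector fires when the last reference is dropped without release() ever reaching zero. The contract can be sketched with a minimal JDK-only analogue; this `RefCountedChunk` class is an illustration, not Netty's actual implementation:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal analogue of Netty's reference-counting contract: every retain()
// must be balanced by a release(), and the release() that brings the count
// to zero frees the resource. Dropping the last reference without calling
// release() is exactly what ResourceLeakDetector reports as a LEAK.
public class RefCountedChunk {
    private final AtomicInteger refCnt = new AtomicInteger(1); // starts owned by the creator
    private boolean freed = false;

    public RefCountedChunk retain() {
        refCnt.incrementAndGet();
        return this;
    }

    // Returns true when this call released the last reference.
    public boolean release() {
        int remaining = refCnt.decrementAndGet();
        if (remaining == 0) {
            freed = true; // where Netty would return the buffer to its pool
            return true;
        }
        if (remaining < 0) {
            throw new IllegalStateException("release() on an already-released buffer");
        }
        return false;
    }

    public boolean isFreed() {
        return freed;
    }

    public static void main(String[] args) {
        RefCountedChunk chunk = new RefCountedChunk();
        chunk.retain();                      // refCnt = 2
        System.out.println(chunk.release()); // false (refCnt = 1, still owned)
        System.out.println(chunk.release()); // true  (refCnt = 0, freed)
        System.out.println(chunk.isFreed()); // true
    }
}
```

Under this contract, a handler that nulls out its reference to a message without calling release() leaves the count stuck above zero, so the pooled memory is never returned.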