
Gradually Increasing Memory Usage with Tyrus Standalone Client 1.11 #585

Open glassfishrobot opened 9 years ago

glassfishrobot commented 9 years ago

Setup: I have a long-running Java 8 service that establishes several persistent websocket connections to a Tomcat 7 websocket server. Messages are sent and received at a fairly slow, consistent rate (3-15 messages every 15 seconds). The websocket connections disconnect and reconnect about every 10 minutes. I have set the max heap size to 40 MB to make OutOfMemory errors quicker to troubleshoot.
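For context, a minimal sketch of the kind of client setup described above (the endpoint class, URI, and lifetime handling are placeholders, not the actual service code):

```java
import java.net.URI;
import javax.websocket.ClientEndpoint;
import javax.websocket.OnMessage;
import javax.websocket.Session;
import org.glassfish.tyrus.client.ClientManager;

@ClientEndpoint
public class PersistentClient {

    @OnMessage
    public void onMessage(String message, Session session) {
        // handle the 3-15 messages arriving every ~15 seconds
    }

    public static void main(String[] args) throws Exception {
        ClientManager client = ClientManager.createClient();
        // hypothetical URI; the real service opens several such sessions
        // against a Tomcat 7 server and reconnects about every 10 minutes
        Session session = client.connectToServer(PersistentClient.class,
                URI.create("ws://example.com/ws/endpoint"));
        Thread.sleep(Long.MAX_VALUE); // keep the process alive
    }
}
```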

Problem: Eventually (after about 2 days), the java process begins throwing java.lang.OutOfMemoryError: GC overhead limit exceeded.

Monitoring the RSS value of the process via the ps command shows its memory usage increasing gradually (~2k/minute).

Monitoring the memory usage with jstat shows old-generation usage increasing gradually until it hits the limit. When it does, a full garbage collection cleans up most of it, but each collection leaves a small amount behind. Eventually the process reaches the GC overhead threshold and throws the above error.

Analyzing the heap with jmap/jhat and viewing instance counts shows these four classes at the very top (in this order): org.glassfish.grizzly.http.util.BufferChunk, org.glassfish.grizzly.http.util.ByteChunk, org.glassfish.grizzly.http.util.CharChunk, org.glassfish.grizzly.http.util.DataChunk. These instances only get cleaned up when a full garbage collection happens, but each collection leaves some of them behind.

Viewing the heap histogram shows instances of [B (byte arrays) consuming a very large share (~50%) of the max heap size.

This seems similar to GRIZZLY-84, but that is marked as already fixed.

Environment

Tyrus Standalone Client 1.11
Java 8 (-Xmx40M)
Debian Jessie
Tomcat 7

Affected Versions

[1.11]

glassfishrobot commented 9 years ago

@glassfishrobot Commented Reported by spstur

glassfishrobot commented 9 years ago

@glassfishrobot Commented @pavelbucek said: Hi spstur,

thanks for filing this; interesting issue. Can you please share the code you have for reproducing it (a minimal reproducer would speed things up greatly)? From what you describe, it could be an issue in the Grizzly layer, but we can communicate that to the Grizzly team once this is verified.

Also, could you try the Tyrus JDK client transport? It does not use Grizzly at all; the minimum requirement is JDK 7+, which should be fine since you've indicated that you are using JDK 8.

See https://tyrus.java.net/documentation/1.11/user-guide.html#d0e1331 for more info about using jdk client transport.

Thanks, Pavel
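As a rough sketch of what that switch looks like (based on the linked user guide; the endpoint class and URI are placeholders reused from the sketch above, and the JDK container class ships in the tyrus-container-jdk-client module):

```java
import java.net.URI;
import javax.websocket.Session;
import org.glassfish.tyrus.client.ClientManager;
import org.glassfish.tyrus.container.jdk.client.JdkClientContainer;

public class JdkTransportExample {
    public static void main(String[] args) throws Exception {
        // ask Tyrus for a client backed by the JDK transport instead of Grizzly
        ClientManager client = ClientManager.createClient(JdkClientContainer.class.getName());
        Session session = client.connectToServer(PersistentClient.class,
                URI.create("ws://example.com/ws/endpoint"));
    }
}
```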

glassfishrobot commented 9 years ago

@glassfishrobot Commented spstur said: I switched it over to use the JDK client transport. When analyzing the heap I no longer see any Grizzly classes (as you'd expect), but I still see the same problems with memory usage; now the instance count from the heap shows over 10k instances of org.glassfish.tyrus.core.coder.CoderWrapper after running for about 20 hours.

I'm working on a minimal reproducer at the moment. I have a number of other tasks taking precedence, but I hope to have it to you within a couple of weeks at most. When I do, I'll also provide more information about what exactly I'm seeing with the JDK client transport.

glassfishrobot commented 9 years ago

@glassfishrobot Commented thomascashman said: I have a Tyrus websocket client sending 20k messages/second to a Tyrus websocket server in another application, and I'm experiencing the same pattern of garbage collection and pauses. YourKit showed that the method creating the most objects is Async#sendText, but I can't inspect deeper than that for some reason. The client application dies very quickly, within 1 minute.

However, I've attempted to reproduce this (https://github.com/tomcashman/tyrus-408) with a simple Tyrus client/server; it does not die as quickly, but I can see the old generation slowly increasing.
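For reference, Async#sendText is the asynchronous send on javax.websocket.RemoteEndpoint.Async; a minimal sketch of the send-loop pattern described above (the helper name, message contents, and count are illustrative only):

```java
import javax.websocket.RemoteEndpoint;
import javax.websocket.Session;

final class Sender {
    // illustrative helper: fires off 'count' text messages without waiting for completion
    static void sendBurst(Session session, int count) {
        RemoteEndpoint.Async remote = session.getAsyncRemote();
        for (int i = 0; i < count; i++) {
            // each call queues a text frame and returns immediately;
            // at ~20k messages/second the allocations behind this call add up quickly
            remote.sendText("message " + i);
        }
    }
}
```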

glassfishrobot commented 8 years ago

@glassfishrobot Commented @PetrJanouch said: I have gone through a couple of scenarios and I cannot reproduce this. To answer the comments:

@thomascashman: Thanks for the reproducer. I ran it for half an hour, took a heap dump at the beginning and at the end, and compared the two (and repeated this twice, to be sure). I have rarely seen heap dumps with so little difference. In my view, an increasing old generation does not necessarily mean a memory leak; it might mean that some cached (Grizzly loves caching) and long-lived objects get promoted. I am not saying there is no memory leak in your application, just that the reproducer does not reproduce it. You can try the same using jvisualvm (part of the JDK), which allows comparing heap dumps.

I have tried my own reproducer, which creates a lot of short-lived connections, and found the following: I managed to simulate a leak of org.glassfish.grizzly.http.util.BufferChunk, ByteChunk, CharChunk, and DataChunk objects if the websocket Session is not closed (either by calling Session#close locally or by receiving a close frame from the server), or if references to the Session are kept after it has been closed. If the Session is closed and references to it are not kept around, everything seems OK.
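A minimal sketch of the pattern described here, i.e. closing the Session explicitly and not retaining it afterwards (the helper method and close reason are illustrative, not part of the reproducer):

```java
import java.io.IOException;
import javax.websocket.CloseReason;
import javax.websocket.Session;

final class Disconnector {
    // illustrative helper: close the session and make sure nothing retains it afterwards
    static void disconnect(Session session) throws IOException {
        session.close(new CloseReason(CloseReason.CloseCodes.NORMAL_CLOSURE, "client shutting down"));
        // drop any fields/collections referencing this Session after close,
        // otherwise the buffers associated with it cannot be garbage collected
    }
}
```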

I cannot do more without more information. Either the promised reproducer or a heap dump would be helpful.

glassfishrobot commented 7 years ago

@glassfishrobot Commented This issue was imported from java.net JIRA TYRUS-408