colinsurprenant / redstorm

JRuby on Storm

java.lang.OutOfMemoryError: PermGen space #92

Open Rockyyost opened 10 years ago

Rockyyost commented 10 years ago

I keep running into this issue. After a few seconds of running the topology in local mode, RedStorm errors out and gives me this: java.lang.OutOfMemoryError: PermGen space

I don't think I've got that much data running through this yet. Is there something I need to do to remove old tuples, or something else to keep this down?

Thanks for your help!

colinsurprenant commented 10 years ago

Frankly I have not witnessed OutOfMemoryError yet. It is very hard to diagnose the cause of your OutOfMemoryError without analyzing your topology. Are you using "reliable" emit without acknowledging your tuples? If memory is tight and your tuples rate is high that could trigger it. Which JRuby version are you using?

Rockyyost commented 10 years ago

I'm not using reliable emit and in all my bolts I have on_receive :emit => false, :ack => true, plus I manually emit tuples that should go to the next stream and ack those that should not.
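For reference, a minimal sketch of that pattern (the bolt class, field, and filter condition here are hypothetical; the `on_receive` options and `unanchored_emit` helper are as I understand the RedStorm DSL):

```ruby
require "red_storm"

# Hypothetical filter bolt: automatic emit is disabled and automatic ack
# is enabled, so we decide per tuple whether to emit; the tuple is acked
# for us after on_receive returns.
class FilterBolt < RedStorm::DSL::Bolt
  on_receive :emit => false, :ack => true

  def on_receive(tuple)
    value = tuple.getString(0)
    # only tuples matching the (hypothetical) condition go downstream;
    # unanchored_emit is the unreliable emit, so no downstream ack is needed
    unanchored_emit(value) if value.start_with?("keep")
  end
end
```

Since the emits are unanchored, Storm's reliability tracking never holds on to these tuples, so they should not accumulate in memory.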

I think, however, that I've narrowed it down to Mongoid. When I comment-out any updates or saves (it's mainly on updates as I don't have that many saves) things flow through without issue.

Do you have any experience with using Mongoid and Redstorm together?

Thanks for your help!

colinsurprenant commented 10 years ago

Ok. That should not impact memory but you shouldn't have to ack tuples that are not emitted using "reliability".

No, I haven't used Mongoid.

What is your JVM max heap size? (-Xmx)

What JVM & JRuby version are you using?

Colin

Rockyyost commented 10 years ago

My JVM max heap size is -Xmx2024m, and I'm using Java 1.7.0_45 with jruby-1.7.8.

Thanks, Rocky

colinsurprenant commented 10 years ago

Well, there's nothing obviously fishy, so it's kind of hard to help diagnose the source of the problem. From what you describe, our best shot is Mongoid. Could it be related to the identity map feature (http://mongoid.org/en/mongoid/docs/identity_map.html)? There seem to be lots of memory-related issues with that.

Otherwise, you could create a minimal topology which reproduces the problem and I'd be happy to take a look.

Rockyyost commented 10 years ago

I'm not using identity map, but I tried using Moped instead (Mongoid uses Moped) and it works! I'd ideally like to use Mongoid, so I'll reach out to them to see if they might have any insights, but at least I can move forward with Moped.

Thanks for all your help Colin!

colinsurprenant commented 10 years ago

Right. But seriously, if you can craft a minimal topology which reproduces the problem, I'd really like to investigate a bit, because given the popularity of Mongo, it will probably happen again!

Rockyyost commented 10 years ago

Okay, I will. But I don't think it's Mongoid anymore, because now the same thing is happening when I try to use Net::HTTP. I think there's something fundamental I'm not getting or haven't set right.

With one run of the topology, after a few seconds of it working, I'll get some of these: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main-EventThread"

A few of these: Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "Thread-21"

And these: ERROR com.netflix.curator.ConnectionState - Connection timed out

And not to be outdone, these are the error messages I catch in my rescue statement after using either Mongoid (before moving to Moped) or Net::HTTP:

identity=Lambda(a0:L,a1:L)=>{ t2:L=Species_L.argL0(a0:L); t3:L=ValueConversions.identity(t2:L);t3:L}

And:

guard=Lambda(a0:L,a1:L,a2:L,a3:L,a4:L)=>{ t5:I=MethodHandle(ThreadContext,IRubyObject,IRubyObject,IRubyObject)boolean(a1:L,a2:L,a3:L,a4:L); t6:L=MethodHandleImpl.selectAlternative(t5:I,(MethodHandle(ThreadContext,IRubyObject,IRubyObject,IRubyObject)IRubyObject),(MethodHandle(ThreadContext,IRubyObject,IRubyObject,IRubyObject)IRubyObject)); t7:L=MethodHandle.invokeBasic(t6:L,a1:L,a2:L,a3:L,a4:L);t7:L}

I might not actually be setting the memory properly or disabling the PermGen stuff. I've tried to by exporting JRUBY_OPTS and JAVA_OPTS with options like:

-J-XX:+CMSPermGenSweepingEnabled -J-XX:+CMSClassUnloadingEnabled -Xcompile.invokedynamic=false -J-Xmx50024m -J-Xms1024m -J-XX:PermSize=1512m -J-XX:MaxPermSize=42024m
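As an aside, those flags mix JRuby-level and JVM-level options, and some of the sizes look like typos (`-J-Xmx50024m`, `-J-XX:MaxPermSize=42024m`). A more typical configuration might look like this (the sizes are illustrative, not recommendations):

```shell
# JVM flags must carry the -J prefix so JRuby forwards them to the JVM;
# bare -X options (like -Xcompile.invokedynamic) are consumed by JRuby itself.
export JRUBY_OPTS="-J-Xms512m -J-Xmx1024m -J-XX:MaxPermSize=512m \
-J-XX:+UseConcMarkSweepGC -J-XX:+CMSClassUnloadingEnabled -J-XX:+CMSPermGenSweepingEnabled \
-Xcompile.invokedynamic=false"
echo "$JRUBY_OPTS"
```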

I'll see what I can do to get a version of my topology to you.

Thanks! Rocky

Rockyyost commented 10 years ago

Hey Colin, I've got a simple-ish topology for you to look at. I put Mongoid back in too. Because so much of this depends on the data we have, I've had to create a smallish development database with the needed data pre-loaded, to save you from setting things up. I've included a mongoid.yml file too.

What's the best way to get this to you?

Thanks!

Rockyyost commented 10 years ago

Do you know how I'd increase redstorm.TopologyLauncher's PermGen? It seems to be set at 82m currently, and while I can set the JVM's and JRuby's PermGen, that doesn't seem to affect redstorm.TopologyLauncher. I was using Visual GC to see if I could spot what was taking up what, and I noticed my PermGen settings only apply to the JRuby process that starts first, which I think is what launches redstorm.TopologyLauncher with its own PermGen.

Thanks!

colinsurprenant commented 10 years ago

So, you could gist the files, and for the data, use Dropbox?

Also, did you take a look at the Storm UI?

Right, so the PermGen setting is global to the JVM, and your JRuby code will run in the context of a worker JVM, which can be configured using the topology.worker.childopts option. You can set JVM options in your topology configuration with something like this:

```ruby
configure do
  set "topology.worker.childopts", "-Xms256m -Xmx1024m -XX:PermSize=512m -XX:MaxPermSize=512m"
end
```

colinsurprenant commented 10 years ago

Also, I forgot to ask, did you use the Storm UI to see if tuples were getting backed up "behind" a slower bolt? Depending on your max_spout_pending setting, this could explain memory filling up.
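For completeness, a sketch of capping pending tuples in a RedStorm topology configuration (the class name and value are illustrative; the setting only throttles tuples emitted with message IDs, i.e. reliable emits):

```ruby
class ExampleTopology < RedStorm::DSL::Topology
  configure do
    # Storm throttles the spout once this many tuples are in flight
    # (emitted but not yet acked or failed); only applies to reliable emits.
    set "topology.max.spout.pending", 500
  end
end
```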

Rockyyost commented 10 years ago

Hey Colin! Sorry I haven't gotten back to you sooner. On a hunch, I ran my topology under Java 1.8 and it worked fine. I know it's only in beta and I still need to figure my issue out, but for the time being I was able to continue my work.

I will get a topology to you soon.

Is the SnakeYAML still an issue? Also, I now have a test Storm cluster up, and I tested it by pushing one of your examples, but the Storm UI doesn't show any activity. Is there something I need to do once I publish the topology to the server?

Here's the link if you wanna take look: http://108.166.114.70:8080

Again, thanks so much for your help!

colinsurprenant commented 10 years ago

hey, reviewing pending issues, where are we with regard to this? let me know if you are still having issues.

Rockyyost commented 10 years ago

I upgraded to Java 8. That took care of the problem. I haven't tried again on 7, and I don't really plan to, since it's working really well now.

Thanks!


aliekens commented 9 years ago

I just started bumping into this exact same problem, but not related to Mongoid. Apparently, the JVM keeps track of all instantiated classes, forever. With dynamically generated classes (in JRuby or Groovy or others), all these classes defined at runtime incrementally take up space in the "PermGen" space, thus leaking memory until the "java.lang.OutOfMemoryError: PermGen space" effectively kills the topology.
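The class-leak mechanism is easy to illustrate in plain Ruby (the class names here are made up; under JRuby on a pre-Java-8 JVM, each generated class's metadata lands in PermGen and is never reclaimed unless class unloading is enabled):

```ruby
# Each iteration defines a brand-new named class at runtime. Under JRuby
# on JVM <= 1.7, the metadata for every one of these accumulates in
# PermGen, which is exactly how long-running dynamic code fills it up.
1000.times do |i|
  Object.const_set("Generated#{i}", Class.new)
end

puts Object.const_defined?("Generated999")  # the last generated class exists
```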

This thread and answer on StackOverflow discuss this problem and possible solutions for turning on garbage collection in this memory space.

Unfortunately, with JVM 1.7, I can't pass the suggested GC options to the JVM through JRuby ("jruby: invalid extended option X:+CMSClassUnloadingEnabled").
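That error happens because JRuby interprets a bare `-X...` as one of its own extended options; JVM flags have to be prefixed with `-J-` so JRuby forwards them to the underlying JVM. For example:

```shell
# The -J- prefix routes each flag to the JVM instead of JRuby's own
# option parser, which is what rejects a bare -XX:+CMSClassUnloadingEnabled.
export JRUBY_OPTS="-J-XX:+UseConcMarkSweepGC -J-XX:+CMSClassUnloadingEnabled"
echo "$JRUBY_OPTS"
```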

aliekens commented 9 years ago

Moving to Java 1.8 resolved my issue as well.