etsy / statsd-jvm-profiler

Simple JVM Profiler Using StatsD and Other Metrics Backends
MIT License

Out of memory in a mapreduce job (when launched with profiler) #27

Closed ibobak closed 9 years ago

ibobak commented 9 years ago

I am not sure that this happens because of the profiler, but without the profiler the same job with the same input data worked fine. I can see from the stack trace that CPUTraces.getDataToFlush was running and was interrupted. Could that be the reason for the memory consumption?

Here is the screenshot of my memory parameters: http://postimg.org/image/8sdq8l6al/

2015-07-21 10:12:43,981 INFO [main] org.apache.gora.mapreduce.GoraRecordReader: gora.buffer.read.limit = 10000
2015-07-21 10:12:44,329 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:401)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:695)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

2015-07-21 10:12:44,383 ERROR [Thread-2] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Thread-2,5,main] threw an Throwable, but we are shutting down, so ignoring this
java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.nextEntry(HashMap.java:922)
    at java.util.HashMap$KeyIterator.next(HashMap.java:956)
    at com.etsy.statsd.profiler.util.CPUTraces.getDataToFlush(CPUTraces.java:45)
    at com.etsy.statsd.profiler.profilers.CPUProfiler.recordMethodCounts(CPUProfiler.java:116)
    at com.etsy.statsd.profiler.profilers.CPUProfiler.flushData(CPUProfiler.java:71)
    at com.etsy.statsd.profiler.worker.ProfilerShutdownHookWorker.run(ProfilerShutdownHookWorker.java:22)
    at java.lang.Thread.run(Thread.java:745)

ibobak commented 9 years ago

Andrew, could you please look into the CPUTraces class? I don't see any code there which clears the "traces" field. There is a dirtyTraces.clear() call, but I don't see any place where traces.clear() (or something similar) would be called. Are you sure this is by design?

ajsquared commented 9 years ago

It's possible that what you're profiling is producing a very large number of metrics and that's where the excessive memory usage is coming from, though I would be surprised if it were that large. We generally use the packageWhitelist argument to filter down which traces are reported, but even with that turned off I couldn't reproduce your issue.

However, I have seen the ConcurrentModificationException before and I think I've tracked down the cause. I don't know that the two issues are necessarily related, but I should have a fix for it soon. I'll let you know when it's pushed so you can try again.

ajsquared commented 9 years ago

The traces field is a leftover at this point. At one time there was an option to flush every trace regardless of whether it had been updated, but that option is gone now, so the field could be removed.

ajsquared commented 9 years ago

However, even with the dirty traces logic gone, we still have to maintain a map of all the traces, since we're incrementing the amount of time spent in each of them and we can't know that a given trace will never appear again.

ajsquared commented 9 years ago

We could instead maintain that sum only for the period between flushes, which would require summing over the time range of profiling when querying. That complicated some queries when visualizing the data with Graphite, but it shouldn't be a big deal with InfluxDB.
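
A minimal sketch of that idea (illustrative names, not the exact CPUTraces implementation): counts are accumulated in a map keyed by the stack-trace string, and the map is copied and cleared whenever data is flushed, so memory no longer grows with every distinct trace seen over the whole run. Synchronizing the two methods would also avoid the kind of ConcurrentModificationException shown in the stack traces above.

    import java.util.HashMap;
    import java.util.Map;

    public class TraceCounts {
        private final Map<String, Long> traces = new HashMap<String, Long>();

        // called from the sampling thread for every observed stack trace
        public synchronized void increment(String trace, long time) {
            Long current = traces.get(trace);
            traces.put(trace, current == null ? time : current + time);
        }

        // called from the flush worker (and the shutdown hook): copy and reset
        public synchronized Map<String, Long> getDataToFlush() {
            Map<String, Long> snapshot = new HashMap<String, Long>(traces);
            traces.clear();
            return snapshot;
        }
    }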

ibobak commented 9 years ago

The application I am working with is Apache Nutch (more precisely, a fork of Apache Nutch). My colleagues built a web app which periodically launches "nutch ", and each nutch call submits a mapreduce job to the cluster. Nutch is not a small application, I should say.

Currently I am running one test: I will run the job WITHOUT the profiler and see if it works. Then I will switch the profiler on and see if it crashes. Before that I will clear the entire InfluxDB database, to see how many records get written (I hope that when the app crashes, it will still call the hook which flushes data to InfluxDB - will it?)

ajsquared commented 9 years ago

I don't think we can guarantee that the shutdown hook is called if an OOM occurs. I'm experimenting with the approach I described in my last comment; if that works out, it should reduce the memory pressure from profiling.

ibobak commented 9 years ago

As I see in your code, you have just one flush (at the end of the app), right?

ajsquared commented 9 years ago

There is a shutdown hook which flushes when the JVM exits, but data is also flushed periodically during execution. For the CPU traces that is every 10 seconds.
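
Roughly, the flushing scheme looks like this (a sketch only; the class and method names here are illustrative, not the profiler's exact API):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class FlushScheduling {
        public static void start(Runnable flushWorker, long flushPeriodMs) {
            ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
            // flush collected metrics periodically while the JVM is running
            executor.scheduleAtFixedRate(flushWorker, flushPeriodMs, flushPeriodMs, TimeUnit.MILLISECONDS);
            // flush one last time when the JVM exits; this is not guaranteed to run after an OOM
            Runtime.getRuntime().addShutdownHook(new Thread(flushWorker));
        }
    }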

ibobak commented 9 years ago

After removing _JAVA_OPTIONS from the environment and executing the application without the profiler, I see that everything is fine (screenshot: 2015-07-21 22_34_29-win7 running - oracle vm virtualbox).

Next step - I will add back the

_JAVA_OPTIONS='-javaagent:/var/lib/statsd-jvm-profiler-0.8.3-SNAPSHOT.jar=server=10.190.26.35,port=8086,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=bigdata.Ihor_Bobak.YARN.1.all.all,tagMapping=prefix.username.job.flow.stage.phase'

to the environment....

ajsquared commented 9 years ago

Okay, I've pushed the change to clear the stored traces when data is flushed, and the corresponding change to the dashboard to sum over all the values to account for the difference in how metrics are flushed.

ibobak commented 9 years ago

Thank you! Could you please also change influxdb_dump.py? I really need it very much; I need the SVG output.

ajsquared commented 9 years ago

Just pushed that as well!

ibobak commented 9 years ago

Thanks. Compiling the new version...

ibobak commented 9 years ago

'select value from /^cpu.trace.*/ where %s limit 1'

This seems to be a bug: do we need the "limit 1"?

ajsquared commented 9 years ago

Good catch, that snuck in from something I was testing. It's gone now.
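With the limit removed, the query presumably just reads 'select value from /^cpu.trace.*/ where %s'.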

ibobak commented 9 years ago

Andrew, I added profiling (I carefully copied the newly built jar file to EVERY node, no mistakes), and unfortunately I am again getting:

2015-07-21 13:01:52,175 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: org.apache.gora.mapreduce.GoraInputSplit@5f96f26e
2015-07-21 13:01:52,186 INFO [main] org.apache.gora.mapreduce.GoraRecordReader: gora.buffer.read.limit = 10000
2015-07-21 13:01:52,513 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:401)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:695)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

And this is just the second job (screenshot: 2015-07-21 23_04_16-win7 running - oracle vm virtualbox).

ajsquared commented 9 years ago

I believe the _JAVA_OPTIONS environment variable will override other heap size settings, so you might be ending up with a smaller heap size than you expect. Could you try adding an appropriate -Xmx setting in that environment variable along with the profiling options?
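
For example, reusing the options from your earlier comment (the exact heap size should match what your containers allow):

_JAVA_OPTIONS='-Xmx4096m -javaagent:/var/lib/statsd-jvm-profiler-0.8.3-SNAPSHOT.jar=server=10.190.26.35,port=8086,reporter=InfluxDBReporter,database=profiler,username=profiler,password=profiler,prefix=bigdata.Ihor_Bobak.YARN.1.all.all,tagMapping=prefix.username.job.flow.stage.phase'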

ibobak commented 9 years ago

I added -Xmx4096m to the environment (screenshot: 2015-07-21 23_43_53-win7 running - oracle vm virtualbox) and got this:

2015-07-21 13:51:09,514 ERROR [Thread-2] org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread Thread[Thread-2,5,main] threw an Exception.
java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.nextEntry(HashMap.java:922)
    at java.util.HashMap$EntryIterator.next(HashMap.java:962)
    at java.util.HashMap$EntryIterator.next(HashMap.java:960)
    at java.util.HashMap.putAllForCreate(HashMap.java:554)
    at java.util.HashMap.<init>(HashMap.java:298)
    at com.google.common.collect.Maps.newHashMap(Maps.java:138)
    at com.etsy.statsd.profiler.util.CPUTraces.getDataToFlush(CPUTraces.java:41)
    at com.etsy.statsd.profiler.profilers.CPUProfiler.recordMethodCounts(CPUProfiler.java:114)
    at com.etsy.statsd.profiler.profilers.CPUProfiler.flushData(CPUProfiler.java:71)
    at com.etsy.statsd.profiler.worker.ProfilerShutdownHookWorker.run(ProfilerShutdownHookWorker.java:22)
    at java.lang.Thread.run(Thread.java:745)

ibobak commented 9 years ago

The first job did not even start (unlike the first case, where the first job worked fine and only the second one failed).

ibobak commented 9 years ago

Andrew, thank you very much for your time. I am "switching off"; I need to go home to sleep. I will try all further ideas tomorrow. Thanks again.

ajsquared commented 9 years ago

I think I know what's going on with that error. I'll have to play around with it a little.

ibobak commented 9 years ago

Feel free to send several versions of the jar files to ibobak at gmail dot com, or make several branches in git (like "test1", "test2", etc.); I will build jars for them and try all of them on my cluster.

(I would gladly let you into this environment, but I am under an NDA with the customer.)

Thank you for everything. I will be online in 10 hours.

ajsquared commented 9 years ago

I just pushed a "testing" branch that I think will avoid the ConcurrentModificationException. Thanks for your patience and feedback in trying this out!

ibobak commented 9 years ago

Hi, Andrew. I made a build from the testing branch, pushed it to all servers, and made sure the md5sum matches the build. I even opened the built jar with a Java decompiler to make sure the changes are included; they are, so the binary does contain your changes to CPUTraces.java.

However, I am still getting this exception. If you look at the log, you can see that this is still at the very beginning of the mapper execution, while it is processing the GoraInputSplit. I don't know whether this information is helpful, but the problem still persists...

2015-07-23 05:30:11,563 INFO [main-EventThread] org.apache.zookeeper.ClientCnxn: EventThread shut down
2015-07-23 05:30:11,565 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: org.apache.gora.mapreduce.GoraInputSplit@707730b7
2015-07-23 05:30:11,575 INFO [main] org.apache.gora.mapreduce.GoraRecordReader: gora.buffer.read.limit = 10000
2015-07-23 05:30:11,891 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:401)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:81)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:695)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

ibobak commented 9 years ago

If I put -Xmx4096m into _JAVA_OPTIONS, YARN fails to run the container:


Application application_1437655238148_0001 failed 2 times due to AM Container for appattempt_1437655238148_0001_000002 exited with exitCode: -104 due to:
Container [pid=4944,containerID=container_1437655238148_0001_02_000001] is running beyond physical memory limits. Current usage: 553.0 MB of 512 MB physical memory used; 4.8 GB of 1.0 GB virtual memory used. Killing container.
Dump of the process-tree for container_1437655238148_0001_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 4944 4942 4944 4944 (bash) 0 0 108654592 297 /bin/bash -c /usr/java/jdk1.7.0_67-cloudera/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1437655238148_0001/container_1437655238148_0001_02_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Djava.net.preferIPv4Stack=true -Xmx825955249 org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/var/log/hadoop-yarn/container/application_1437655238148_0001/container_1437655238148_0001_02_000001/stdout 2>/var/log/hadoop-yarn/container/application_1437655238148_0001/container_1437655238148_0001_02_000001/stderr
|- 4952 4944 4944 4944 (java) 2470 294 5080645632 141267 /usr/java/jdk1.7.0_67-cloudera/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/var/log/hadoop-yarn/container/application_1437655238148_0001/container_1437655238148_0001_02_000001 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog -Djava.net.preferIPv4Stack=true -Xmx825955249 org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Failing this attempt.. Failing the application.

ajsquared commented 9 years ago

Interesting, based on that stack trace it looks like it's not getting the larger memory setting. I'd expect the physical memory limit to be 4096 MB, not 512 MB with that setting in place.

ibobak commented 9 years ago

I started investigating your code, and I see some non-optimal places which may produce an OOM. I will first apply fixes and try them locally, and then I will post some patches here.

ibobak commented 9 years ago

Andrew, you have:

scheduledExecutorService.scheduleAtFixedRate(worker, EXECUTOR_DELAY, profiler.getPeriod(), profiler.getTimeUnit());

CPUProfiler.java:

@Override
public long getPeriod() {
    return PERIOD;  // will return 1
}

@Override
public TimeUnit getTimeUnit() {
    return TimeUnit.MILLISECONDS;  // millisecond!
}

Question: are you sure you need to capture a stack trace every millisecond?

ajsquared commented 9 years ago

Not necessarily, but that does provide better granularity.
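
If a longer period turned out to be acceptable, making it configurable could look roughly like this (a sketch only; the cpuSamplePeriod argument name and the constructor signature are hypothetical, not the library's actual API):

    import java.util.Map;
    import java.util.concurrent.TimeUnit;

    public class ConfigurablePeriodSketch {
        private static final long DEFAULT_PERIOD_MS = 1;  // the current hard-coded value
        private final long periodMs;

        // args parsed from the -javaagent option string, e.g. ...,cpuSamplePeriod=10
        public ConfigurablePeriodSketch(Map<String, String> args) {
            String value = args.get("cpuSamplePeriod");  // hypothetical argument name
            this.periodMs = (value != null) ? Long.parseLong(value) : DEFAULT_PERIOD_MS;
        }

        public long getPeriod() {
            return periodMs;
        }

        public TimeUnit getTimeUnit() {
            return TimeUnit.MILLISECONDS;
        }
    }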

ibobak commented 9 years ago

Andrew, I completely fixed the out of memory errors and made several other changes. I made a fork, https://github.com/ibobak/statsd-jvm-profiler - feel free to take from it any changes you would like to have in your product. I added comments in the readme files and in the Java code.

As for the Python extractor, I rewrote it significantly so that it outputs separate files for every JVM process that was launched in the cluster. I am happy to answer any questions; just add me on Skype (id = ibobak) if you wish to talk. I am in GMT+3.

Thank you for everything, for your support, and for the excellent product that you made.

ajsquared commented 9 years ago

Thanks, I really appreciate the contribution! I will look over your fork.

ibobak commented 9 years ago

It would be very nice if you could add support for jvmName in your version, and if you could change your dashboard so that it displays a dropdown with the JVM processes. I found that taking a total sum of all stack traces over all processes is not useful: I have lots of jobs in the cluster, and they are Hadoop jobs (so I cannot parameterize the prefixes - I CANNOT MODIFY THAT HADOOP CODE), but I need some way on the server side to know what is what.

ajsquared commented 9 years ago

That's definitely one of the things I'm interested in adding.

ibobak commented 8 years ago

Hi, Andrew. You are credited at 47:29 in my meetup presentation: https://www.youtube.com/watch?v=zE1ri30L8nc

Thank you for your help and for this excellent product.

By the way: I've completely rebuilt influxdb_dump.py in my fork - feel free to use it.