YeoLab / gscripts

General Use Scripts and Helper functions
MIT License

RNAseq pipeline errors out with Java runtime environment error #72

Closed: hjeanc closed this issue 9 years ago

hjeanc commented 9 years ago

The RNAseq pipeline errors out with the following error:

INFO  18:41:14,855 QGraph - 660 Pend, 45 Run, 750 Fail, 4818 Done
parse_daemon_response error -2 (null)
Unable to communicate with tscc-mgr.local(10.1.1.1)
Unable to communicate with tscc-mgr.local(10.1.1.1)
Unable to communicate with tscc-mgr.local(10.1.1.1)
Unable to communicate with tscc-mgr.local(10.1.1.1)
WARN  18:41:47,453 DrmaaJobRunner - Unable to determine status of job id 2743471.tscc-mgr.local
org.ggf.drmaa.DrmCommunicationException: Batch protocol error
        at org.broadinstitute.sting.jna.drmaa.v1_0.JnaSession.checkError(JnaSession.java:402)
        at org.broadinstitute.sting.jna.drmaa.v1_0.JnaSession.checkError(JnaSession.java:392)
        at org.broadinstitute.sting.jna.drmaa.v1_0.JnaSession.getJobProgramStatus(JnaSession.java:156)
        at org.broadinstitute.sting.queue.engine.drmaa.DrmaaJobRunner.liftedTree2$1(DrmaaJobRunner.scala:110)
        at org.broadinstitute.sting.queue.engine.drmaa.DrmaaJobRunner.updateJobStatus(DrmaaJobRunner.scala:109)
        at org.broadinstitute.sting.queue.engine.drmaa.DrmaaJobManager$$anonfun$updateStatus$1.apply(DrmaaJobManager.scala:56)
        at org.broadinstitute.sting.queue.engine.drmaa.DrmaaJobManager$$anonfun$updateStatus$1.apply(DrmaaJobManager.scala:56)
        at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:130)
        at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:275)
        at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:275)
        at org.broadinstitute.sting.queue.engine.drmaa.DrmaaJobManager.updateStatus(DrmaaJobManager.scala:56)
        at org.broadinstitute.sting.queue.engine.QGraph$$anonfun$updateStatus$1.apply(QGraph.scala:1130)
        at org.broadinstitute.sting.queue.engine.QGraph$$anonfun$updateStatus$1.apply(QGraph.scala:1122)
        at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
        at scala.collection.immutable.List.foreach(List.scala:76)
        at org.broadinstitute.sting.queue.engine.QGraph.updateStatus(QGraph.scala:1122)
        at org.broadinstitute.sting.queue.engine.QGraph.runJobs(QGraph.scala:470)
        at org.broadinstitute.sting.queue.engine.QGraph.run(QGraph.scala:156)
        at org.broadinstitute.sting.queue.QCommandLine.execute(QCommandLine.scala:171)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
        at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:62)
        at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)
+2+22+19+9hjclemons2+222743482.tscc-mgr.local2+122+10+9job_state+0+0+02+122+11exit_status+0+0+02+152+14resources_used+0+0+0+6+5ctime+0+0+0+6+5mtime+0+0+0+6+5qtime+0+0+0+6+5etime+0+0+0+6+5queue+0+0+02+132+12Account_Name+0+0+02+10+9exec_host+0+0+02+112+10start_time+0+0+0+6+5mtime+0+0+0+0
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00000033da0093a0, pid=27828, tid=47573091518208
#
# JRE version: OpenJDK Runtime Environment (7.0_75-b13) (build 1.7.0_75-mockbuild_2015_01_20_23_39-b00)
# Java VM: OpenJDK 64-Bit Server VM (24.75-b04 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea 2.5.4
# Distribution: Built on CentOS release 6.6 (Final) (Tue Jan 20 23:39:59 UTC 2015)
# Problematic frame:
# C  [libpthread.so.0+0x93a0]  pthread_mutex_lock+0x0
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

olgabot commented 9 years ago

looks like a segmentation fault to me: http://en.wikipedia.org/wiki/Segmentation_fault
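
The JVM message above notes that core dumps are disabled and suggests `ulimit -c unlimited`. A minimal sketch of checking and raising that limit in the same shell that will launch Java is shown here; the function names `core_dump_status` and `enable_core_dumps` are made up for illustration, and whether the limit can actually be raised depends on the hard limit configured on the cluster.

```shell
# Hypothetical helpers (names are not from the thread).

# Prints "disabled" when the soft core-file limit is 0, otherwise prints the limit.
core_dump_status() {
    if [ "$(ulimit -c)" = "0" ]; then
        echo "disabled"
    else
        ulimit -c
    fi
}

# Tries to lift the soft limit; returns non-zero if the hard limit forbids it.
enable_core_dumps() {
    ulimit -c unlimited 2>/dev/null
}

echo "before: $(core_dump_status)"
if enable_core_dumps; then
    echo "after:  $(core_dump_status)"
else
    echo "could not raise the limit (the hard limit may be set by the administrator)"
fi
```

The limit only applies to this shell and the processes it starts, so it would have to be set inside the interactive job before the pipeline is launched.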

I haven't seen this before, but I'm not that surprised because there are SOOO many samples. You'll need to kill your current interactive job and request a new one with more processors (more processors --> more RAM/memory). Let's do 4 just to be safe:

qsub -I -l walltime=168:00:00 -q home-scrm -l nodes=1:ppn=4
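
If the processor count needs to be adjusted again later, the request can be parameterized. This is only a sketch: `qsub_interactive_cmd` is a hypothetical helper, and the walltime and queue defaults are simply the values quoted above. It prints the command rather than submitting anything.

```shell
# Hypothetical helper: print an interactive-job request for a given core count.
# Arguments: ppn (default 4), walltime (default 168:00:00), queue (default home-scrm).
# The body runs in a subshell so the variables do not leak into the caller.
qsub_interactive_cmd() (
    ppn=${1:-4}
    walltime=${2:-168:00:00}
    queue=${3:-home-scrm}
    printf 'qsub -I -l walltime=%s -q %s -l nodes=1:ppn=%s\n' "$walltime" "$queue" "$ppn"
)

# With ppn=4 this prints the same command as above.
qsub_interactive_cmd 4
```

Printing first makes it easy to review the request before pasting it into the login shell.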

BTW, I edited your comment to put fenced code blocks around the error.

hjeanc commented 9 years ago

Thanks! I tried but the backticks didn't work...

gpratt commented 9 years ago

Are you running on a processing node and not the head node? I've submitted 14k+ jobs before without issue.

On May 8, 2015 10:58 AM, "hjeanc" wrote:

Thanks! I tried but the backticks didn't work...

hjeanc commented 9 years ago

I am not running it on a head node. I am submitting an interactive job and then processing the data.
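
One way to double-check this from inside the session is to look at the environment the scheduler sets up. The sketch below assumes a Torque/PBS cluster, where `PBS_JOBID` is defined inside a job and `PBS_ENVIRONMENT` is `PBS_INTERACTIVE` or `PBS_BATCH`; the function name `pbs_context` is made up for illustration.

```shell
# Hypothetical helper: report whether the current shell is inside a Torque/PBS job.
pbs_context() {
    if [ -z "${PBS_JOBID:-}" ]; then
        echo "no PBS job detected (possibly a login/head node)"
    else
        echo "inside job ${PBS_JOBID} (${PBS_ENVIRONMENT:-unknown})"
    fi
}

echo "host: $(uname -n)"
pbs_context
```

If the first message appears, the pipeline is probably being started outside the interactive job.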

olgabot commented 9 years ago

Re code formatting: try editing your original post to see the new formatting. Single backticks are for code within a sentence, `like this`, but you use three backticks in a row to create a "fenced code block", which is what I did.
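
For long logs, the fence can also be added before pasting. This is a small sketch, not an official tool: `fence_block` is a hypothetical helper that wraps whatever arrives on stdin in a fenced code block, and it assumes the input ends with a newline.

```shell
# Hypothetical helper: wrap stdin in a Markdown fenced code block.
# The fence is built with printf (octal 140 is a backtick) so that this
# snippet does not contain a literal fence of its own.
fence_block() {
    fence=$(printf '\140\140\140')
    printf '%s\n' "$fence"   # opening fence on its own line
    cat                      # the log text, passed through unchanged
    printf '%s\n' "$fence"   # closing fence on its own line
}

printf 'Unable to communicate with tscc-mgr.local(10.1.1.1)\n' | fence_block
```

The output can then be pasted into the comment box as-is.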

On Fri, May 8, 2015 at 11:41 AM hjeanc wrote:

I am not running it on a head node. I am submitting an interactive job and then processing the data.

gpratt commented 9 years ago

I think this works now.