frazer-lab / cluster

Repo for cluster issues.

Genome STRiP Troubleshooting Log #72

Closed djakubosky closed 8 years ago

djakubosky commented 8 years ago

I'm attempting to get a tool called GenomeSTRiP (2.01) to work with SGE, which it is designed to support.

See here http://www.broadinstitute.org/software/genomestrip/org_broadinstitute_sv_qscript_CNVDiscoveryPipeline.html

and for the QUEUE command line see here http://www.broadinstitute.org/software/genomestrip/org_broadinstitute_sv_qscript_QCommandLine.html

When I run the CNV discovery pipeline I get an error that looks like this:

java.lang.UnsatisfiedLinkError: Unable to load library 'drmaa': libdrmaa.so: cannot open shared object file: No such file or directory

I was wondering if you had any thoughts on why this might be, or on how to point this tool to our implementation of SGE.

Thanks! David

djakubosky commented 8 years ago

Note: my discovery command looks like this:

SV_DIR="/frazer01/home/djakubosky/software/svtoolkit"
classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"
echo $classpath
java -Xmx4g -cp ${classpath} \
    org.broadinstitute.gatk.queue.QCommandLine \
    -S ${SV_DIR}/qscript/discovery/cnv/CNVDiscoveryPipeline.q \
    -S ${SV_DIR}/qscript/SVQScript.q \
    -cp ${classpath} \
    -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
    -configFile ${SV_DIR}/conf/genstrip_parameters.txt \
    -I /frazer01/home/djakubosky/Test_Data/full_bams.list \
    -R /frazer01/home/djakubosky/reference_MD/Homo_sapiens_assembly19/Homo_sapiens_assembly19.fasta \
    -genderMapFile /frazer01/home/djakubosky/bin/gender_list.txt \
    -md /frazer01/home/djakubosky/Genome_Strip2 \
    -runDirectory /frazer01/home/djakubosky/GS_RD \
    -jobLogDir /frazer01/home/djakubosky/GS_RD/logs \
    -jobRunner Drmaa \
    -jobQueue short.q \
    -produceAuxiliaryFiles \
    -tilingWindowSize 1000 \
    -tilingWindowOverlap 500 \
    -maximumReferenceGapLength 1000 \
    -boundaryPrecision 100 \
    -minimumRefinedLength 500 \
    -run

tatarsky commented 8 years ago

Sounds like it's a DRMAA-based tool. There is an SGE-specific DRMAA library in:

/opt/sge/lib/lx-amd64

It might be as simple as adding that directory to your LD_LIBRARY_PATH. But I'd have to look at it, and that's unlikely today.

/opt/sge/lib/lx-amd64
-rwxr-xr-x 1 root root 1872083 Oct 28 20:30 libdrmaa.so.1.0
lrwxrwxrwx 1 root root      15 Oct 28 20:30 libdrmaa.so -> libdrmaa.so.1.0

djakubosky commented 8 years ago

I have found that advice elsewhere. I'll try adding that to LD_LIBRARY_PATH and let you know what happens. If it still fails I will let you know.

Thanks!! David

tatarsky commented 8 years ago

We can add it globally as well, but check first whether it helps. Sometimes tools want other DRMAA libraries, but that's the one that comes with SGE.

djakubosky commented 8 years ago

Appears to be working and submitting jobs to the cluster now with that in the LD_LIBRARY_PATH

tatarsky commented 8 years ago

Ah cool! I will add it to the system ld.so.conf search path then, so you do not have to remember to do that. I will do that a bit later today as I'm heading to get my wife from work.

djakubosky commented 8 years ago

Putting this here to document things I'm trying and to keep everyone on the same page

Setting export LD_LIBRARY_PATH="/opt/sge/lib/lx-amd64/" enables Queue to submit jobs to SGE. However, we then get a samtools error, because jobs submitted through Queue do not inherit my environment on the compute node.

To fix that, this argument must be added to the command: -jobNative "-V -cwd" (see http://gatkforums.broadinstitute.org/discussion/6354/unable-to-load-library-drmaa-libdrmaa-so-error)

Examining the output from this command, I see the script occasionally still trying to call the lsf706 job runner, and I don't know why.

/frazer01/home/djakubosky/bin/Error_Disc2

see this file for output
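For anyone following along, the two settings above can be sketched as a small shell fragment. This is only a sketch under assumptions: the library directory is the one quoted earlier in this thread, and the helper name prepend_lib_path and the variable JOB_NATIVE are made up for illustration. It prepends the directory rather than overwriting LD_LIBRARY_PATH, so anything already set there survives.

```shell
# Directory holding libdrmaa.so on this cluster (path taken from this thread).
SGE_LIB_DIR="/opt/sge/lib/lx-amd64"

# Hypothetical helper: print a library path with $1 placed first.
# $1 = directory to add, $2 = current LD_LIBRARY_PATH value (may be empty)
prepend_lib_path() {
    new_dir="${1%/}"          # drop one trailing slash, if any
    current_path="$2"
    case ":${current_path}:" in
        *":${new_dir}:"*) printf '%s\n' "${current_path}" ;;   # already present
        *) printf '%s\n' "${new_dir}${current_path:+:${current_path}}" ;;
    esac
}

LD_LIBRARY_PATH="$(prepend_lib_path "${SGE_LIB_DIR}" "${LD_LIBRARY_PATH:-}")"
export LD_LIBRARY_PATH

# Native SGE options handed to Queue via -jobNative so that jobs inherit the
# submitting environment (-V) and run in the current directory (-cwd).
JOB_NATIVE="-V -cwd"
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"
echo "extra Queue arguments: -jobRunner Drmaa -jobNative \"${JOB_NATIVE}\""
```

The fragment would be sourced, or pasted above the java command, before starting the pipeline, so that both the Queue process and the jobs it submits with -V see the library path.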

djakubosky commented 8 years ago

We also need to remember to specify the GATK job runner, like so:

-gatkJobRunner Drmaa

tatarsky commented 8 years ago

I won't get to this tonight due to my rule that I don't change things Friday but I will let you know when you will no longer need the LD_LIBRARY_PATH (and the -V passing of it) to the environment because I will add it to the default ldconfig cache path.

I will look at the other output once I have done the above, but that will not be tonight.

djakubosky commented 8 years ago

New error, "unable to submit job". I'm not totally sure why (it submits jobs earlier in the pipeline):

ERROR 15:53:58,515 Retry - Caught error during attempt 1 of 4.
org.broadinstitute.gatk.queue.QException: Unable to submit job: denied: host "fl-n-1-5" is not submit host
    at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner$$anonfun$start$1.apply$mcV$sp(DrmaaJobRunner.scala:89)
    at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner$$anonfun$start$1.apply(DrmaaJobRunner.scala:85)
    at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner$$anonfun$start$1.apply(DrmaaJobRunner.scala:85)
    at org.broadinstitute.gatk.queue.util.Retry$.attempt(Retry.scala:49)
    at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner.start(DrmaaJobRunner.scala:85)
    at org.broadinstitute.gatk.queue.engine.FunctionEdge.start(FunctionEdge.scala:84)
    at org.broadinstitute.gatk.queue.engine.QGraph.runJobs(QGraph.scala:434)
    at org.broadinstitute.gatk.queue.engine.QGraph.run(QGraph.scala:156)
    at org.broadinstitute.gatk.queue.QCommandLine.execute(QCommandLine.scala:171)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.queue.QCommandLine$.main(QCommandLine.scala:62)
    at org.broadinstitute.gatk.queue.QCommandLine.main(QCommandLine.scala)

tatarsky commented 8 years ago

Why are you submitting jobs from a node?

Unable to submit job: denied: host "fl-n-1-5" is not submit host

Do you need this? I dimly recall asking if people need to submit from nodes but do not recall the answer. It's not a big deal to add, but I usually check whether it's a typo or really needed.

djakubosky commented 8 years ago

I'm not. Jobs are submitted by the Queue workflow manager. I start the pipeline from hn1, and somewhere down the line it appears to try to submit from a node, maybe when it's recursively running jobs.

tatarsky commented 8 years ago

OK. I just gave all nodes the ability to submit jobs. That's a safe one for a Friday night as it's just a status bit ;) Try again.

djakubosky commented 8 years ago

Thanks! Fingers crossed haha

djakubosky commented 8 years ago

Well, it got further than before and spawned many jobs across the cluster.

It dies complaining that it can't create the Java virtual machine:

ERROR 16:40:29,900 FunctionEdge - Error: 'java' '-Xmx4096m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/frazer01/home/djakubosky/bin/.queue/tmp' '-cp' '/frazer01/home/djakubosky/software/svtoolkit/lib/SVToolkit.jar:/frazer01/home/djakubosky/software/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/frazer01/home/djakubosky/software/svtoolkit/lib/gatk/Queue.jar' org.broadinstitute.sv.main.SVGenotyper '-T' 'SVGenotyperWalker' '-R' '/frazer01/home/djakubosky/reference_MD/Homo_sapiens_assembly19/Homo_sapiens_assembly19.fasta' '-I' '/frazer01/home/djakubosky/GS_RD/bam_headers/merged_headers.bam' '-O' '/frazer01/home/djakubosky/GS_RD/cnv_stage2/seq_12/P0042.genotypes.vcf.gz' '-disableGATKTraversal' 'true' '-md' '/frazer01/home/djakubosky/Genome_Strip2' '-configFile' '/frazer01/home/djakubosky/software/svtoolkit/conf/genstrip_parameters.txt' '-P' 'genotyping.modules:depth' '-P' 'depth.readCountCacheIgnoreGenomeMask:true' '-runDirectory' '/frazer01/home/djakubosky/GS_RD/cnv_stage2/seq_12' '-genderMapFile' '/frazer01/home/djakubosky/bin/gender_list.txt' '-ploidyMapFile' '/frazer01/home/djakubosky/reference_MD/Homo_sapiens_assembly19/Homo_sapiens_assembly19.ploidymap.txt' '-genomeMaskFile' '/frazer01/home/djakubosky/reference_MD/Homo_sapiens_assembly19/Homo_sapiens_assembly19.svmask.fasta' '-genomeMaskFile' '/frazer01/home/djakubosky/reference_MD/Homo_sapiens_assembly19/Homo_sapiens_assembly19.lcmask.fasta' '-vcf' '/frazer01/home/djakubosky/GS_RD/cnv_stage1/seq_12/seq_12.sites.vcf.gz' '-partitionName' 'P0042' '-partition' 'records:41001-42000'
ERROR 16:40:29,907 FunctionEdge - Contents of /frazer01/home/djakubosky/GS_RD/cnv_stage2/seq_12/logs/CNVDiscoveryStage2-42.out:
Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

cdeboever3 commented 8 years ago

It seems like it's not asking for enough memory from SGE. Is there any control over that?


djakubosky commented 8 years ago

Yeah, I think I can modify this part of the command, " java -Xmx4g -cp ${classpath} \", and change it to, I dunno, 16 gig or something. That was how they had it in their example, though. Should I try this?

To be clear, it is the -Xmx4g that can be changed.

cdeboever3 commented 8 years ago

You can try making it bigger but I think the problem is that the job isn't requesting enough memory from the SGE management system. We need to specifically request how much memory we need for each job (per core) using h_vmem. I don't know if Queue is doing that. If not, it's getting the default which I don't know off hand (maybe you can get it through qconf? Train internet is too slow for me to figure it out.).


tatarsky commented 8 years ago

The default h_vmem is 4G. If you do not ask specifically for more you get 4G.

You need to find how your item is being submitted and ask for more or control what Java is trying to ask for.

Java, if you recall from a few other Git issues, tends to require you to add some additional overhead. I would experiment with a run via qlogin, as that will roughly simulate qsub'ing something that then submits a job, if you understand what I mean.

cdeboever3 commented 8 years ago

I've been giving java 1gb less than the total memory I request per job and I also use -XX:ParallelGCThreads=1. Typically this means I'm running things with one core, requesting 5gb with h_vmem, and giving java 4g.
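The rule of thumb above (request a bit more from SGE than the heap given to Java) can be sketched in shell as below. This is a sketch, not a statement of how Queue itself sizes its jobs: the helper name h_vmem_for_heap and the variable names are invented for illustration, the 1 GB overhead is just the figure quoted in this comment, and passing the request through -jobNative assumes the "-l h_vmem=..." string is forwarded to SGE unchanged.

```shell
# Hypothetical helper: derive an h_vmem request from the Java heap size.
# $1 = Java heap in GB (the N in -XmxNg), $2 = overhead in GB (default 1)
h_vmem_for_heap() {
    heap_gb="$1"
    overhead_gb="${2:-1}"
    echo "$((heap_gb + overhead_gb))G"
}

HEAP_GB=4
H_VMEM="$(h_vmem_for_heap "${HEAP_GB}")"

# h_vmem is requested per slot (per core), so single-core jobs keep the
# arithmetic simple.
JOB_NATIVE="-V -cwd -l h_vmem=${H_VMEM}"

echo "java heap option: -Xmx${HEAP_GB}g"
echo "native SGE options: ${JOB_NATIVE}"
```

With HEAP_GB=4 this asks SGE for 5G while Java is limited to a 4g heap, which leaves room for the JVM's own non-heap memory.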


tatarsky commented 8 years ago

Also, if DRMAA is in the loop, you must locate the section of the tool that constructs its equivalent of a qsub. I suspect from past dealings with it that there is a config section for the submit options it uses. It does not usually just run "qsub". DRMAA is a language-neutral API abstraction for batch queuing systems (bindings exist for C, Java, Python and others).

tatarsky commented 8 years ago

Your java plan above is reasonable.

I will now be offline but will check back in the morning.

tatarsky commented 8 years ago

The SGE library location has been added to the system default ldconfig path. If you are curious, the libraries in the system default path can be listed with:

ldconfig -p

djakubosky commented 8 years ago

Playing with the arguments I can pass to -jobNative, I have requested 5G of h_vmem. This alleviates some of the errors but still isn't enough; see below:

#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 286192 bytes for Chunk::new
# An error report file with more information is saved as:
# /tmp/jvm-8501/hs_error.log
Exception in thread "S3Put-Thread" java.lang.OutOfMemoryError: (class: org/jets3t/service/ServiceException)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:566)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestPut(RestStorageService.java:961)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.createObjectImpl(RestStorageService.java:1695)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.putObjectWithRequestEntityImpl(RestStorageService.java:1630)
    at org.jets3t.service.impl.rest.httpclient.RestStorageService.putObjectImpl(RestStorageService.java:1617)
    at org.jets3t.service.StorageService.putObject(StorageService.java:815)
    at org.jets3t.service.S3Service.putObject(S3Service.java:2121)
    at org.broadinstitute.gatk.engine.phonehome.GATKRunReport$S3PutRunnable.run(GATKRunReport.java:536)
    at java.lang.Thread.run(Thread.java:745)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.NullPointerException
    at org.broadinstitute.sv.metadata.MetaData.computeSampleDoubleMap(MetaData.java:447)
    at org.broadinstitute.sv.metadata.MetaData.getSampleFragmentsPerBaseMap(MetaData.java:421)
    at org.broadinstitute.sv.genotyping.GenotypingDepthModule.init(GenotypingDepthModule.java:1798)
    at org.broadinstitute.sv.genotyping.GenotypingAlgorithm.initModules(GenotypingAlgorithm.java:522)
    at org.broadinstitute.sv.genotyping.GenotypingAlgorithm.initialize(GenotypingAlgorithm.java:87)
    at org.broadinstitute.sv.genotyping.SVGenotyperWalker.initialize(SVGenotyperWalker.java:217)
    at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:319)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.sv.main.SVCommandLine.execute(SVCommandLine.java:124)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.sv.main.SVCommandLine.main(SVCommandLine.java:78)
    at org.broadinstitute.sv.main.SVGenotyper.main(SVGenotyper.java:21)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version ):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Code exception (see stack trace for error itself)
##### ERROR ------------------------------------------------------------------------------------------

djakubosky commented 8 years ago

Dumb question, but is there any way I can easily kill all my jobs? I want to kill this thing, but I suspect that jobs spawn more jobs in this workflow. I want to kill them all and give them more memory.

djakubosky commented 8 years ago

Just as an update: it appears we have solved the memory allocation issues.

Our current problem is the one described in this thread (which has no solution):

http://gatkforums.broadinstitute.org/discussion/6188/svgenotyper-error-null-pointer

tatarsky commented 8 years ago

So you can kill all your jobs with the -u argument to qdel (see man qdel). But be really sure that is what you want to do.
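A cautious way to use that option is to wrap it so that nothing is deleted unless you explicitly ask for it. The sketch below is only an illustration: the function name kill_all_my_jobs and the DRY_RUN variable are made up here, and the only SGE feature it relies on is qdel -u as described above.

```shell
# Hypothetical wrapper around "qdel -u": by default it only prints what it
# would do. Set DRY_RUN=0 to actually delete the jobs.
# $1 = username (defaults to the user running the script)
kill_all_my_jobs() {
    target_user="${1:-$(id -un)}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: qdel -u ${target_user}"
    else
        qdel -u "${target_user}"
    fi
}

# Dry run first; the username is the one from this thread.
DRY_RUN=1 kill_all_my_jobs djakubosky
```

Running qstat -u with the same username beforehand shows which jobs would be affected.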

djakubosky commented 8 years ago

Ahh ok, I figured such a thing existed. I wrote a Python script to qdel the specific jobs that I wanted to kill, so we are all set.

Thanks Paul

tatarsky commented 8 years ago

Yep. But be careful ;) It does exactly what it says: it kills all jobs by your username. (Note clearly you can't kill other people's jobs unless you are an SGE operator user)

djakubosky commented 8 years ago

Issues in the CNVDiscovery pipeline likely arise from a problem in the SVPreprocessing step, which can't find the "PloidyMap" file. I am telling it explicitly where this file is and re-testing (it is a file that they include in their "reference metadata").

Note for Paul: I'm leaving these updates so that we are all on the same page when/if I run into problems I cannot solve and want to reach out for advice.

tatarsky commented 8 years ago

Oh I fully understand ;)

djakubosky commented 8 years ago

CNVDiscovery is running smoothly up to about stage 3-4. The latest issue involves the gender map file, which needs to contain the sample name from the @RG tag of the BAM file rather than a path to that sample. Will retest after @cdeboever3 finishes his analysis over the next week.
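One way to get the sample names the gender map needs is to read the SM field of the @RG header lines. The sketch below assumes samtools is on the PATH and that the BAM headers carry SM tags; the function name rg_sample_names and the file name sample.bam are placeholders, and the header text in the demo is invented test data, not output from our BAMs.

```shell
# Hypothetical helper: print the unique sample names (SM fields) found in the
# @RG lines of a SAM/BAM header supplied on stdin.
rg_sample_names() {
    awk -F'\t' '/^@RG/ {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SM:/) print substr($i, 4)   # strip the "SM:" prefix
    }' | sort -u
}

# Typical use on a real file (sample.bam is a placeholder):
#   samtools view -H sample.bam | rg_sample_names

# Small self-contained demo with a made-up header:
printf '@HD\tVN:1.4\n@RG\tID:rg1\tSM:NA12878\tPL:ILLUMINA\n@RG\tID:rg2\tSM:NA12878\n' \
    | rg_sample_names
```

The names printed this way are what goes in the first column of the gender map, in place of the BAM paths.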

djakubosky commented 8 years ago

Hi Paul, I am running another Genome Strip test at the moment, if you get a chance I'd like you to look at the cluster and see if the way it is running seems reasonable. Another question I had was about how jobs will jump into the queue from other users while this pipeline has many (1K) small jobs in queue. Hurley submitted some jobs and then 3/6 of them made it in to run. I'm just curious as to the logic of this, and how to get some of their jobs in while my stuff is running.

tatarsky commented 8 years ago

If you are submitting to the default queue (all.q) we've had a few discussions of this topic (take a look at #24) and using the short.q to make sure people can still run jobs. I will look however if people feel that isn't what they want.

djakubosky commented 8 years ago

I submitted this way mostly because I didn't know how long jobs would take to run



David Jakubosky Biomedical Sciences Graduate Program Laboratory of Kelly A. Frazer, PhD Institute for Genomic Medicine, University of California at San Diego


tatarsky commented 8 years ago

Which is fine. And if somebody with shorter jobs needs to jump in, that's what we made the short.q for. There are some other options to de-prioritize jobs if needed. I'll review tomorrow if I see items stuck that should not be.

djakubosky commented 8 years ago

Looks like things are running fine at present, thanks for looking into this!


djakubosky commented 8 years ago

I have a new error, where GS had trouble communicating with SGE, that perhaps you could make some sense of.

WARN 14:55:07,886 DrmaaJobRunner - Unable to determine status of job id 77121
org.ggf.drmaa.DrmCommunicationException: unable to send message to qmaster using port 6444 on host "fl-hn1": got send error
    at org.broadinstitute.gatk.utils.jna.drmaa.v1_0.JnaSession.checkError(JnaSession.java:402)
    at org.broadinstitute.gatk.utils.jna.drmaa.v1_0.JnaSession.checkError(JnaSession.java:392)
    at org.broadinstitute.gatk.utils.jna.drmaa.v1_0.JnaSession.getJobProgramStatus(JnaSession.java:156)
    at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner.liftedTree1$1(DrmaaJobRunner.scala:105)
    at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner.updateJobStatus(DrmaaJobRunner.scala:104)
    at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobManager$$anonfun$updateStatus$1.apply(DrmaaJobManager.scala:56)
    at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobManager$$anonfun$updateStatus$1.apply(DrmaaJobManager.scala:56)
    at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
    at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobManager.updateStatus(DrmaaJobManager.scala:56)
    at org.broadinstitute.gatk.queue.engine.QGraph$$anonfun$updateStatus$1.apply(QGraph.scala:1128)
    at org.broadinstitute.gatk.queue.engine.QGraph$$anonfun$updateStatus$1.apply(QGraph.scala:1120)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.broadinstitute.gatk.queue.engine.QGraph.updateStatus(QGraph.scala:1120)
    at org.broadinstitute.gatk.queue.engine.QGraph.runJobs(QGraph.scala:468)
    at org.broadinstitute.gatk.queue.engine.QGraph.run(QGraph.scala:156)
    at org.broadinstitute.gatk.queue.QCommandLine.execute(QCommandLine.scala:171)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.queue.QCommandLine$.main(QCommandLine.scala:62)
    at org.broadinstitute.gatk.queue.QCommandLine.main(QCommandLine.scala)
ERROR 14:55:08,014 FunctionEdge - Error: 'java' '-Xmx4096m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/frazer01/home/djakubosky/GS_RD2/.queue/tmp' '-cp' '/frazer01/home/djakubosky/software/svtoolkit/lib/SVToolkit.jar:/frazer01/home/djakubosky/software/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/frazer01/home/djakubosky/software/svtoolkit/lib/gatk/Queue.jar' org.broadinstitute.sv.main.SVGenotyper '-T' 'SVGenotyperWalker' '-R' '/frazer01/home/djakubosky/reference_MD/Homo_sapiens_assembly19/Homo_sapiens_assembly19.fasta' '-I' '/frazer01/home/djakubosky/GS_RD2/bam_headers/merged_headers.bam' '-O' '/frazer01/home/djakubosky/GS_RD2/cnv_stage2/seq_3/P0279.genotypes.vcf.gz' '-disableGATKTraversal' 'true' '-md' '/frazer01/home/djakubosky/Genome_Strip4' '-configFile' '/frazer01/home/djakubosky/software/svtoolkit/conf/genstrip_parameters.txt' '-P' 'genotyping.modules:depth' '-P' 'depth.readCountCacheIgnoreGenomeMask:true' '-runDirectory' '/frazer01/home/djakubosky/GS_RD2/cnv_stage2/seq_3' '-genderMapFile' '/frazer01/home/djakubosky/bin/gender_list.txt' '-ploidyMapFile' '/frazer01/home/djakubosky/reference_MD/Homo_sapiens_assembly19/Homo_sapiens_assembly19.ploidymap.txt' '-genomeMaskFile' '/frazer01/home/djakubosky/reference_MD/Homo_sapiens_assembly19/Homo_sapiens_assembly19.svmask.fasta' '-genomeMaskFile' '/frazer01/home/djakubosky/reference_MD/Homo_sapiens_assembly19/Homo_sapiens_assembly19.lcmask.fasta' '-vcf' '/frazer01/home/djakubosky/GS_RD2/cnv_stage1/seq_3/seq_3.sites.vcf.gz' '-partitionName' 'P0279' '-partition' 'records:278001-279000'
ERROR 14:57:09,416 FunctionEdge - Unable to access log file: /frazer01/home/djakubosky/GS_RD2/cnv_stage2/seq_3/logs/CNVDiscoveryStage2-279.out

djakubosky commented 8 years ago

For whatever reason, even though GS inherits my environment with -V, I get an error where it can't find Rscript on the nodes (at least on this occasion). This is one instance where it caused a failure:

frazer01/home/djakubosky/GS_RD2/.queue/tmp/.exec768510066079578112: line 2: Rscript: command not found

I will add module load R to my bashrc to see if that helps.

tatarsky commented 8 years ago

Unlikely I will have time to look at this today.

djakubosky commented 8 years ago

Thanks for letting me know, will continue troubleshooting and advise on where I'm at for when you get a chance to look at it!

djakubosky commented 8 years ago

Adding "module add R" to my bashrc appears to allow the pipeline to advance.

djakubosky commented 8 years ago

Genome STRiP CNVDiscovery has completed successfully for the first time!!! - More testing to follow

djakubosky commented 8 years ago

I have been getting a sporadic error in the Rscript step that has halted the discovery pipeline and has yet to be fixed, @cdeboever3 @tatarsky. The error looks like this:

DEBUG 14:48:41,357 RScriptExecutor - Rscript
DEBUG 14:48:41,358 RScriptExecutor - -e
DEBUG 14:48:41,358 RScriptExecutor - tempLibDir = '/tmp/Rlib.5783562322893173025';install.packages(pkgs=c('/tmp/RlibSources.499817440845309518/gsalib'), lib=tempLibDir, repos=NULL, type='source', INSTALL_opts=c('--no-libs', '--no-data', '--no-help', '--no-demo', '--no-exec'));library('gsalib', lib.loc=tempLibDir);source('/tmp/queueJobReport.9003837706511310021.R');
DEBUG 14:48:41,358 RScriptExecutor - /frazer01/home/djakubosky/BF_GS_Discovery/CNVDiscoveryPipeline.jobreport.txt
DEBUG 14:48:41,358 RScriptExecutor - /frazer01/home/djakubosky/BF_GS_Discovery/CNVDiscoveryPipeline.jobreport.pdf
ERROR: dependencies ‘gplots’, ‘png’ are not available for package ‘gsalib’

I don't understand why this has occurred (the pipeline worked twice for small batches)

djakubosky commented 8 years ago

Has our R changed?

tatarsky commented 8 years ago

There are multiple versions of R available. The system one I maintain I can state is unchanged and the dates on the module ones don't look modified.

But let's make sure which R we are dealing with.

Can you determine in your script which Rscript is being called?

Your item looks like your pipeline is trying to install something. That may require proper permissions...

I am not working until Monday but the above will make solving your item easier at that time so answer when you can. I do not maintain the module based R collections but can assist in debugging once I know which one you are using.
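A quick way to answer the "which Rscript" question is to print what the name resolves to in the environment the job actually runs in. The sketch below is an illustration only: report_tool is a made-up helper name, and the hint text simply repeats the module approach already mentioned in this thread.

```shell
# Hypothetical helper: report which executable a command name resolves to,
# or say that it is missing. Returns non-zero when the tool is not found.
report_tool() {
    if tool_path="$(command -v "$1" 2>/dev/null)"; then
        echo "$1 -> ${tool_path}"
    else
        echo "$1 -> not found on PATH"
        return 1
    fi
}

report_tool Rscript || echo "hint: load an R module before starting the pipeline"
```

Running this from inside a submitted job, rather than on the head node, shows what the compute nodes see, since the two environments can differ.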

djakubosky commented 8 years ago

I've been simply adding "module add R" to my bashrc, and in the past that worked for this pipeline, this is a new error.

Apologies for my short responses, I'm limited to mobile for a while and have limited connectivity



tatarsky commented 8 years ago

Sounds like that R module just needs some additional packages (gsalib, gplots, png), which I believe have a few other dependencies as well.

On Monday I'll work with the maintainer of that R tree to add them. Sounds like this pipeline wants a few more items in this case, and when it can't find them it's trying to install them itself. Because of permissions, that seldom works unless the pipeline maintains its own tree of R packages.

djakubosky commented 8 years ago

Turns out this R version doesn't have a compatible gsalib version. Pointing Genome STRiP to R/3.1.1 seems to have solved this problem for now. I'll advise as the pipeline progresses.



djakubosky commented 8 years ago

This is an error I sometimes get. It is unclear why, but restarting the pipeline usually fixes it.

WARN 18:12:43,778 DrmaaJobRunner - Unable to determine status of job id 128098
org.ggf.drmaa.DrmCommunicationException: failed receiving gdi request response for mid=63604 (can't send response for this message id - protocol error).
    at org.broadinstitute.gatk.utils.jna.drmaa.v1_0.JnaSession.checkError(JnaSession.java:402)
    at org.broadinstitute.gatk.utils.jna.drmaa.v1_0.JnaSession.checkError(JnaSession.java:392)
    at org.broadinstitute.gatk.utils.jna.drmaa.v1_0.JnaSession.getJobProgramStatus(JnaSession.java:156)
    at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner.liftedTree1$1(DrmaaJobRunner.scala:105)
    at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner.updateJobStatus(DrmaaJobRunner.scala:104)
    at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobManager$$anonfun$updateStatus$1.apply(DrmaaJobManager.scala:56)
    at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobManager$$anonfun$updateStatus$1.apply(DrmaaJobManager.scala:56)
    at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
    at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
    at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobManager.updateStatus(DrmaaJobManager.scala:56)
    at org.broadinstitute.gatk.queue.engine.QGraph$$anonfun$updateStatus$1.apply(QGraph.scala:1128)
    at org.broadinstitute.gatk.queue.engine.QGraph$$anonfun$updateStatus$1.apply(QGraph.scala:1120)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.broadinstitute.gatk.queue.engine.QGraph.updateStatus(QGraph.scala:1120)
    at org.broadinstitute.gatk.queue.engine.QGraph.runJobs(QGraph.scala:468)
    at org.broadinstitute.gatk.queue.engine.QGraph.run(QGraph.scala:156)
    at org.broadinstitute.gatk.queue.QCommandLine.execute(QCommandLine.scala:171)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.queue.QCommandLine$.main(QCommandLine.scala:62)