Closed djakubosky closed 8 years ago
Note my discovery command looks like this:
SV_DIR="/frazer01/home/djakubosky/software/svtoolkit"
classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"
echo $classpath
java -Xmx4g -cp ${classpath} \
    org.broadinstitute.gatk.queue.QCommandLine \
    -S ${SV_DIR}/qscript/discovery/cnv/CNVDiscoveryPipeline.q \
    -S ${SV_DIR}/qscript/SVQScript.q \
    -cp ${classpath} \
    -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
    -configFile ${SV_DIR}/conf/genstrip_parameters.txt \
    -I /frazer01/home/djakubosky/Test_Data/full_bams.list \
    -R /frazer01/home/djakubosky/reference_MD/Homo_sapiens_assembly19/Homo_sapiens_assembly19.fasta \
    -genderMapFile /frazer01/home/djakubosky/bin/gender_list.txt \
    -md /frazer01/home/djakubosky/Genome_Strip2 \
    -runDirectory /frazer01/home/djakubosky/GS_RD \
    -jobLogDir /frazer01/home/djakubosky/GS_RD/logs \
    -jobRunner Drmaa \
    -jobQueue short.q \
    -produceAuxiliaryFiles \
    -tilingWindowSize 1000 \
    -tilingWindowOverlap 500 \
    -maximumReferenceGapLength 1000 \
    -boundaryPrecision 100 \
    -minimumRefinedLength 500 \
    -run
Sounds like it's a DRMAA-based tool. There is an SGE-specific DRMAA library in:
/opt/sge/lib/lx-amd64
Might be as simple as adding that to your LD_LIBRARY_PATH. But I'd have to look at it, and that's unlikely today.
/opt/sge/lib/lx-amd64
-rwxr-xr-x 1 root root 1872083 Oct 28 20:30 libdrmaa.so.1.0
lrwxrwxrwx 1 root root 15 Oct 28 20:30 libdrmaa.so -> libdrmaa.so.1.0
I have found that advice elsewhere. I'll try adding that to the LD_LIBRARY_PATH and let you know what happens. If I'm still failing I will advise.
Thanks!! David
We can add it globally as well, but check first if it helps. Sometimes tools want other DRMAA libraries, but that's the one that comes with SGE.
Appears to be working and submitting jobs to the cluster now with that in the LD_LIBRARY_PATH
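For anyone following along, here is a minimal sketch of that workaround. The library directory is the one quoted above; the `prepend_ld_path` helper is a hypothetical name used only for illustration. It prepends the directory rather than overwriting LD_LIBRARY_PATH, so existing entries survive.

```shell
# Directory holding the SGE-provided libdrmaa.so (path taken from this thread).
SGE_LIB_DIR="/opt/sge/lib/lx-amd64"

# Prepend a directory to LD_LIBRARY_PATH unless it is already listed.
# (prepend_ld_path is a hypothetical helper, not part of SGE or Queue.)
prepend_ld_path() {
    dir="$1"
    case ":${LD_LIBRARY_PATH:-}:" in
        *":${dir}:"*) ;;  # already present, leave the variable unchanged
        *) LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
    esac
    export LD_LIBRARY_PATH
}

prepend_ld_path "${SGE_LIB_DIR}"
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"
```

Run this in the shell that launches Queue (or source it from a login script) so the JVM can find libdrmaa.so at startup.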
Ah cool! I will add it to the system ld.so.conf search path then, so you do not have to remember to do that. I will do that a bit later today as I'm heading to get my wife from work.
Putting this here to document things I'm trying and to keep everyone on the same page.
export LD_LIBRARY_PATH="/opt/sge/lib/lx-amd64/" enables the submission of jobs through Queue to SGE; however, we get a samtools error because the environment on the compute node doesn't inherit my own through Queue.
To fix that, this option must be added to the command: -jobNative "-V -cwd" (see http://gatkforums.broadinstitute.org/discussion/6354/unable-to-load-library-drmaa-libdrmaa-so-error)
Examining the output from this command, I see the script occasionally still trying to call the lsf706 job runner, and I don't know why.
/frazer01/home/djakubosky/bin/Error_Disc2
see this file for output
We also need to remember to specify something called gatkJobRunner, like so:
-gatkJobRunner Drmaa
I won't get to this tonight due to my rule that I don't change things on Fridays, but I will let you know when you no longer need the LD_LIBRARY_PATH (and the -V passing of it to the environment), because I will add it to the default ldconfig cache path.
I will look at the other output once I do the above, but that will not be tonight.
New error "unable to submit job", not totally sure why (submits jobs earlier in pipeline)
ERROR 15:53:58,515 Retry - Caught error during attempt 1 of 4.
org.broadinstitute.gatk.queue.QException: Unable to submit job: denied: host "fl-n-1-5" is not submit host
at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner$$anonfun$start$1.apply$mcV$sp(DrmaaJobRunner.scala:89)
at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner$$anonfun$start$1.apply(DrmaaJobRunner.scala:85)
at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner$$anonfun$start$1.apply(DrmaaJobRunner.scala:85)
at org.broadinstitute.gatk.queue.util.Retry$.attempt(Retry.scala:49)
at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner.start(DrmaaJobRunner.scala:85)
at org.broadinstitute.gatk.queue.engine.FunctionEdge.start(FunctionEdge.scala:84)
at org.broadinstitute.gatk.queue.engine.QGraph.runJobs(QGraph.scala:434)
at org.broadinstitute.gatk.queue.engine.QGraph.run(QGraph.scala:156)
at org.broadinstitute.gatk.queue.QCommandLine.execute(QCommandLine.scala:171)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.queue.QCommandLine$.main(QCommandLine.scala:62)
at org.broadinstitute.gatk.queue.QCommandLine.main(QCommandLine.scala)
Why are you submitting jobs from a node?
Unable to submit job: denied: host "fl-n-1-5" is not submit host
Do you need this? I dimly recall asking if people need to submit from nodes but do not recall the answer. It's not a big deal to add, but I usually check whether it's a typo or really needed.
I'm not; jobs are submitted by the Queue workflow manager. I start the pipeline from hn1, and somewhere down the line it appears to try to submit from a node, maybe when it's recursively running jobs.
OK. I just added to all nodes the ability to submit jobs. That's a safe one for a Friday night as it's just a status bit ;) Try again.
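For anyone hitting the same "is not submit host" denial, a minimal sketch of checking the submit-host list before launching. It assumes `qconf -ss` prints one host name per line, as it does on stock SGE. The `is_submit_host` helper and the sample host list are hypothetical, and adding a host with `qconf -as` needs SGE manager rights.

```shell
# Succeeds if the host name given as $1 appears in a list of submit hosts
# read from stdin (one host per line, as printed by `qconf -ss`).
# (is_submit_host is a hypothetical helper, not an SGE command.)
is_submit_host() {
    grep -Fqx -- "$1"
}

# Illustrative host list (made up for this sketch, not copied from the cluster).
sample_hosts='fl-hn1
fl-n-1-5'

if printf '%s\n' "$sample_hosts" | is_submit_host "fl-n-1-5"; then
    echo "fl-n-1-5 can submit jobs"
else
    echo "fl-n-1-5 is not a submit host"
fi

# On the cluster itself:
#   qconf -ss | is_submit_host "$(hostname)"   # check the current host
#   qconf -as fl-n-1-5                          # add a host (manager only)
```

The match is on the whole line, so the name has to be spelled the way SGE lists it (short name versus fully qualified).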
Thanks! Fingers crossed haha
Well, it got further than before and spawned many jobs across the cluster
It dies complaining that it can't create the Java virtual machine:
ERROR 16:40:29,900 FunctionEdge - Error: 'java' '-Xmx4096m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/frazer01/home/djakubosky/bin/.queue/tmp' '-cp' '/frazer01/home/djakubosky/software/svtoolkit/lib/SVToolkit.jar:/frazer01/home/djakubosky/software/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/frazer01/home/djakubosky/software/svtoolkit/lib/gatk/Queue.jar' org.broadinstitute.sv.main.SVGenotyper '-T' 'SVGenotyperWalker' '-R' '/frazer01/home/djakubosky/reference_MD/Homo_sapiens_assembly19/Homo_sapiens_assembly19.fasta' '-I' '/frazer01/home/djakubosky/GS_RD/bam_headers/merged_headers.bam' '-O' '/frazer01/home/djakubosky/GS_RD/cnv_stage2/seq_12/P0042.genotypes.vcf.gz' '-disableGATKTraversal' 'true' '-md' '/frazer01/home/djakubosky/Genome_Strip2' '-configFile' '/frazer01/home/djakubosky/software/svtoolkit/conf/genstrip_parameters.txt' '-P' 'genotyping.modules:depth' '-P' 'depth.readCountCacheIgnoreGenomeMask:true' '-runDirectory' '/frazer01/home/djakubosky/GS_RD/cnv_stage2/seq_12' '-genderMapFile' '/frazer01/home/djakubosky/bin/gender_list.txt' '-ploidyMapFile' '/frazer01/home/djakubosky/reference_MD/Homo_sapiens_assembly19/Homo_sapiens_assembly19.ploidymap.txt' '-genomeMaskFile' '/frazer01/home/djakubosky/reference_MD/Homo_sapiens_assembly19/Homo_sapiens_assembly19.svmask.fasta' '-genomeMaskFile' '/frazer01/home/djakubosky/reference_MD/Homo_sapiens_assembly19/Homo_sapiens_assembly19.lcmask.fasta' '-vcf' '/frazer01/home/djakubosky/GS_RD/cnv_stage1/seq_12/seq_12.sites.vcf.gz' '-partitionName' 'P0042' '-partition' 'records:41001-42000'
ERROR 16:40:29,907 FunctionEdge - Contents of /frazer01/home/djakubosky/GS_RD/cnv_stage2/seq_12/logs/CNVDiscoveryStage2-42.out:
Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
It seems like it's not asking for enough memory from SGE. Is there any control over that?
Yeah, I think I can modify this part of the command, "java -Xmx4g -cp ${classpath} \", and change it to, I dunno, 16 gig or something, but that was how they had it in their example. Should I try this?
To be clear, the -Xmx4g can be changed.
You can try making it bigger, but I think the problem is that the job isn't requesting enough memory from the SGE management system. We need to specifically request how much memory we need for each job (per core) using h_vmem. I don't know if Queue is doing that. If not, it's getting the default, which I don't know offhand (maybe you can get it through qconf? The train internet is too slow for me to figure it out).
The default h_vmem is 4G. If you do not ask specifically for more you get 4G.
You need to find how your item is being submitted and ask for more or control what Java is trying to ask for.
Java, if you recall from a few other Git issues, tends to require you to add some additional overhead. I would experiment with a run via qlogin, as that will roughly simulate qsub'ing something that then submits a job... if you understand what I mean.
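Since the qconf question came up: a small sketch of reading the default for h_vmem out of the complex configuration. It assumes the usual `qconf -sc` column layout (name, shortcut, type, relop, requestable, consumable, default, urgency). The `complex_default` helper is hypothetical, and the sample rows are illustrative, with the 4G default taken from the statement above rather than from the cluster's real output.

```shell
# Print the default value of a complex attribute from `qconf -sc` output on stdin.
# Assumed columns: name shortcut type relop requestable consumable default urgency.
# (complex_default is a hypothetical helper, not an SGE command.)
complex_default() {
    awk -v attr="$1" '$1 == attr { print $7 }'
}

# Illustrative sample of `qconf -sc` output (rows are assumptions for this sketch).
sample_output='#name     shortcut  type    relop requestable consumable default urgency
h_rt      h_rt      TIME    <=    YES         NO         0:0:0   0
h_vmem    h_vmem    MEMORY  <=    YES         YES        4G      0'

printf '%s\n' "$sample_output" | complex_default h_vmem

# On a real submit host:
#   qconf -sc | complex_default h_vmem
```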
I've been giving java 1gb less than the total memory I request per job and I also use -XX:ParallelGCThreads=1. Typically this means I'm running things with one core, requesting 5gb with h_vmem, and giving java 4g.
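That rule of thumb can be sketched as a pair of small helpers. Assumptions: memory is counted in whole gigabytes, the job uses a single slot, and the options are passed to SGE through Queue's -jobNative as discussed earlier in this thread. The names `heap_for_vmem` and `native_spec` are hypothetical. Note that the per-job -Xmx4096m seen in the logs above is set by the pipeline, so in practice the h_vmem side may be the only part you control.

```shell
# Headroom (in GB) left between the SGE h_vmem request and the Java heap,
# following the 1 GB rule of thumb described above.
JVM_OVERHEAD_GB=1

# Print the Java heap size (GB) that fits inside a given h_vmem request (GB).
heap_for_vmem() {
    vmem_gb="$1"
    if [ "$vmem_gb" -le "$JVM_OVERHEAD_GB" ]; then
        echo "h_vmem request must exceed ${JVM_OVERHEAD_GB}G" >&2
        return 1
    fi
    echo $((vmem_gb - JVM_OVERHEAD_GB))
}

# Print native SGE options for a job requesting the given h_vmem (GB):
# -V exports the environment, -cwd runs in the current directory,
# -l h_vmem=... sets the per-slot virtual memory limit.
native_spec() {
    echo "-V -cwd -l h_vmem=${1}G"
}

VMEM_GB=5
echo "SGE options: $(native_spec "$VMEM_GB")"
echo "java flags:  -Xmx$(heap_for_vmem "$VMEM_GB")g -XX:ParallelGCThreads=1"
```

With a 5G request this gives a 4g heap, matching the numbers above; the native option string would go inside the quotes of -jobNative.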
Also, if DRMAA is in the loop, you must locate the section of the code that constructs its equivalent of a qsub. I suspect from past dealings with it that there is a config section for the items it submits. It does not usually just run "qsub". DRMAA is an abstraction API for batch queuing routines; Python is only one of its bindings, and the Queue stack traces in this thread show it being called from the JVM.
Your java plan above is reasonable.
I will now be offline but will check back in the morning.
The SGE library location has been added to the system default ldconfig path. If curious, you can see the libraries in the system default with:
ldconfig -p
Playing with the arguments I can pass to -jobNative, I have requested 5G of h_vmem. This alleviates some of the errors but still isn't enough; see below:
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 286192 bytes for Chunk::new
# An error report file with more information is saved as:
# /tmp/jvm-8501/hs_error.log
Exception in thread "S3Put-Thread" java.lang.OutOfMemoryError: (class: org/jets3t/service/ServiceException)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRequest(RestStorageService.java:566)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.performRestPut(RestStorageService.java:961)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.createObjectImpl(RestStorageService.java:1695)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.putObjectWithRequestEntityImpl(RestStorageService.java:1630)
at org.jets3t.service.impl.rest.httpclient.RestStorageService.putObjectImpl(RestStorageService.java:1617)
at org.jets3t.service.StorageService.putObject(StorageService.java:815)
at org.jets3t.service.S3Service.putObject(S3Service.java:2121)
at org.broadinstitute.gatk.engine.phonehome.GATKRunReport$S3PutRunnable.run(GATKRunReport.java:536)
at java.lang.Thread.run(Thread.java:745)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace
java.lang.NullPointerException
at org.broadinstitute.sv.metadata.MetaData.computeSampleDoubleMap(MetaData.java:447)
at org.broadinstitute.sv.metadata.MetaData.getSampleFragmentsPerBaseMap(MetaData.java:421)
at org.broadinstitute.sv.genotyping.GenotypingDepthModule.init(GenotypingDepthModule.java:1798)
at org.broadinstitute.sv.genotyping.GenotypingAlgorithm.initModules(GenotypingAlgorithm.java:522)
at org.broadinstitute.sv.genotyping.GenotypingAlgorithm.initialize(GenotypingAlgorithm.java:87)
at org.broadinstitute.sv.genotyping.SVGenotyperWalker.initialize(SVGenotyperWalker.java:217)
at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83)
at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:319)
at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
at org.broadinstitute.sv.main.SVCommandLine.execute(SVCommandLine.java:124)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.sv.main.SVCommandLine.main(SVCommandLine.java:78)
at org.broadinstitute.sv.main.SVGenotyper.main(SVGenotyper.java:21)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version
Dumb question, but is there any way I can easily kill all my jobs? I want to kill this thing, but I suspect that jobs spawn more jobs in this workflow. I want to kill them all and give it more memory.
Just as an update: it appears we have solved the memory allocation issues.
Our current problems are those of this thread (which has no solution):
http://gatkforums.broadinstitute.org/discussion/6188/svgenotyper-error-null-pointer
So you can kill all your jobs with the -u argument to qdel (man qdel).
But be really sure that is what you want to do.
Ahh ok, I figured such a thing existed. I wrote a Python script to qdel the specific things that I wanted to kill, so we are all set.
Thanks Paul
Yep. But be careful ;) It does exactly what it says: it kills all jobs by your username. (Note clearly you can't kill other people's jobs unless you are an SGE operator user)
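For the more selective route, here is a shell sketch of picking job IDs by name instead of deleting everything with -u. This is not the Python script mentioned above, just an illustration. It assumes the usual `qstat` layout of two header lines followed by one row per job, with the job ID in column 1 and the job name in column 3 (qstat truncates long names). The `job_ids_matching` helper, the job names, and the job IDs in the sample are all made up.

```shell
# Print the IDs of jobs whose name starts with the prefix given as $1.
# Expects `qstat` output on stdin: two header lines, then one job per line
# with the job ID in column 1 and the (possibly truncated) name in column 3.
# (job_ids_matching is a hypothetical helper, not an SGE command.)
job_ids_matching() {
    awk -v prefix="$1" 'NR > 2 && index($3, prefix) == 1 { print $1 }'
}

# Illustrative qstat output (job IDs and names are made up for this sketch).
sample_qstat='job-ID  prior   name       user         state submit/start at     queue            slots
----------------------------------------------------------------------------------------
   1001 0.55500 CNVStage2  djakubosky   r     12/17/2015 14:55:07 short.q@fl-n-1-5 1
   1002 0.55500 CNVStage2  djakubosky   qw    12/17/2015 14:55:08                  1
   1003 0.50000 align_job  djakubosky   r     12/17/2015 15:01:00 all.q@fl-n-1-2   1'

printf '%s\n' "$sample_qstat" | job_ids_matching CNV

# On the cluster, review the list first, then pass the IDs to qdel:
#   qstat -u "$USER" | job_ids_matching CNV
#   qstat -u "$USER" | job_ids_matching CNV | xargs qdel
```

Printing the IDs before piping them to qdel gives a chance to confirm the prefix only matches the jobs you mean to kill.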
Issues in the CNVDiscovery pipeline likely arise from a problem in the SVPreprocessing step, which can't find the "PloidyMap" file. I am telling it explicitly where this is and re-testing (this is a file that they include in their "reference metadata").
Note for Paul: leaving these updates so that we are all on the same page when/if I run into problems I cannot solve and might want to reach out for advice.
Oh I fully understand ;)
CNVDiscovery is running smoothly up to about CNVDiscovery stage 3-4. The latest issue involves the gender map file, which needs to contain the sample name from the @RG tag of the BAM file rather than a path to that sample. Will retest after @cdeboever3 finishes his analysis over the next week.
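To get at those sample names, a minimal sketch that pulls the SM field out of the @RG lines of a BAM header. It assumes the header text arrives on stdin in standard SAM format (tab-separated fields), for example from `samtools view -H`. The `rg_samples` helper and the sample header are hypothetical, and the gender column still has to be filled in by hand.

```shell
# Print the unique sample names (SM fields) found on @RG header lines
# read from stdin. Header fields are tab-separated, as in the SAM format.
# (rg_samples is a hypothetical helper, not part of samtools.)
rg_samples() {
    awk -F '\t' '$1 == "@RG" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SM:/) print substr($i, 4)
    }' | sort -u
}

# Illustrative header (read group IDs and sample names are made up).
sample_header=$(printf '@HD\tVN:1.4\tSO:coordinate\n@RG\tID:lane1\tSM:sampleA\tPL:ILLUMINA\n@RG\tID:lane2\tSM:sampleA\tPL:ILLUMINA\n@RG\tID:lane3\tSM:sampleB\n')

printf '%s\n' "$sample_header" | rg_samples

# With a real BAM file:
#   samtools view -H /path/to/sample.bam | rg_samples
```

Several read groups often share one sample, which is why the output is de-duplicated.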
Hi Paul, I am running another Genome STRiP test at the moment. If you get a chance, I'd like you to look at the cluster and see if the way it is running seems reasonable. Another question I had was about how jobs from other users jump into the queue while this pipeline has many (1K) small jobs queued. Hurley submitted some jobs and then 3/6 of them made it in to run. I'm just curious as to the logic of this, and how to get some of their jobs in while my stuff is running.
If you are submitting to the default queue (all.q), we've had a few discussions of this topic (take a look at #24) and of using the short.q to make sure people can still run jobs. I will look, however, if people feel that isn't what they want.
I submitted this way mostly because I didn't know how long jobs would take to run
David Jakubosky Biomedical Sciences Graduate Program Laboratory of Kelly A. Frazer, PhD Institute for Genomic Medicine, University of California at San Diego
Which is fine. And if somebody with shorter jobs needs to jump in, that's what we made the short.q for. There are some other options to de-prioritize jobs if needed. I'll review tomorrow if I see items stuck that should not be.
Looks like things are running fine at present, thanks for looking into this!
I have a new error that perhaps you could make some sense of where GS had trouble with SGE.
WARN 14:55:07,886 DrmaaJobRunner - Unable to determine status of job id 77121
org.ggf.drmaa.DrmCommunicationException: unable to send message to qmaster using port 6444 on host "fl-hn1": got send error
at org.broadinstitute.gatk.utils.jna.drmaa.v1_0.JnaSession.checkError(JnaSession.java:402)
at org.broadinstitute.gatk.utils.jna.drmaa.v1_0.JnaSession.checkError(JnaSession.java:392)
at org.broadinstitute.gatk.utils.jna.drmaa.v1_0.JnaSession.getJobProgramStatus(JnaSession.java:156)
at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner.liftedTree1$1(DrmaaJobRunner.scala:105)
at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner.updateJobStatus(DrmaaJobRunner.scala:104)
at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobManager$$anonfun$updateStatus$1.apply(DrmaaJobManager.scala:56)
at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobManager$$anonfun$updateStatus$1.apply(DrmaaJobManager.scala:56)
at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobManager.updateStatus(DrmaaJobManager.scala:56)
at org.broadinstitute.gatk.queue.engine.QGraph$$anonfun$updateStatus$1.apply(QGraph.scala:1128)
at org.broadinstitute.gatk.queue.engine.QGraph$$anonfun$updateStatus$1.apply(QGraph.scala:1120)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.broadinstitute.gatk.queue.engine.QGraph.updateStatus(QGraph.scala:1120)
at org.broadinstitute.gatk.queue.engine.QGraph.runJobs(QGraph.scala:468)
at org.broadinstitute.gatk.queue.engine.QGraph.run(QGraph.scala:156)
at org.broadinstitute.gatk.queue.QCommandLine.execute(QCommandLine.scala:171)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.queue.QCommandLine$.main(QCommandLine.scala:62)
at org.broadinstitute.gatk.queue.QCommandLine.main(QCommandLine.scala)
ERROR 14:55:08,014 FunctionEdge - Error: 'java' '-Xmx4096m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' '-XX:GCTimeLimit=50' '-XX:GCHeapFreeLimit=10' '-Djava.io.tmpdir=/frazer01/home/djakubosky/GS_RD2/.queue/tmp' '-cp' '/frazer01/home/djakubosky/software/svtoolkit/lib/SVToolkit.jar:/frazer01/home/djakubosky/software/svtoolkit/lib/gatk/GenomeAnalysisTK.jar:/frazer01/home/djakubosky/software/svtoolkit/lib/gatk/Queue.jar' org.broadinstitute.sv.main.SVGenotyper '-T' 'SVGenotyperWalker' '-R' '/frazer01/home/djakubosky/reference_MD/Homo_sapiens_assembly19/Homo_sapiens_assembly19.fasta' '-I' '/frazer01/home/djakubosky/GS_RD2/bam_headers/merged_headers.bam' '-O' '/frazer01/home/djakubosky/GS_RD2/cnv_stage2/seq_3/P0279.genotypes.vcf.gz' '-disableGATKTraversal' 'true' '-md' '/frazer01/home/djakubosky/Genome_Strip4' '-configFile' '/frazer01/home/djakubosky/software/svtoolkit/conf/genstrip_parameters.txt' '-P' 'genotyping.modules:depth' '-P' 'depth.readCountCacheIgnoreGenomeMask:true' '-runDirectory' '/frazer01/home/djakubosky/GS_RD2/cnv_stage2/seq_3' '-genderMapFile' '/frazer01/home/djakubosky/bin/gender_list.txt' '-ploidyMapFile' '/frazer01/home/djakubosky/reference_MD/Homo_sapiens_assembly19/Homo_sapiens_assembly19.ploidymap.txt' '-genomeMaskFile' '/frazer01/home/djakubosky/reference_MD/Homo_sapiens_assembly19/Homo_sapiens_assembly19.svmask.fasta' '-genomeMaskFile' '/frazer01/home/djakubosky/reference_MD/Homo_sapiens_assembly19/Homo_sapiens_assembly19.lcmask.fasta' '-vcf' '/frazer01/home/djakubosky/GS_RD2/cnv_stage1/seq_3/seq_3.sites.vcf.gz' '-partitionName' 'P0279' '-partition' 'records:278001-279000'
ERROR 14:57:09,416 FunctionEdge - Unable to access log file: /frazer01/home/djakubosky/GS_RD2/cnv_stage2/seq_3/logs/CNVDiscoveryStage2-279.out
For whatever reason, when I run GS, even though it inherits my environment with -V, I get an error where it can't find Rscript on the nodes (at least on this occasion). This is one instance where it caused a failure:
/frazer01/home/djakubosky/GS_RD2/.queue/tmp/.exec768510066079578112: line 2: Rscript: command not found
I will add module load R to my bashrc to see if that helps.
Unlikely I will have time to look at this today.
Thanks for letting me know, will continue troubleshooting and advise on where I'm at for when you get a chance to look at it!
Adding module add R to the bashrc appears to allow pipeline to advance
Genome STRiP CNVDiscovery has completed successfully for the first time!!! - More testing to follow
I have been getting a sporadic error in Rscript that has halted the discovery pipeline and has yet to be fixed (@cdeboever3 @tatarsky). The error looks like this:
DEBUG 14:48:41,357 RScriptExecutor - Rscript
DEBUG 14:48:41,358 RScriptExecutor - -e
DEBUG 14:48:41,358 RScriptExecutor - tempLibDir = '/tmp/Rlib.5783562322893173025';install.packages(pkgs=c('/tmp/RlibSources.499817440845309518/gsalib'), lib=tempLibDir, repos=NULL, type='source', INSTALL_opts=c('--no-libs', '--no-data', '--no-help', '--no-demo', '--no-exec'));library('gsalib', lib.loc=tempLibDir);source('/tmp/queueJobReport.9003837706511310021.R');
DEBUG 14:48:41,358 RScriptExecutor - /frazer01/home/djakubosky/BF_GS_Discovery/CNVDiscoveryPipeline.jobreport.txt
DEBUG 14:48:41,358 RScriptExecutor - /frazer01/home/djakubosky/BF_GS_Discovery/CNVDiscoveryPipeline.jobreport.pdf
ERROR: dependencies ‘gplots’, ‘png’ are not available for package ‘gsalib’
I don't understand why this has occurred (the pipeline worked twice for small batches).
Has our R changed?
There are multiple versions of R available. The system one I maintain I can state is unchanged and the dates on the module ones don't look modified.
But let's make sure which R we are dealing with.
Can you determine in your script which Rscript is being called?
Your item looks like your pipeline is trying to install something. That may require proper permissions...
I am not working until Monday but the above will make solving your item easier at that time so answer when you can. I do not maintain the module based R collections but can assist in debugging once I know which one you are using.
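One way to answer the "which Rscript" question from inside a job, as a rough sketch. It only reports what the current PATH resolves to, so it is most useful when run in the same environment as the failing job (for example at the top of the job script). The helper names `report_rscript` and `check_r_packages` are hypothetical; the package check relies on R's requireNamespace and is only defined here, not run.

```shell
# Report which Rscript the current environment resolves to, if any.
# (report_rscript is a hypothetical helper.)
report_rscript() {
    if rscript_path=$(command -v Rscript 2>/dev/null); then
        echo "Rscript: ${rscript_path}"
    else
        echo "Rscript: not found on PATH"
    fi
}

# For each package name given as an argument, print whether that R can load it.
# Only meaningful when Rscript is on PATH. (check_r_packages is hypothetical.)
check_r_packages() {
    Rscript -e 'for (p in commandArgs(trailingOnly = TRUE)) cat(p, requireNamespace(p, quietly = TRUE), "\n")' "$@"
}

report_rscript

# Example, using the packages named in the error above:
#   check_r_packages gplots png gsalib
```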
I've been simply adding "module add R" to my bashrc, and in the past that worked for this pipeline, this is a new error.
Apologies for my short responses, I'm limited to mobile for a while and have limited connectivity
Sounds like just a need for some new packages in that R module (gsalib, gplots, png), which I believe have a few other dependencies as well.
On Monday I'll work with the maintainer of that R tree to add them. Sounds like this pipeline wants a few more items in this case, and when it can't find them it's trying to install them itself, which seldom works unless it's maintaining its own tree of R modules, due to permissions.
Turns out this R version doesn't have a compatible gsalib version; pointing Genome STRiP to R/3.1.1 seems to have solved this problem for now. I'll advise as the pipeline progresses.
This is an error I sometimes get. It is unclear why, but restarting the pipeline usually fixes it:
WARN 18:12:43,778 DrmaaJobRunner - Unable to determine status of job id 128098
org.ggf.drmaa.DrmCommunicationException: failed receiving gdi request response for mid=63604 (can't send response for this message id - protocol error).
at org.broadinstitute.gatk.utils.jna.drmaa.v1_0.JnaSession.checkError(JnaSession.java:402)
at org.broadinstitute.gatk.utils.jna.drmaa.v1_0.JnaSession.checkError(JnaSession.java:392)
at org.broadinstitute.gatk.utils.jna.drmaa.v1_0.JnaSession.getJobProgramStatus(JnaSession.java:156)
at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner.liftedTree1$1(DrmaaJobRunner.scala:105)
at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobRunner.updateJobStatus(DrmaaJobRunner.scala:104)
at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobManager$$anonfun$updateStatus$1.apply(DrmaaJobManager.scala:56)
at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobManager$$anonfun$updateStatus$1.apply(DrmaaJobManager.scala:56)
at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:153)
at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:306)
at org.broadinstitute.gatk.queue.engine.drmaa.DrmaaJobManager.updateStatus(DrmaaJobManager.scala:56)
at org.broadinstitute.gatk.queue.engine.QGraph$$anonfun$updateStatus$1.apply(QGraph.scala:1128)
at org.broadinstitute.gatk.queue.engine.QGraph$$anonfun$updateStatus$1.apply(QGraph.scala:1120)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.broadinstitute.gatk.queue.engine.QGraph.updateStatus(QGraph.scala:1120)
at org.broadinstitute.gatk.queue.engine.QGraph.runJobs(QGraph.scala:468)
at org.broadinstitute.gatk.queue.engine.QGraph.run(QGraph.scala:156)
at org.broadinstitute.gatk.queue.QCommandLine.execute(QCommandLine.scala:171)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
at org.broadinstitute.gatk.queue.QCommandLine$.main(QCommandLine.scala:62)
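Since restarting is the workaround for this sporadic communication error, the restart can be wrapped in a small retry loop. This is a generic sketch, not anything Queue provides: the `retry` helper and the `flaky` stand-in command are hypothetical, and it assumes the wrapped launch command exits non-zero when the pipeline dies.

```shell
# Run a command up to max_attempts times, sleeping delay seconds between tries.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
# (retry is a hypothetical helper, not part of Queue or SGE.)
retry() {
    max_attempts="$1"
    delay="$2"
    shift 2
    attempt=1
    while :; do
        "$@" && return 0
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Stand-in for a flaky pipeline launch: fails twice, then succeeds.
counter_file=$(mktemp)
echo 0 > "$counter_file"
flaky() {
    n=$(( $(cat "$counter_file") + 1 ))
    echo "$n" > "$counter_file"
    [ "$n" -ge 3 ]
}

retry 4 0 flaky && echo "succeeded after $(cat "$counter_file") attempts"
rm -f "$counter_file"
```

In real use the wrapped command would be a script containing the full java ... QCommandLine invocation, with a delay of a minute or more so the qmaster has time to recover.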
I'm attempting to get a tool called GenomeSTRiP (2.01) to work with SGE (it is designed to do so)
See here http://www.broadinstitute.org/software/genomestrip/org_broadinstitute_sv_qscript_CNVDiscoveryPipeline.html
and for the QUEUE command line see here http://www.broadinstitute.org/software/genomestrip/org_broadinstitute_sv_qscript_QCommandLine.html
I get an error when I run the CNV discovery pipeline that looks like this: java.lang.UnsatisfiedLinkError: Unable to load library 'drmaa': libdrmaa.so: cannot open shared object file: No such file or directory
I was wondering if you had any thoughts on why this might be, and how to point this tool to our implementation of SGE.
Thanks! David