bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

error in bcbio structural variant calling #653

Closed: shang-qian closed this issue 9 years ago

shang-qian commented 9 years ago

Hi Brad,

Thanks for your help. I want to call structural variants, but I get an error: parallel, svtyper, cnvnator_wrapper.py, cnvnator-multi, and annotate_rd.py are not found in the PATH, like this:

[2014-10-27 23:05] Uncaught exception occurred
Traceback (most recent call last):
  File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 20, in run
    _do_run(cmd, checks, log_stdout)
  File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 93, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; speedseq sv -v -B ......
Sourcing executables from /public/software/bcbio-nextgen/tools/bin/speedseq.config ...
which: no parallel in (/public/software/bcbio-nextgen/tools/bin:/public/software/bcbio-nextgen/anaconda/bin:.....)
which: no svtyper in (/public/software/bcbio-nextgen/tools/bin:/public/software/bcbio....
which: no cnvnator_wrapper.py in (/public/software/bcbio-nextgen/tools/bin:/public/software/bcbio....
which: no cnvnator-multi in (/public/software/bcbio-nextgen/tools/bin:/public/software/bcbio-....
which: no annotate_rd.py in (/public/software/bcbio-nextgen/tools/bin:/....)
Calculating alignment stats...
sambamba-view: (Broken pipe)
Traceback (most recent call last):
  File "/public/software/bcbio-nextgen/tools/share/lumpy-sv/pairend_distro.py", line 12, in <module>
    import numpy as np
ImportError: No module named numpy

How can I fix this? Thanks again.

Shangqian

chapmanb commented 9 years ago

Shangqian; Thanks for the report and apologies about the issue. The problem was that speedseq, which wraps the lumpy calling, calls out to a lumpy python script that requires numpy. If your system python does not have numpy installed, it results in this error. The other messages about svtyper and cnvnator are not a problem as we don't use those within bcbio.
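As a quick diagnostic (not required for the fix below), you can check whether the python on your PATH can import numpy with something like:

python -c 'import numpy'

If that fails for the system python but works for the bcbio anaconda python, it confirms this explanation.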

I pushed a fix which resolves the issue by ensuring we use the Python installed with bcbio, which does contain numpy. If you upgrade with:

bcbio_nextgen.py upgrade -u development

it will grab the latest code and should work cleanly now. Thanks again.

shang-qian commented 9 years ago

Hi Brad, Thanks so much; it works well in genome_sv now. There is another small question I am not sure about: in my data analysis I had called VCFs for a family using a single caller (gatk-hc). Now I want to use three callers and ensemble the resulting VCFs. The following is my yaml file. Is that right? Thanks. details:

chapmanb commented 9 years ago

Shangqian; That generally looks good, although you only have 2 variant callers listed. You'll want to have 3 or more to get good results from ensemble calling: samtools and platypus are two other good choices. Glad the fix worked for you.
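For reference, a minimal sketch of the relevant algorithm section with three callers plus ensemble calling (the caller choices and the numpass threshold here are only illustrative, not a prescription for your data) would look roughly like:

algorithm:
  variantcaller: [gatk-haplotype, freebayes, samtools]
  ensemble:
    numpass: 2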

shang-qian commented 9 years ago

Sorry for my typing mistake; I did mean three callers :). Thanks again for your helpful suggestions and contributions. bcbio is great and very useful for me.

shang-qian commented 9 years ago

Hi Brad,

When I run the above configuration, I get the following error: Exception in thread "main" java.lang.Exception: VCF files do not have consistent headers: ["ceph-gatk-haplotype.vcf.gz" "ceph-samtools.vcf.gz"]

I know the problem is in the VCF files, so I opened the two VCFs and found that the sample order in the headers differs: gatk-hc has sample10/sample8/sample9, but samtools has sample8/sample10/sample9. The problem went away once I corrected the headers to use the same order. However, manually editing the headers every time does not seem like a good approach.

So, is there an automatic way to get consistent headers, either by adjusting the input yaml file or within bcbio-nextgen? Thanks.

kind regards, Shangqian

chapmanb commented 9 years ago

Shangqian; Sorry about the issue. bcbio.variation did not explicitly sort input VCFs which can cause issues with different callers that insist on sorting in specific ways. I pushed a fix which should handle resorting these to a consistent order prior to doing ensemble calling. If you upgrade your tools with:

bcbio_nextgen.py upgrade --tools

and re-run it should hopefully work cleanly now. Thanks again for the reports.

shang-qian commented 9 years ago

Hi Brad,

Thanks for your response; I have updated bcbio-nextgen. Thanks a lot. In addition, the log from my cancer yaml shows that GATK did not have enough memory, although this issue did not occur in the exome pipeline. Can you help me fix this? The following is the error log:

[2014-11-10 17:06] ##### ERROR MESSAGE: There was a failure because you did not provide enough memory to run this program. See the -Xmx JVM argument to adjust the maximum heap size provided to Java
[2014-11-10 17:06] ##### ERROR ------------------------------------------------------------------------------------------
[2014-11-10 17:06] Uncaught exception occurred
Traceback (most recent call last):
  File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /public/software/bcbio-nextgen/tools/bin/gatk-framework -Xms166m -Xmx1166m -XX:+UseSerialGC -U LENIENT_VCF_PROCESSING --read_filter BadCigar --read_filter NotPrimaryAlignment -T PrintReads -L 9:96714156-127734373 -R /public/software/bcbio-nextgen/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -I /public/users/xieshangqian/project/LungC/bcbio/work/align/syn3-tumor/2_2014-11-03_dream-syn3-sort.bam --downsample_to_coverage 10000 -BQSR /public/users/xieshangqian/project/LungC/bcbio/work/align/syn3-tumor/2_2014-11-03_dream-syn3-sort.grp -o /public/users/xieshangqian/project/LungC/bcbio/work/bamprep/syn3-tumor/9/tx/tmpRv7YoC/2_2014-11-03_dream-syn3-sort-9_96714155_127734373-prep-prealign.bam

Kind regards, Shangqian

chapmanb commented 9 years ago

Shangqian; It looks like you need to allocate additional memory to GATK in your /public/software/bcbio-nextgen/galaxy/bcbio_system.yaml file, specifically increasing the -Xmx value under gatk. The cancer dataset is high depth (100x) and it looks like GATK needs additional memory to run effectively. Hope this helps.
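For reference, the relevant part of bcbio_system.yaml looks roughly like this (the memory values below are only examples; set them to something your nodes can accommodate):

resources:
  gatk:
    jvm_opts: ["-Xms750m", "-Xmx3500m"]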

shang-qian commented 9 years ago

Thank you, Brad. The cancer pipeline finished well. Many thanks for all your help.

By the way, does bcbio require paired-end Read1 and Read2 to be the same length for bwa-mem alignment? Read1 and Read2 files of different lengths, trimmed with Trimmomatic, give the error: "paired reads have different names". As far as I know, bwa-mem itself should handle reads of different lengths, so is there some special setting or parameter in bcbio that I am missing? Thanks again. :)

chapmanb commented 9 years ago

Shangqian; Glad that the cancer calling finished without any problems. bcbio/bwa-mem do not require reads to be the same length, but do require that all reads are paired. How did you run trimmomatic? The best approach is to use the paired end (PE) mode and feed the paired output into bcbio:

http://www.usadellab.org/cms/index.php?page=trimmomatic
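For example, a paired-end Trimmomatic run looks roughly like this (file names and trimming steps are placeholders, not your actual settings):

java -jar trimmomatic.jar PE sample_R1.fastq.gz sample_R2.fastq.gz \
  sample_R1.paired.fq.gz sample_R1.unpaired.fq.gz \
  sample_R2.paired.fq.gz sample_R2.unpaired.fq.gz \
  SLIDINGWINDOW:4:20 MINLEN:36

Only the two *.paired.fq.gz outputs should go into bcbio.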

It sounds like you may have trimmed separately or added the unpaired reads in which creates non-identical pair names in your fastq files. Hope this helps.

shang-qian commented 9 years ago

Hi Brad, Thanks so much for your response. I used the NA12891 data for testing, and bcbio/bwa-mem is OK with it, so I am uncertain where the problem is now. My test was: I input the NA12891 R1 and R2 fastq files, and the error still occurred. The following are the error messages:

[2014-11-20 18:16] [mem_sam_pe] paired reads have different names: "FFECCFHG>DEGCGGABGBCGEIDCFGGH:DF######", "AF@?EEFDB>B3<>FCD?BFBCGGGGFEHGEHE7GHHHHDEIHGE=>FECDE=AECCH/7?>DC@IH8DCFCFDC" [2014-11-20 18:16] samblaster: Loaded 84 header sequence entries. [2014-11-20 18:16] Uncaught exception occurred Traceback (most recent call last): File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run _do_run(cmd, checks, log_stdout) File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run raise subprocess.CalledProcessError(exitcode, error_msg) CalledProcessError: Command 'set -o pipefail; /public/software/bcbio-nextgen/tools/bin/bwa mem -M -t 16 -R '@RG\tID:1\tPL:illumina\tPU:s1\tSM:s1' -v 1 /public/software/bcbio-nextgen/genomes/Hsapiens/GRCh37/bwa/GRCh37.fa <(/public/software/bcbio-nextgen/tools/bin/grabix grab /public/users/xieshangqian/project/NAR/bcbio/work/align_prep/NA12891.R1.fastq.gz 20000000 39999999) <(/public/software/bcbio-nextgen/tools/bin/grabix grab /public/users/xieshangqian/project/NAR/bcbio/work/align_prep/NA12891.R2.fastq.gz 20000000 39999999) | /public/software/bcbio-nextgen/tools/bin/samblaster --splitterFile >(/public/software/bcbio-nextgen/tools/bin/samtools view -S -u /dev/stdin | /public/software/bcbio-nextgen/tools/bin/sambamba sort -t 16 -m 682M --tmpdir /public/users/xieshangqian/project/NAR/bcbio/work/tx/tmpnH0wTs/spl -o /public/users/xieshangqian/project/NAR/bcbio/work/align/s1/split/tx/tmp4ArZBx/s1-sort-20000000_39999999-sr.bam /dev/stdin) --discordantFile >(/public/software/bcbio-nextgen/tools/bin/samtools view -S -u /dev/stdin | /public/software/bcbio-nextgen/tools/bin/sambamba sort -t 16 -m 682M --tmpdir /public/users/xieshangqian/project/NAR/bcbio/work/tx/tmpnH0wTs/disc -o /public/users/xieshangqian/project/NAR/bcbio/work/align/s1/split/tx/tmpjoA75N/s1-sort-20000000_39999999-disc.bam /dev/stdin) | /public/software/bcbio-nextgen/tools/bin/samtools view -S -u /dev/stdin | /public/software/bcbio-nextgen/tools/bin/sambamba sort -t 16 -m 682M --tmpdir /public/users/xieshangqian/project/NAR/bcbio/work/tx/tmpnH0wTs/full -o /public/users/xieshangqian/project/NAR/bcbio/work/align/s1/split/tx/tmpZuouw8/s1-sort-20000000_39999999.bam /dev/stdin

I think the error is because the paired reads have different names, so I grepped "FFECCFHG>DEGCGGABGBCGEIDCFGGH:DF######" and "AF@?EEFDB>B3<>FCD?BFBCGGGGFEHGEHE7GHHHHDEIHGE=>FECDE=AECCH/7?>DC@IH8DCFCFDC", which are both at line 20000000 of the NA12891 R1 and R2 fastq files. The results are:

R1 read (lines 19999997-20000000):
@206B4ABXX100825:6:61:6782:130154/1
AAATCTCACCACTTAACCCATACCAGACCAGACCCAAAAGGAAAGGCCGGGTTCAGTAACAACAACCTGGGTTCAA
+
DEFDIGHEAHDGFCCGGHHECAGHEFECH=HD>FFECCFHG>DEGCGGABGBCGEIDCFGGH:DF######

R2 read (lines 19999997-20000000):
@206B4ABXX100825:6:61:6782:130154/2
TTGTAGGGGTGTGATGCCGTGGACCCCTTCTTGAACCCCCAAGCTCGTCTTGCATTTGGGGCTCTAGCATGCAGCT
+
@AF@?EEFDB>B3<>FCD?BFBCGGGGFEHGEHE7GHHHHDEIHGE=>FECDE=AECCH/7?>DC@IH8DCFCFDC

The results show that the R1 and R2 sequences have the same length, yet the error still occurred, so I thought it might be a problem with the data. I then used awk to extract 1M lines of fastq (lines 19999997-20999996) as test NA12891_test R1 and R2 files.

When I ran these test files, which contain only lines 19999997-20999996 from the original files and include the @206B4ABXX100825:6:61:6782:130154/1 and /2 reads, everything ran normally without any error.

So I am not sure where the problem is. Any advice would be appreciated. Thanks again.

Shangqian

chapmanb commented 9 years ago

Shangqian; Thanks for including the full traceback, that is very helpful. This is due to a change in grabix, the tool we use for indexing fastq files when running in sections. You may have updated the code or tools separately, and this fix requires a simultaneous update. You can either fix by removing align_split_size and running individually, or getting the latest code and tools:

bcbio_nextgen.py upgrade -u development --tools

You may also need to remove the alignprep/*.gbi files to force creation of new indexes, as in the example below. Hope this fixes the issue for you.
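That is, from the work directory (in the traceback above the split-prep directory shows up as align_prep; adjust the path to match what you see):

rm align_prep/*.gbi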

shang-qian commented 9 years ago

Brad, Many thanks for your detailed advice; whole exome and genome runs are now working on our HPC. There are two more questions I need your help with:

  1. There are 32 samples in my exome dataset, and I want to run bcbio across multiple nodes. The 0.8.2 documentation says "bcbio_nextgen.py bcbio_sample.yaml -t ipython -n 12 -s lsf -q queue" can do this, but I am a little confused about the -s and -q parameters. Do I need to change lsf and queue to match our cluster, or is keeping the defaults OK?
  2. RNA pipeline error: "[2014-11-28 21:23] ../rnaseq/ref-transcripts.dexseq.gff3 was not found, so exon-level counting is being skipped." The ../rnaseq/ folder only contains a ref-transcripts.dexseq.gff file, so how can I fix this? Is linking the .gff file to .gff3 with "ln -s .gff .gff3" the right approach?

My yaml file is: details:

Thanks again for your helpful advice.

Best, Shangqian

roryk commented 9 years ago

Hi Shangqian,

Sorry about the DEXSeq issue; linking will fix it, our pre-built indices have the wrong extension.
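Concretely, from the rnaseq directory of your genome installation (adjust the path to your setup), something like:

cd /public/software/bcbio-nextgen/genomes/Hsapiens/GRCh37/rnaseq
ln -s ref-transcripts.dexseq.gff ref-transcripts.dexseq.gff3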

For the scheduler and queue, on your HPC, is there a job scheduler that you submit your jobs to that distributes the jobs over the nodes? There are a bunch of different types of scheduler, LSF is one, there are others like SLURM and SGE. If you can find out what scheduler your HPC has running then you put that as the scheduler, and then the queue you are allowed to submit jobs to as the queue.
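For example, on a Torque/PBS cluster with a queue named high (both the queue name and the core count are placeholders for your own cluster), the command would look something like:

bcbio_nextgen.py bcbio_sample.yaml -t ipython -n 32 -s torque -q high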

roryk commented 9 years ago

Shangqian,

I fixed this DEXSeq behavior so now it will find either .dexseq.gff or .dexseq.gff3 files here: 0e9c746f0a06.

shang-qian commented 9 years ago

Hi Roryk,

Thanks for your prompt response; it helps me so much. Thanks a lot.

best, Shangqian

shang-qian commented 9 years ago

RoryK,

The gff problem has been fixed, but another issue remains. The error shows: [2014-12-02 11:12] multiprocessing: generate_transcript_counts Error in find.package("DEXSeq") : there is no package called 'DEXSeq'

So how can I install the DEXSeq package within bcbio? Thanks.

roryk commented 9 years ago

Hi Shangqian,

Hm-- it should be getting installed automatically. If you fire up R and do:

source("http://bioconductor.org/biocLite.R")
biocLite("DEXSeq")

it should install it.

shang-qian commented 9 years ago

Hi Roryk,

I installed DEXSeq on the node with R, but when I run bcbio it still can't find the package, so I think DEXSeq is not being picked up by the bcbio run. I also found the DEXSeq package in ./tools/lib/R/site-library, so I thought I could use it by setting R_LIBRARY_PATH, but the same error still occurs.
Can you show me how to add the DEXSeq package to R under the bcbio environment? Thanks.

Shangqian

roryk commented 9 years ago

Hi Shangqian,

I agree, that seems like it should work, thanks for helping to debug this. Hmm-- if you type:

Rscript -e 'find.package("DEXSeq")'

Does it output a directory or say the package cannot be found? If it works, does it work also with R_LIBRARY_PATH unset?
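If the plain find.package call fails, it can also help to point R explicitly at the bcbio site-library to confirm the package really is installed there (adjust the path to where you saw the package), for example:

Rscript -e '.libPaths("/public/software/bcbio-nextgen/tools/lib/R/site-library"); find.package("DEXSeq")'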

chapmanb commented 9 years ago

Shangqian; Apologies, @roryk and I traced this back to bcbio not injecting the installed site-libraries for R into the search path when looking for DEXSeq. I pushed a fix which does this, so if you upgrade to the latest development version:

bcbio_nextgen.py upgrade -u development 

it should hopefully work cleanly now. Thanks for the bug report and hope this fixes it for you.

shang-qian commented 9 years ago

Brad and Roryk, Thanks for the fix. I am upgrading bcbio now.

By the way, in my earlier whole genome SV test bcbio ran normally, but five days ago I submitted real lung cancer data for SV analysis and this error appeared this morning:

[2014-12-03 09:01] Index BAM file: 1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam
[2014-12-03 09:01] Samtools-htslib-API: bam_index_build2() not yet implemented
[2014-12-03 09:01] /bin/bash: line 1: 26699 Aborted /public/software/bcbio-nextgen/tools/bin/samtools index /public/users/xieshangqian/project/LungC/bcbio_sv/work/bamprep/Lungtissue/7/1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam /public/users/xieshangqian/project/LungC/bcbio_sv/work/bamprep/Lungtissue/7/tx/tmpdRaD64/1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam.bai
[2014-12-03 09:01] Index BAM file (single core): 1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam
[2014-12-03 09:01] Samtools-htslib-API: bam_index_build2() not yet implemented
[2014-12-03 09:01] /bin/bash: line 1: 26702 Aborted /public/software/bcbio-nextgen/tools/bin/samtools index /public/users/xieshangqian/project/LungC/bcbio_sv/work/bamprep/Lungtissue/7/1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam /public/users/xieshangqian/project/LungC/bcbio_sv/work/bamprep/Lungtissue/7/tx/tmpdRaD64/1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam.bai
[2014-12-03 09:01] Uncaught exception occurred
Traceback (most recent call last):
  File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; /public/software/bcbio-nextgen/tools/bin/samtools index /public/users/xieshangqian/project/LungC/bcbio_sv/work/bamprep/Lungtissue/7/1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam /public/users/xieshangqian/project/LungC/bcbio_sv/work/bamprep/Lungtissue/7/tx/tmpdRaD64/1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam.bai
Samtools-htslib-API: bam_index_build2() not yet implemented
/bin/bash: line 1: 26702 Aborted /public/software/bcbio-nextgen/tools/bin/samtools index /public/users/xieshangqian/project/LungC/bcbio_sv/work/bamprep/Lungtissue/7/1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam /public/users/xieshangqian/project/LungC/bcbio_sv/work/bamprep/Lungtissue/7/tx/tmpdRaD64/1_2014-11-28_ceu-sort-7_78680799_94537032-prep.bam.bai ' returned non-zero exit status 134

I can't find the cause of this error, because the same command worked fine on another BAM file. Can you help me? Thanks.

Shangqian

chapmanb commented 9 years ago

Shangqian; Thanks for the report. The new version of samtools index does not support specifying the output of the .bam.bai file, which triggered this error. I'm confused as to why the code used samtools for indexing since it should use sambamba index by default, but perhaps there is something problematic about your sambamba install. Either way, I pushed a small fix to work around this issue so if you update it should hopefully work cleanly now. Thanks again.
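To illustrate the difference (this is just the general behavior of the two indexers at the time, not a change you need to make by hand): the htslib-based samtools only supports

samtools index input.bam

writing input.bam.bai alongside the BAM, while sambamba index also accepts an explicit output path, roughly:

sambamba index input.bam input.bam.bai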

shang-qian commented 9 years ago

Brad, I am upgrading bcbio, but it fails when running [localhost] local: /public/software/bcbio-nextgen/tools/bin/brew info speedseq with the error: Fatal error: local() encountered an error (return code 1) while executing '/public/software/bcbio-nextgen/tools/bin/brew info speedseq'

I have run it 5 times and get the same error every time. Can you check this? Thanks.

chapmanb commented 9 years ago

Shangqian; Sorry about the problem. I'm not sure why that command would fail. Does it provide any useful error messages if you run it outside of the upgrade process?

/public/software/bcbio-nextgen/tools/bin/brew info speedseq

shang-qian commented 9 years ago

Hi Brad, it gives the following messages:

[root@compute-0-15 bin]# /public/software/bcbio-nextgen/tools/bin/brew info speedseq
speedseq: stable 2014-08-22
https://github.com/cc2qe/speedseq
/public/software/bcbio-nextgen/tools/Cellar/speedseq/2014-08-22 (4 files, 92K) * Built from source
From: https://github.com/chapmanb/homebrew-cbl/blob/master/speedseq.rb
==> Dependencies
Error: No available formula for sambamba

Is this problem caused by the sambamba package? How can I fix it?

chapmanb commented 9 years ago

Shangqian; That's strange, it seems like your recipes are not getting updated since sambamba should be present in homebrew-science. This should happen automatically but you can run:

/public/software/bcbio-nextgen/tools/bin/brew update

which should pull it in. Hope this helps.

shang-qian commented 9 years ago

Brad, Thanks for your response. When I run the relevant command, it gives the error below:

[root@compute-0-15 bin]# /public/software/bcbio-nextgen/tools/bin/brew update
Unpacking objects: 100% (12/12), done.
error: Your local changes to 'bedtools.rb' would be overwritten by merge. Aborting.
Please, commit your changes or stash them before you can merge.
Error: Failed to update tap: homebrew/science
Already up-to-date.

Should I remove homebrew/science and re-upgrade?

chapmanb commented 9 years ago

Shangqian; I'm not sure how the bedtools formula got changed manually but that explains the issues. You can fix with:

cd /public/software/bcbio-nextgen/tools/Library/Taps/homebrew/homebrew-science
git checkout bedtools.rb

then you should be able to re-run the updater and find everything working. Hope this helps figure it out.

shang-qian commented 9 years ago

Brad,

Thanks for your advice. I have upgraded bcbio and DEXSeq is working now. However, when I test the exome pipeline, this command:

[2014-12-07 20:54] java -Xms750m -Xmx2500m -Djava.io.tmpdir=/public/users/xieshangqian/Testcode/testdata/bcbio/work/ensemble/test/tmp -jar /public/software/bcbio-nextgen/tools/share/java/bcbio_variation/bcbio.variation-0.1.9-standalone.jar variant-ensemble /public/users/xieshangqian/Testcode/testdata/bcbio/work/ensemble/test/config/test-ensemble.yaml /public/software/bcbio-nextgen/genomes/Hsapiens/GRCh37/seq/GRCh37.fa /public/users/xieshangqian/Testcode/testdata/bcbio/work/ensemble/test/test-ensemble.vcf /public/users/xieshangqian/Testcode/testdata/bcbio/work/gatk-haplotype/test-effects-ploidyfix-combined-gatkclean.vcf.gz /public/users/xieshangqian/Testcode/testdata/bcbio/work/freebayes/test-effects-ploidyfix-filter.vcf.gz /public/users/xieshangqian/Testcode/testdata/bcbio/work/samtools/test-effects-ploidyfix-filter.vcf.gz

it yields the following info: [2014-12-07 20:59] Exception in thread "main" java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.lang.String [2014-12-07 20:59] at htsjdk.variant.variantcontext.CommonInfo.getAttributeAsInt(CommonInfo.java:242) [2014-12-07 20:59] at htsjdk.variant.variantcontext.VariantContext.getAttributeAsInt(VariantContext.java:703) [2014-12-07 20:59] at org.broadinstitute.gatk.utils.variant.GATKVariantContextUtils.simpleMerge(GATKVariantContextUtils.java:946) [2014-12-07 20:59] at org.broadinstitute.gatk.tools.walkers.variantutils.CombineVariants.map(CombineVariants.java:309) [2014-12-07 20:59] at org.broadinstitute.gatk.tools.walkers.variantutils.CombineVariants.map(CombineVariants.java:117) [2014-12-07 20:59] at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:267) [2014-12-07 20:59] at org.broadinstitute.gatk.engine.traversals.TraverseLociNano$TraverseLociMap.apply(TraverseLociNano.java:255) [2014-12-07 20:59] at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:274) [2014-12-07 20:59] at org.broadinstitute.gatk.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245) [2014-12-07 20:59] at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:144) [2014-12-07 20:59] at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:92) [2014-12-07 20:59] at org.broadinstitute.gatk.engine.traversals.TraverseLociNano.traverse(TraverseLociNano.java:48) [2014-12-07 20:59] at org.broadinstitute.gatk.engine.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:99) [2014-12-07 20:59] at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:314) [2014-12-07 20:59] at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121) [2014-12-07 20:59] at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248) [2014-12-07 20:59] at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155) [2014-12-07 20:59] at bcbio.run.broad$run_gatk$fn1805.invoke(broad.clj:34) [2014-12-07 20:59] at bcbio.run.broad$run_gatk.invoke(broad.clj:31) [2014-12-07 20:59] at bcbio.variation.combine$combine_variants.doInvoke(combine.clj:71) [2014-12-07 20:59] at clojure.lang.RestFn.invoke(RestFn.java:1557) [2014-12-07 20:59] at bcbio.variation.recall$get_min_merged.invoke(recall.clj:158) [2014-12-07 20:59] at bcbio.variation.recall$fn7040.invoke(recall.clj:173) [2014-12-07 20:59] at clojure.lang.MultiFn.invoke(MultiFn.java:249) [2014-12-07 20:59] at bcbio.variation.recall$create_merged$fn7045.invoke(recall.clj:187) [2014-12-07 20:59] at clojure.core$map$fn4207.invoke(core.clj:2487) [2014-12-07 20:59] at clojure.lang.LazySeq.sval(LazySeq.java:42) [2014-12-07 20:59] at clojure.lang.LazySeq.seq(LazySeq.java:60) [2014-12-07 20:59] at clojure.lang.RT.seq(RT.java:484) [2014-12-07 20:59] at clojure.core$seq.invoke(core.clj:133) [2014-12-07 20:59] at clojure.core$map$fn4214.invoke(core.clj:2496) [2014-12-07 20:59] at clojure.lang.LazySeq.sval(LazySeq.java:42) [2014-12-07 20:59] at clojure.lang.LazySeq.seq(LazySeq.java:60) [2014-12-07 20:59] at clojure.lang.RT.seq(RT.java:484) [2014-12-07 20:59] at clojure.core$seq.invoke(core.clj:133) [2014-12-07 20:59] at clojure.core$map$fn4207.invoke(core.clj:2479) [2014-12-07 20:59] at clojure.lang.LazySeq.sval(LazySeq.java:42) 
[2014-12-07 20:59] at clojure.lang.LazySeq.seq(LazySeq.java:60) [2014-12-07 20:59] at clojure.lang.RT.seq(RT.java:484) [2014-12-07 20:59] at clojure.core$seq.invoke(core.clj:133) [2014-12-07 20:59] at clojure.core$map$fn4211.invoke(core.clj:2490) [2014-12-07 20:59] at clojure.lang.LazySeq.sval(LazySeq.java:42) [2014-12-07 20:59] at clojure.lang.LazySeq.seq(LazySeq.java:60) [2014-12-07 20:59] at clojure.lang.RT.seq(RT.java:484) [2014-12-07 20:59] at clojure.core$seq.invoke(core.clj:133) [2014-12-07 20:59] at clojure.core$map$fn4207.invoke(core.clj:2479) [2014-12-07 20:59] at clojure.lang.LazySeq.sval(LazySeq.java:42) [2014-12-07 20:59] at clojure.lang.LazySeq.seq(LazySeq.java:60) [2014-12-07 20:59] at clojure.lang.RT.seq(RT.java:484) [2014-12-07 20:59] at clojure.core$seq.invoke(core.clj:133) [2014-12-07 20:59] at clojure.core$map$fn4214.invoke(core.clj:2496) [2014-12-07 20:59] at clojure.lang.LazySeq.sval(LazySeq.java:42) [2014-12-07 20:59] at clojure.lang.LazySeq.seq(LazySeq.java:60) [2014-12-07 20:59] at clojure.lang.RT.seq(RT.java:484) [2014-12-07 20:59] at clojure.core$seq.invoke(core.clj:133) [2014-12-07 20:59] at clojure.core$map$fn4207.invoke(core.clj:2479) [2014-12-07 20:59] at clojure.lang.LazySeq.sval(LazySeq.java:42) [2014-12-07 20:59] at clojure.lang.LazySeq.seq(LazySeq.java:60) [2014-12-07 20:59] at clojure.lang.RT.seq(RT.java:484) [2014-12-07 20:59] at clojure.core$seq.invoke(core.clj:133) [2014-12-07 20:59] at clojure.core$reduce1.invoke(core.clj:890) [2014-12-07 20:59] at clojure.core$reverse.invoke(core.clj:904) [2014-12-07 20:59] at clojure.math.combinatorics$combinations.invoke(combinatorics.clj:73) [2014-12-07 20:59] at bcbio.variation.compare$variant_comparison_from_config$iter75827586$fn__7587.invoke(compare.clj:255) [2014-12-07 20:59] at clojure.lang.LazySeq.sval(LazySeq.java:42) [2014-12-07 20:59] at clojure.lang.LazySeq.seq(LazySeq.java:60) [2014-12-07 20:59] at clojure.lang.RT.seq(RT.java:484) [2014-12-07 20:59] at clojure.core$seq.invoke(core.clj:133) [2014-12-07 20:59] at clojure.core$tree_seq$walk4647$fn4648.invoke(core.clj:4475) [2014-12-07 20:59] at clojure.lang.LazySeq.sval(LazySeq.java:42) [2014-12-07 20:59] at clojure.lang.LazySeq.seq(LazySeq.java:60) [2014-12-07 20:59] at clojure.lang.LazySeq.more(LazySeq.java:96) [2014-12-07 20:59] at clojure.lang.RT.more(RT.java:607) [2014-12-07 20:59] at clojure.core$rest.invoke(core.clj:73) [2014-12-07 20:59] at clojure.core$flatten.invoke(core.clj:6478) [2014-12-07 20:59] at bcbio.variation.compare$variant_comparison_from_config.invoke(compare.clj:254) [2014-12-07 20:59] at bcbio.variation.ensemble$consensus_calls.invoke(ensemble.clj:113) [2014-12-07 20:59] at bcbio.variation.ensemble$_main.doInvoke(ensemble.clj:133) [2014-12-07 20:59] at clojure.lang.RestFn.applyTo(RestFn.java:137) [2014-12-07 20:59] at clojure.core$apply.invoke(core.clj:617) [2014-12-07 20:59] at bcbio.variation.core$_main.doInvoke(core.clj:35) [2014-12-07 20:59] at clojure.lang.RestFn.applyTo(RestFn.java:137) [2014-12-07 20:59] at bcbio.variation.core.main(Unknown Source)

Also, this command has taken at least one day so far (it is still running now). Before upgrading, it did not take nearly this long, so I think it may need a fix. Is this behavior normal? Thanks again.

Shangqian

chapmanb commented 9 years ago

Shangqian; Sorry about the issue. This is from a problem with vcfallelicprimitives and multi-allele sites and was recently fixed in the development version. See more details here:

https://github.com/chapmanb/bcbio-nextgen/issues/679#issuecomment-65833923

If you upgrade with bcbio_nextgen.py upgrade -u development, remove the freebayes and checkpoints directories (rm -rf freebayes && rm -rf checkpoints_parallel), and re-run it should hopefully work cleanly. Hope this fixes it for you.

shang-qian commented 9 years ago

Thanks, Brad. I have upgraded and that solved the problem. Thanks again.

shang-qian commented 9 years ago

Brad and Roryk, Many thanks for all your help.

Our HPC has 32 nodes (each node has 20 cores) and uses the PBS scheduler for job submission. I previously submitted a PBS file to run bcbio on a single node and it worked well. Now I need to analyse many samples in bcbio, so I would like to run in parallel across multiple nodes as Roryk suggested. The following is my test PBS file for node13 and node17:

#PBS -N exome_s10
#PBS -j oe
#PBS -l nodes=c13:ppn=3+c17:ppn=5
#PBS -l walltime=5000:00:00
#PBS -q high

cd ~/Testcode/testdata/bcbio/work/
bcbio_nextgen.py ../config/test_exome_single.yaml -t ipython -n 8 -s torque -q high

When I qsub this file, it gives an error like this:

[2014-12-10 11:45] compute-0-13.local: Resource requests: bwa, sambamba, samtools; memory: 2.0; cores: 16, 1, 16 [2014-12-10 11:45] compute-0-13.local: Configuring 1 jobs to run, using 8 cores each with 16.2g of memory reserved for each job [ProfileCreate] Generating default config file: u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipython_config.py' [ProfileCreate] Generating default config file: u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipython_notebook_config.py' [ProfileCreate] Generating default config file: u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipython_nbconvert_config.py' [ProfileCreate] Generating default config file: u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipcontroller_config.py' [ProfileCreate] Generating default config file: u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipengine_config.py' [ProfileCreate] Generating default config file: u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipcluster_config.py' [ProfileCreate] Generating default config file: u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/iplogger_config.py' 2014-12-10 11:45:36.491 [IPClusterStart] Config changed: 2014-12-10 11:45:36.491 [IPClusterStart] {'BcbioTORQUEEngineSetLauncher': {'mem': '16.2', 'cores': 8, 'tag': '', 'resources': ''}, 'IPClusterEngines': {'early_shutdown': 240}, 'Application': {'log_level': 10}, 'ProfileDir': {'location': u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython'}, 'BaseParallelApplication': {'log_to_file': True, 'cluster_id': u'e1bf1e39-9d63-4884-ba38-345be349dbd2'}, 'TORQUELauncher': {'queue': 'high'}, 'BcbioTORQUEControllerLauncher': {'mem': '16.2', 'cores': 2, 'tag': '', 'resources': ''}, 'IPClusterStart': {'delay': 10, 'n': 1, 'daemonize': True, 'engine_launcher_class': u'cluster_helper.cluster.BcbioTORQUEEngineSetLauncher', 'controller_launcher_class': u'cluster_helper.cluster.BcbioTORQUEControllerLauncher'}} 2014-12-10 11:45:36.503 [IPClusterStart] Using existing profile dir: u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython' 2014-12-10 11:45:36.504 [IPClusterStart] Searching path [u'/public/users/xieshangqian/Testcode/testdata/bcbio/work', u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython'] for config files 2014-12-10 11:45:36.504 [IPClusterStart] Attempting to load config file: ipython_config.py 2014-12-10 11:45:36.505 [IPClusterStart] Loaded config file: /public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipython_config.py 2014-12-10 11:45:36.506 [IPClusterStart] Attempting to load config file: ipcluster_e1bf1e39_9d63_4884_ba38_345be349dbd2_config.py 2014-12-10 11:45:36.507 [IPClusterStart] Loaded config file: /public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipcontroller_config.py 2014-12-10 11:45:36.507 [IPClusterStart] Attempting to load config file: ipcluster_e1bf1e39_9d63_4884_ba38_345be349dbd2_config.py 2014-12-10 11:45:36.508 [IPClusterStart] Loaded config file: /public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipengine_config.py 2014-12-10 11:45:36.509 [IPClusterStart] Attempting to load config file: ipcluster_e1bf1e39_9d63_4884_ba38_345be349dbd2_config.py 2014-12-10 11:45:36.510 [IPClusterStart] Loaded config file: /public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython/ipcluster_config.py 2014-12-10 12:01:09.032 [IPClusterStop] Using existing profile dir: 
u'/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython' 2014-12-10 12:01:09.094 [IPClusterStop] Stopping cluster [pid=21885] with [signal=2] Traceback (most recent call last): File "/public/software/bcbio-nextgen/tools/bin/bcbio_nextgen.py", line 216, in main(kwargs) File "/public/software/bcbio-nextgen/tools/bin/bcbio_nextgen.py", line 42, in main run_main(kwargs) File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 45, in run_main fc_dir, run_info_yaml) File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 81, in _run_toplevel for xs in pipeline.run(config, run_info_yaml, parallel, dirs, samples): File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 140, in run multiplier=alignprep.parallel_multiplier(samples)) as run_parallel: File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/contextlib.py", line 17, in enter return self.gen.next() File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/prun.py", line 53, in start with ipython.create(parallel, dirs, config) as view: File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/contextlib.py", line 17, in enter return self.gen.next() File "/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/cluster_helper/cluster.py", line 913, in cluster_view raise IOError("Cluster startup timed out.") IOError: Cluster startup timed out.

So I am a little confused: have I understood your advice correctly? Do I need to modify my PBS file or the bcbio system configuration? By the way, after I qsub the PBS file the job only runs on node13, not on node17. Thanks again for your kind response.

Shangqian

roryk commented 9 years ago

Hi Shangqian,

Is your HPC busy? If you have to wait a long time to get a job, bcbio-nextgen will time out. When you tried to submit it, were the jobs pending for a long time or did they move to running status? If they are pending and bcbio-nextgen is timing out, you can have bcbio-nextgen wait for longer by adding --timeout time-in-minutes to the bcbio-nextgen command, so it won't time out while it is waiting. Hope that helps, let us know how it goes.
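For example, building on your earlier command (the value is only illustrative):

bcbio_nextgen.py ../config/test_exome_single.yaml -t ipython -n 8 -s torque -q high --timeout 120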

Best,

Rory

shang-qian commented 9 years ago

Hi Rory, I checked our HPC nodes and they were all idle before I submitted my job. OK, I will try adding the --timeout option, and if there is any problem I may still need your help :). Thanks. Best, Shangqian

roryk commented 9 years ago

Hi Shangqian,

If the nodes were idle then it might be an issue running on Torque. When you submit the job does everything get to the running state and it still times out, or are the jobs pending? If the jobs are in the running state but it still times out that would be very helpful to know.

shang-qian commented 9 years ago

Hi Rory, the jobs are in the running state, and the same error happened.

roryk commented 9 years ago

Great, when the jobs are in the running state, is there a controller job and an engine job both running too? There should be three jobs running, one that is the bcbio_nextgen job in the script you wrote to submit to the scheduler. The other two should be a controller and a set of engines. Were all of those running, or just the one bcbio_nextgen job?

If the controller and engine jobs were running too, are there files that have engine and ipcontroller in them that are in your directory? I think if you look at those, you should see some errors talking about heartbeats between the engine and controller. Do you see something like that?

shang-qian commented 9 years ago

Thanks, Rory. This is the qstat result:

Job id          Name        User           Time Use  S  Queue
11063.cluster   exome_s10   xieshangqian   00:00:05  R  high

and top shows that bcbio_nextgen.py is running. How should I check whether the controller and engine jobs are running or not?

roryk commented 9 years ago

They should be appearing on there if they are running, so it seems like they aren't starting. Are there engine and ipcontroller files in the directory? There should be job submission scripts for each of them.

shang-qian commented 9 years ago

Yes, there are some in the lib and pkgs folders:

/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/sqlalchemy/engine
/public/software/bcbio-nextgen/anaconda/lib/python2.7/site-packages/IPython/parallel/engine
/public/software/bcbio-nextgen/anaconda/pkgs/sqlalchemy-0.9.7-py27_0/lib/python2.7/site-packages/sqlalchemy/engine
/public/software/bcbio-nextgen/anaconda/pkgs/ipython-2.3.1-py27_0/lib/python2.7/site-packages/IPython/parallel/engine
/public/software/bcbio-nextgen/anaconda/pkgs/ipython-2.2.0-py27_0/lib/python2.7/site-packages/IPython/parallel/engine
/public/software/bcbio-nextgen/anaconda/pkgs/ipython-2.3.0-py27_0/lib/python2.7/site-packages/IPython/parallel/engine

/public/software/bcbio-nextgen/anaconda/pkgs/ipython-2.3.1-py27_0/bin/ipcontroller
/public/software/bcbio-nextgen/anaconda/pkgs/ipython-2.2.0-py27_0/bin/ipcontroller
/public/software/bcbio-nextgen/anaconda/pkgs/ipython-2.3.0-py27_0/bin/ipcontroller
/public/software/bcbio-nextgen/anaconda/bin/ipcontroller

roryk commented 9 years ago

Hm-- nothing in the work directory? They should look like torque_controller with a bunch of letters and numbers after them. If we can track down those files we can try to figure out why they aren't getting run.

shang-qian commented 9 years ago

Yes, I found them in the work directory. The contents are: Controller:

#!/bin/sh
#PBS -q high
#PBS -V
#PBS -N bcbio-c
#PBS -j oe
#PBS -l nodes=1:ppn=2
#PBS -l walltime=239:00:00

cd $PBS_O_WORKDIR
/public/software/bcbio-nextgen/anaconda/bin/python2.7 -E -c 'import resource; cur_proc, max_proc = resource.getrlimit(resource.RLIMIT_NPROC); target_proc = min(max_proc, 10240) if max_proc > 0 else 10240; resource.setrlimit(resource.RLIMIT_NPROC, (max(cur_proc, target_proc), max_proc)); cur_hdls, max_hdls = resource.getrlimit(resource.RLIMIT_NOFILE); target_hdls = min(max_hdls, 10240) if max_hdls > 0 else 10240; resource.setrlimit(resource.RLIMIT_NOFILE, (max(cur_hdls, target_hdls), max_hdls)); from cluster_helper.cluster import VMFixIPControllerApp; VMFixIPControllerApp.launch_instance()' --ip=* --log-to-file --profile-dir="/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython" --cluster-id="3b8f9f6a-98f4-47d4-a9db-d3f15d4f3669" --nodb --hwm=1 --scheme=leastload --HeartMonitor.max_heartmonitor_misses=120 --HeartMonitor.period=60000

Engines:

#!/bin/sh
#PBS -q high
#PBS -V
#PBS -j oe
#PBS -N bcbio-e
#PBS -t 1-1
#PBS -l nodes=1:ppn=5
#PBS -l mem=10444mb
#PBS -l walltime=239:00:00

cd $PBS_O_WORKDIR
/public/software/bcbio-nextgen/anaconda/bin/python2.7 -E -c 'import resource; cur_proc, max_proc = resource.getrlimit(resource.RLIMIT_NPROC); target_proc = min(max_proc, 10240) if max_proc > 0 else 10240; resource.setrlimit(resource.RLIMIT_NPROC, (max(cur_proc, target_proc), max_proc)); cur_hdls, max_hdls = resource.getrlimit(resource.RLIMIT_NOFILE); target_hdls = min(max_hdls, 10240) if max_hdls > 0 else 10240; resource.setrlimit(resource.RLIMIT_NOFILE, (max(cur_hdls, target_hdls), max_hdls)); from IPython.parallel.apps.ipengineapp import launch_new_instance; launch_new_instance()' --timeout=960 --IPEngineApp.wait_for_url_file=960 --EngineFactory.max_heartbeat_misses=120 --profile-dir="/public/users/xieshangqian/Testcode/testdata/bcbio/work/log/ipython" --cluster-id="3b8f9f6a-98f4-47d4-a9db-d3f15d4f3669"

chapmanb commented 9 years ago

Shangqian; Thanks for the help debugging. If you manually submit one of these:

qsub torque_controller*

does it provide any useful error messages? It sounds like something with the submission is problematic with your setup and maybe this will provide a clue. Thanks again.

lpantano commented 9 years ago

Hi,

I would try to submit one of these files directly. Normally they fail to start because something in the file conflicts with the cluster configuration, so submitting one of them on its own will show whether there is an error we have not thought of.

This happened to me, for instance, on a queue that only accepts jobs requesting more than two cores: the cluster manager would not let those jobs into the queue and bcbio got stuck.


shang-qian commented 9 years ago

Hi lpantano, Thanks for the suggestion. I also qsub'd the torque_controller* file, as Brad advised, and the result is similar: it shows as running on one node with two cores, but the time used stays at zero, and in fact nothing actually runs on the node I submitted to.

lpantano commented 9 years ago

Hi,

Could you run a file with the same header but a different body, just to make sure this is only related to the cluster and not to IPython or bcbio? Then check the output files and whether it finishes, something like:

#!/bin/sh
#PBS -q high
#PBS -V
#PBS -N bcbio-c
#PBS -j oe
#PBS -l nodes=1:ppn=2
#PBS -l walltime=239:00:00

cd /SOME/PATH/WITH/FILES
sleep 10
ls
