bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Tumor/normal pipeline problem #1129

Closed jacobhurst closed 8 years ago

jacobhurst commented 8 years ago

Hi, I added structural variant detection to the tumor/normal pipeline and initially ran it with n=10. The output below is from a rerun with n=1. Any suggestions on how to debug this would be gratefully received. The YAML I used to configure the pipeline is below as well.

[2015-12-01T14:26Z] Timing: structural variation initial
[2015-12-01T14:26Z] multiprocessing: detect_sv
[2015-12-01T14:26Z] multiprocessing: finalize_sv
[2015-12-01T14:26Z] Resource requests: freebayes, gatk, mutect, picard, vardict, varscan; memory: 2.00, 3.50, 2.50, 3.50, 3.00, 2.00; cores: 16, 1, 1, 1, 1, 1
[2015-12-01T14:26Z] Configuring 1 jobs to run, using 1 cores each with 3.50g of memory reserved for each job
[2015-12-01T14:26Z] Timing: alignment post-processing
[2015-12-01T14:26Z] multiprocessing: piped_bamprep
[2015-12-01T14:26Z] Timing: variant calling
[2015-12-01T14:26Z] multiprocessing: variantcall_sample
[2015-12-01T14:26Z] Genotyping with VarDict: Inference
[2015-12-01T14:26Z] /usr/bin/env: Rscript: No such file or directory
[2015-12-01T14:26Z] Use of uninitialized value $sample in concatenation (.) or string at /scratch/DBC/ATRES/jhurst/tools/bcbio/bin/var2vcf_paired.pl line 35.
[2015-12-01T14:30Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/scratch/DBC/ATRES/jhurst/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/scratch/DBC/ATRES/jhurst/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; export VAR_DICT_OPTS='-Xms750m -Xmx3000m -XX:+UseSerialGC -Djava.io.tmpdir=/scratch/DBC/ATRES/jhurst/TCGA_data/TCGA-LL-A73Y/work/vardict/1/tx/tmppjmA_w' && vardict-java -G /scratch/DBC/ATRES/jhurst/tools/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa -f 0.1 -N LLA73Y-tumor -b "/scratch/DBC/ATRES/jhurst/TCGA_data/TCGA-LL-A73Y/work/bamprep/LLA73Y-tumor/1/2_2015-11-27_TCGA-LLA73Y-WXS-sort-1_0_31187470-prep.bam|/scratch/DBC/ATRES/jhurst/TCGA_data/TCGA-LL-A73Y/work/bamprep/LLA73Y-normal/1/1_2015-11-27_TCGA-LLA73Y-WXS-sort-1_0_31187470-prep.bam" -c 1 -S 2 -E 3 -g 4 /scratch/DBC/ATRES/jhurst/TCGA_data/TCGA-LL-A73Y/work/vardict/1/LLA73Y-1_0_31187470-raw-regions-regionlimit.bed | testsomatic.R | var2vcf_paired.pl -M -P 0.9 -m 4.25 -f 0.1 -N "LLA73Y-tumor|LLA73Y-normal" | bcftools filter -m '+' -s 'REJECT' -e 'STATUS !~ "._Somatic"' 2> /dev/null | /scratch/DBC/ATRES/jhurst/tools/bcbio/anaconda/bin/py -x 'bcbio.variation.vardict.depth_freq_filter(x, 0, "bwa")' | sed 's/.Somatic\/Somatic/' | sed 's/REJECT,Description=".">/REJECT,Description="Not Somatic via VarDict">/' | /scratch/DBC/ATRES/jhurst/tools/bcbio/anaconda/bin/py -x 'bcbio.variation.freebayes.call_somatic(x)' | awk -F$'\t' -v OFS='\t' '{if ($0 !~ /^#/) gsub(/[KMRYSWBVHDX]/, "N", $4) } {print}' | awk -F$'\t' -v OFS='\t' '$1!~/^#/ && $4 == $5 {next} {print}' | /scratch/DBC/ATRES/jhurst/tools/bcbio/bin/vcfstreamsort | bgzip -c > /scratch/DBC/ATRES/jhurst/TCGA_data/TCGA-LL-A73Y/work/vardict/1/tx/tmppjmA_w/LLA73Y-1_0_31187470-raw.vcf.gz
/usr/bin/env: Rscript: No such file or directory
Use of uninitialized value $sample in concatenation (.) or string at /scratch/DBC/ATRES/jhurst/tools/bcbio/bin/var2vcf_paired.pl line 35.
' returned non-zero exit status 127
Traceback (most recent call last):
  File "/scratch/DBC/ATRES/jhurst/tools/bcbio/bin/bcbio_nextgen.py", line 226, in <module>
    main(**kwargs)
  File "/scratch/DBC/ATRES/jhurst/tools/bcbio/bin/bcbio_nextgen.py", line 43, in main
    run_main(**kwargs)
  File "/scratch/DBC/ATRES/jhurst/tools/bcbio/anaconda/lib/python2.7/site-packages/bcbio/pipeline/main.py", line 37, in run_main
    fc_dir, run_info_yaml)

genome_build: GRCh37
metadata:
  batch: LLA73Y
  phenotype: normal

chapmanb commented 8 years ago

Jacob; Sorry about the problem. The VarDict R code isn't finding the path to Rscript. This is a bug that is fixed in the latest development version, which you can get by updating with:

bcbio_nextgen.py upgrade -u development

We're planning a new release with this fix and other improvements soon. If you don't want to upgrade, you can make the bcbio-installed Rscript (/scratch/DBC/ATRES/jhurst/tools/bcbio/anaconda/bin/Rscript) available on your PATH by symlinking it into a directory that is already on your PATH.
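For anyone who wants to check this before rerunning, here is a rough Python sketch of the symlink workaround. It is not part of bcbio. The helper names `missing_tools` and `link_into_dir` are made up for illustration, and the demo uses a throwaway temporary directory with a fake `Rscript` instead of the real install. An exit status of 127 conventionally means "command not found", which matches the `/usr/bin/env: Rscript: No such file or directory` message in the log, so the first step is to confirm which tools cannot be resolved on PATH.

```python
import os
import shutil
import tempfile


def missing_tools(tools, path=None):
    """Return the names in `tools` that cannot be resolved on `path`.

    `path` is a PATH-style string; None means the current $PATH.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


def link_into_dir(executable, bin_dir):
    """Symlink `executable` into `bin_dir` under its own basename.

    Returns the path of the link. An existing entry is left untouched.
    """
    os.makedirs(bin_dir, exist_ok=True)
    link = os.path.join(bin_dir, os.path.basename(executable))
    if not os.path.lexists(link):
        os.symlink(executable, link)
    return link


def demo():
    """Reproduce the situation in a temporary directory (POSIX only)."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-ins for the bcbio anaconda/bin directory and
        # for a directory that is already on the user's PATH.
        install_bin = os.path.join(tmp, "anaconda", "bin")
        user_bin = os.path.join(tmp, "userbin")
        os.makedirs(install_bin)
        os.makedirs(user_bin)

        # Fake Rscript so the demo does not depend on R being installed.
        fake_rscript = os.path.join(install_bin, "Rscript")
        with open(fake_rscript, "w") as handle:
            handle.write("#!/bin/sh\nexit 0\n")
        os.chmod(fake_rscript, 0o755)

        before = missing_tools(["Rscript"], path=user_bin)
        link_into_dir(fake_rscript, user_bin)
        after = missing_tools(["Rscript"], path=user_bin)
        return before, after


if __name__ == "__main__":
    before, after = demo()
    print("missing before symlink:", before)
    print("missing after symlink:", after)
```

On a real system you would call `missing_tools(["Rscript"])` with no `path` argument to test your actual PATH, then pass the bcbio-installed Rscript and a directory of your choice that is on PATH to `link_into_dir`.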

Hope this gets things running cleanly for you.