bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Cannot reproduce a bcbio_vm run with the same configuration file in bcbio_nextgen.py #1363

Closed mortunco closed 8 years ago

mortunco commented 8 years ago

Hi,

I am really confused by the fact that my run fails. I ran the same configuration twice (the only change was the file paths), once with bcbio_nextgen.py and once with bcbio_vm.py, and the bcbio_nextgen.py run was interrupted. In bcbio-nextgen I got the same error that I got before (https://github.com/AstraZeneca-NGS/VarDictJava/issues/34) (@mjafin). But I don't understand why it causes an error, since bcbio-vm showed that the run can finish successfully without realigning, cleaning and sorting.

I am willing to do ANYTHING to solve this issue and have reproducible runs.

Thank you for your time and patience (again).

Best,

Tunc.

bcbio_nextgen.py configuration file


#Cancer tumor/normal calling evaluation using synthetic dataset 3
# from the ICGC-TCGA DREAM challenge:
# https://www.synapse.org/#!Synapse:syn312572/wiki/62018

---
details:
- algorithm:
    aligner: false
    recalibrate: false
    realign: false
    remove_lcr: true
    platform: illumina
    mark_duplicates: false
    quality_format: illumina
    variantcaller: [mutect, vardict, varscan, freebayes]
    indelcaller: scalpel
    ensemble: 
      numpass: 2    
  analysis: variant2
  description: 140e5014-bdd6-4663-9404-234c7f9e927d 

  files:
    - /home/ec2-user/puppy/icgc_data/try1/real_deneme/input/normal.bam   
  genome_build: GRCh37
  metadata:
    batch: ICGC
    phenotype: normal
- algorithm:
    aligner: false
    recalibrate: false
    realign: false
    remove_lcr: true
    platform: illumina
    mark_duplicates: false
    quality_format: illumina
    variantcaller: [mutect, vardict, varscan, freebayes] 
    indelcaller: scalpel
    ensemble:
      numpass: 2
  analysis: variant2
  description: a9ec7d9e-b179-4782-a589-43c7d1642be9 

  files:
    - /home/ec2-user/puppy/icgc_data/try1/real_deneme/input/tumor.bam
  genome_build: GRCh37
  metadata:
    batch: ICGC
    phenotype: tumor
fc_date: '2015-04-25'
fc_name: ICGC-trials
resources:
  gatk:
    jar: /home/ec2-user/GATK/GenomeAnalysisTK.jar 
  mutect: 
    jar: /home/ec2-user/GATK/mutect-1.1.7.jar
upload:
  dir: /home/ec2-user/puppy/icgc_data/try1/real_deneme/final/

bcbio_vm.py configuration file. This run finished successfully.

[ec2-user@ip-172-31-55-174 config]$ cat deneme9.yaml 
#Cancer tumor/normal calling evaluation using synthetic dataset 3
# from the ICGC-TCGA DREAM challenge:
# https://www.synapse.org/#!Synapse:syn312572/wiki/62018

---
details:
- algorithm:
    aligner: false
    recalibrate: false
    realign: false
    remove_lcr: true
    platform: illumina
    quality_format: illumina
    variantcaller: [mutect, vardict, varscan, freebayes]
    indelcaller: scalpel
    ensemble: 
      numpass: 2    
  analysis: variant2
  description: 140e5014-bdd6-4663-9404-234c7f9e927d

  files:
    - s3://tuncproject/icgcrun/input/normal.bam   
  genome_build: GRCh37
  metadata:
    batch: ICGC
    phenotype: normal
- algorithm:
    aligner: false
    recalibrate: false
    realign: false
    remove_lcr: true
    platform: illumina
    quality_format: illumina
    variantcaller: [mutect, vardict, varscan, freebayes] 
    indelcaller: scalpel
    ensemble:
      numpass: 2
  analysis: variant2
  description: a9ec7d9e-b179-4782-a589-43c7d1642be9

  files:
    - s3://tuncproject/icgcrun/input/tumor.bam
  genome_build: GRCh37
  metadata:
    batch: ICGC
    phenotype: tumor
fc_date: '2015-04-14'
fc_name: ICGC-trials
resources:
  gatk:
    jar: s3://tuncproject/gatktools/GenomeAnalysisTK.jar 
  mutect: 
    jar: s3://tuncproject/gatktools/mutect-1.1.7.jar
upload:
  dir: s3://tuncproject/icgcrun/final/
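
Since the question is whether the two runs really use the same settings, a small comparison of the `algorithm` blocks can help. In the two listings above, the local one sets `mark_duplicates: false` while the S3 one does not set it at all (and `fc_date` differs). The helper below is only a sketch: `algorithm_differences` is a hypothetical name, not part of bcbio, and it assumes both files have already been parsed into dictionaries with the `details` / `algorithm` / `description` layout shown above.

```python
def algorithm_differences(config_a, config_b):
    """Return {description: {key: (value_a, value_b)}} for algorithm keys that differ.

    Hypothetical helper, not part of bcbio. Both arguments are parsed sample
    configurations (dicts) with a top-level `details` list.
    """
    # Index samples by description so the order inside `details` does not matter.
    samples_a = {s["description"]: s.get("algorithm", {}) for s in config_a["details"]}
    samples_b = {s["description"]: s.get("algorithm", {}) for s in config_b["details"]}
    diffs = {}
    for name in sorted(set(samples_a) & set(samples_b)):
        alg_a, alg_b = samples_a[name], samples_b[name]
        changed = {}
        for key in sorted(set(alg_a) | set(alg_b)):
            # "<unset>" marks a key that is absent from one of the two files.
            value_a = alg_a.get(key, "<unset>")
            value_b = alg_b.get(key, "<unset>")
            if value_a != value_b:
                changed[key] = (value_a, value_b)
        if changed:
            diffs[name] = changed
    return diffs


# Reduced stand-ins for the two configurations above.
local = {"details": [{"description": "normal",
                      "algorithm": {"realign": False, "mark_duplicates": False}}]}
cloud = {"details": [{"description": "normal",
                      "algorithm": {"realign": False}}]}
print(algorithm_differences(local, cloud))
# {'normal': {'mark_duplicates': (False, '<unset>')}}
```

If PyYAML is available, the dictionaries could come from `yaml.safe_load(open(path))` on each file; that dependency is an assumption and is left out so the sketch runs with the standard library alone.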

The error that I got from the last run.

[ec2-user@ip-172-31-55-174 log]$ cat bcbio-nextgen.log 
[2016-04-25T12:26Z] Timing: organize samples
[2016-04-25T12:26Z] multiprocessing: organize_samples
[2016-04-25T12:26Z] Using input YAML configuration: /home/ec2-user/puppy/icgc_data/try1/real_deneme/config/deneme10.yaml
[2016-04-25T12:26Z] Checking sample YAML configuration: /home/ec2-user/puppy/icgc_data/try1/real_deneme/config/deneme10.yaml
[2016-04-25T12:27Z] Timing: organize samples
[2016-04-25T12:30Z] Timing: organize samples
[2016-04-25T12:30Z] multiprocessing: organize_samples
[2016-04-25T12:30Z] Using input YAML configuration: /home/ec2-user/puppy/icgc_data/try1/real_deneme/config/deneme10.yaml
[2016-04-25T12:30Z] Checking sample YAML configuration: /home/ec2-user/puppy/icgc_data/try1/real_deneme/config/deneme10.yaml
[2016-04-25T12:30Z] Testing minimum versions of installed programs
[2016-04-25T12:30Z] Timing: alignment preparation
[2016-04-25T12:30Z] multiprocessing: prep_align_inputs
[2016-04-25T12:30Z] multiprocessing: disambiguate_split
[2016-04-25T12:30Z] Timing: alignment
[2016-04-25T12:30Z] multiprocessing: process_alignment
[2016-04-25T15:28Z] Timing: organize samples
[2016-04-25T15:28Z] multiprocessing: organize_samples
[2016-04-25T15:28Z] Using input YAML configuration: /home/ec2-user/puppy/icgc_data/try1/real_deneme/config/deneme10.yaml
[2016-04-25T15:28Z] Checking sample YAML configuration: /home/ec2-user/puppy/icgc_data/try1/real_deneme/config/deneme10.yaml
[2016-04-25T15:28Z] Testing minimum versions of installed programs
[2016-04-25T15:28Z] Timing: alignment preparation
[2016-04-25T15:28Z] multiprocessing: prep_align_inputs
[2016-04-25T15:28Z] multiprocessing: disambiguate_split
[2016-04-25T15:28Z] Timing: alignment
[2016-04-25T15:28Z] multiprocessing: process_alignment
[2016-04-25T15:40Z] Timing: organize samples
[2016-04-25T15:40Z] multiprocessing: organize_samples
[2016-04-25T15:40Z] Using input YAML configuration: /home/ec2-user/puppy/icgc_data/try1/real_deneme/config/deneme10.yaml
[2016-04-25T15:40Z] Checking sample YAML configuration: /home/ec2-user/puppy/icgc_data/try1/real_deneme/config/deneme10.yaml
[2016-04-25T15:40Z] Testing minimum versions of installed programs
[2016-04-25T15:40Z] Timing: alignment preparation
[2016-04-25T15:40Z] multiprocessing: prep_align_inputs
[2016-04-25T15:40Z] multiprocessing: disambiguate_split
[2016-04-25T15:40Z] Timing: alignment
[2016-04-25T15:40Z] multiprocessing: process_alignment
[2016-04-25T15:40Z] Timing: callable regions
[2016-04-25T15:40Z] multiprocessing: prep_samples
[2016-04-25T15:40Z] multiprocessing: postprocess_alignment
[2016-04-25T15:40Z] multiprocessing: calc_callable_loci
[2016-04-25T16:06Z] multiprocessing: combine_bed
[2016-04-25T16:06Z] Assigned coverage as 'genome' with 91.1% genome coverage and 0.0% offtarget coverage
[2016-04-25T16:06Z] multiprocessing: calc_callable_loci
[2016-04-25T16:43Z] multiprocessing: combine_bed
[2016-04-25T16:43Z] Assigned coverage as 'genome' with 91.3% genome coverage and 0.0% offtarget coverage
[2016-04-25T16:43Z] multiprocessing: combine_sample_regions
[2016-04-25T16:43Z] Identified 247 parallel analysis blocks
Block sizes:
  min: 4262
  5%: 37694.8
  25%: 1496568.0
  median: 16022838.0
  75%: 17708240.0
  95%: 22403890.6
  99%: 31174781.88
  max: 42703408
Between block sizes:
  min: 254
  5%: 270.0
  25%: 305.0
  median: 540.0
  75%: 1074.5
  95%: 31645.4
  99%: 99999.0
  max: 150000

[2016-04-25T16:43Z] Timing: structural variation initial
[2016-04-25T16:43Z] Timing: hla typing
[2016-04-25T16:43Z] Timing: alignment post-processing
[2016-04-25T16:43Z] multiprocessing: piped_bamprep
[2016-04-25T16:43Z] Timing: variant calling
[2016-04-25T16:43Z] multiprocessing: variantcall_sample
[2016-04-25T16:46Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; bcftools annotate -c ID -a ../variation/dbsnp_138.vcf.gz -o /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/tx/tmpYQGDby/ICGC-1_246138162_249250621-wdbsnp.vcf.gz -O z /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/ICGC-1_246138162_249250621.vcf.gz
[E::hts_open_format] fail to open file '../variation/dbsnp_138.vcf.gz'
/bin/bash: line 1: 14697 Segmentation fault      bcftools annotate -c ID -a ../variation/dbsnp_138.vcf.gz -o /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/tx/tmpYQGDby/ICGC-1_246138162_249250621-wdbsnp.vcf.gz -O z /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/ICGC-1_246138162_249250621.vcf.gz
' returned non-zero exit status 139
[2016-04-25T16:56Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; bcftools annotate -c ID -a ../variation/dbsnp_138.vcf.gz -o /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/tx/tmpDzzYcY/ICGC-1_108775956_142731023-wdbsnp.vcf.gz -O z /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/ICGC-1_108775956_142731023.vcf.gz
[E::hts_open_format] fail to open file '../variation/dbsnp_138.vcf.gz'
/bin/bash: line 1: 14891 Segmentation fault      bcftools annotate -c ID -a ../variation/dbsnp_138.vcf.gz -o /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/tx/tmpDzzYcY/ICGC-1_108775956_142731023-wdbsnp.vcf.gz -O z /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/ICGC-1_108775956_142731023.vcf.gz
' returned non-zero exit status 139
[2016-04-25T16:58Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; bcftools annotate -c ID -a ../variation/dbsnp_138.vcf.gz -o /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/tx/tmpWo1wSV/ICGC-1_213085882_228744509-wdbsnp.vcf.gz -O z /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/ICGC-1_213085882_228744509.vcf.gz
[E::hts_open_format] fail to open file '../variation/dbsnp_138.vcf.gz'
/bin/bash: line 1: 15050 Segmentation fault      bcftools annotate -c ID -a ../variation/dbsnp_138.vcf.gz -o /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/tx/tmpWo1wSV/ICGC-1_213085882_228744509-wdbsnp.vcf.gz -O z /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/ICGC-1_213085882_228744509.vcf.gz
' returned non-zero exit status 139
[2016-04-25T16:58Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; bcftools annotate -c ID -a ../variation/dbsnp_138.vcf.gz -o /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/tx/tmpd0dmUn/ICGC-1_142781022_158867533-wdbsnp.vcf.gz -O z /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/ICGC-1_142781022_158867533.vcf.gz
[E::hts_open_format] fail to open file '../variation/dbsnp_138.vcf.gz'
/bin/bash: line 1: 15207 Segmentation fault      bcftools annotate -c ID -a ../variation/dbsnp_138.vcf.gz -o /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/tx/tmpd0dmUn/ICGC-1_142781022_158867533-wdbsnp.vcf.gz -O z /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/ICGC-1_142781022_158867533.vcf.gz
' returned non-zero exit status 139
[2016-04-25T16:59Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; bcftools annotate -c ID -a ../variation/dbsnp_138.vcf.gz -o /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/tx/tmpodeWTe/ICGC-1_75848032_91925009-wdbsnp.vcf.gz -O z /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/ICGC-1_75848032_91925009.vcf.gz
[E::hts_open_format] fail to open file '../variation/dbsnp_138.vcf.gz'
/bin/bash: line 1: 15365 Segmentation fault      bcftools annotate -c ID -a ../variation/dbsnp_138.vcf.gz -o /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/tx/tmpodeWTe/ICGC-1_75848032_91925009-wdbsnp.vcf.gz -O z /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/ICGC-1_75848032_91925009.vcf.gz
' returned non-zero exit status 139
[2016-04-25T16:59Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; bcftools annotate -c ID -a ../variation/dbsnp_138.vcf.gz -o /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/tx/tmpD9cDAr/ICGC-1_91925966_108772519-wdbsnp.vcf.gz -O z /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/ICGC-1_91925966_108772519.vcf.gz
[E::hts_open_format] fail to open file '../variation/dbsnp_138.vcf.gz'
/bin/bash: line 1: 15523 Segmentation fault      bcftools annotate -c ID -a ../variation/dbsnp_138.vcf.gz -o /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/tx/tmpD9cDAr/ICGC-1_91925966_108772519-wdbsnp.vcf.gz -O z /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/ICGC-1_91925966_108772519.vcf.gz
' returned non-zero exit status 139
[2016-04-25T16:59Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; bcftools annotate -c ID -a ../variation/dbsnp_138.vcf.gz -o /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/2/tx/tmpQ1ADp7/ICGC-2_16329723_33092206-wdbsnp.vcf.gz -O z /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/2/ICGC-2_16329723_33092206.vcf.gz
[E::hts_open_format] fail to open file '../variation/dbsnp_138.vcf.gz'
/bin/bash: line 1: 15681 Segmentation fault      bcftools annotate -c ID -a ../variation/dbsnp_138.vcf.gz -o /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/2/tx/tmpQ1ADp7/ICGC-2_16329723_33092206-wdbsnp.vcf.gz -O z /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/2/ICGC-2_16329723_33092206.vcf.gz
' returned non-zero exit status 139
[2016-04-25T16:59Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; bcftools annotate -c ID -a ../variation/dbsnp_138.vcf.gz -o /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/2/tx/tmpfblddQ/ICGC-2_0_16279725-wdbsnp.vcf.gz -O z /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/2/ICGC-2_0_16279725.vcf.gz
[E::hts_open_format] fail to open file '../variation/dbsnp_138.vcf.gz'
/bin/bash: line 1: 15840 Segmentation fault      bcftools annotate -c ID -a ../variation/dbsnp_138.vcf.gz -o /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/2/tx/tmpfblddQ/ICGC-2_0_16279725-wdbsnp.vcf.gz -O z /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/2/ICGC-2_0_16279725.vcf.gz
' returned non-zero exit status 139
[2016-04-25T16:59Z] Uncaught exception occurred
Traceback (most recent call last):
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 21, in run
    _do_run(cmd, checks, log_stdout)
  File "/usr/local/share/bcbio/anaconda/lib/python2.7/site-packages/bcbio/provenance/do.py", line 95, in _do_run
    raise subprocess.CalledProcessError(exitcode, error_msg)
CalledProcessError: Command 'set -o pipefail; bcftools annotate -c ID -a ../variation/dbsnp_138.vcf.gz -o /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/tx/tmphN1aND/ICGC-1_0_16985854-wdbsnp.vcf.gz -O z /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/ICGC-1_0_16985854.vcf.gz
[E::hts_open_format] fail to open file '../variation/dbsnp_138.vcf.gz'
/bin/bash: line 1: 15889 Segmentation fault      bcftools annotate -c ID -a ../variation/dbsnp_138.vcf.gz -o /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/tx/tmphN1aND/ICGC-1_0_16985854-wdbsnp.vcf.gz -O z /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/vardict/1/ICGC-1_0_16985854.vcf.gz
' returned non-zero exit status 139
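
The tracebacks above repeat the same failure for every region, which makes the underlying message easy to miss. As a rough sketch (the function name `summarize_hts_errors` is made up for illustration and is not a bcbio utility), the htslib-style `[E::...]` lines can be collapsed into counts:

```python
import re
from collections import Counter

# Matches htslib-style error lines such as:
# [E::hts_open_format] fail to open file '../variation/dbsnp_138.vcf.gz'
HTS_ERROR = re.compile(r"^\[E::(\w+)\]\s+(.*)$")


def summarize_hts_errors(log_lines):
    """Count distinct (function, message) pairs from [E::...] lines in a log."""
    counts = Counter()
    for line in log_lines:
        match = HTS_ERROR.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# A few lines in the shape of the log above.
sample_log = [
    "[2016-04-25T16:46Z] Uncaught exception occurred",
    "[E::hts_open_format] fail to open file '../variation/dbsnp_138.vcf.gz'",
    "[2016-04-25T16:56Z] Uncaught exception occurred",
    "[E::hts_open_format] fail to open file '../variation/dbsnp_138.vcf.gz'",
]
for (function, message), count in summarize_hts_errors(sample_log).items():
    print(count, function, message)
```

On a real log file the same function could be fed `open(path)` directly, since it only iterates over lines.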

Also, I set all the options related to BAM preparation to false (realign, bam_clean, bam_sort, mark_duplicates, etc.), but I still see the bammarkduplicates command in bcbio-nextgen-commands.log. Am I supposed to see that, or is it required as part of variant calling?

/usr/local/share/bcbio/galaxy/../anaconda/bin/bammarkduplicates tmpfile=/home/ec2-user/puppy/icgc_data/try1/real_deneme/work/tx/tmpf7d8c8/normal-dedup-markdup markthreads=16 I=/home/ec2-user/puppy/icgc_data/try1/real_deneme/work/prealign/140e5014-bdd6-4663-9404-234c7f9e927d/normal.bam O=/home/ec2-user/puppy/icgc_data/try1/real_deneme/work/prealign/140e5014-bdd6-4663-9404-234c7f9e927d/tx/tmpGsXgUz/normal-dedup.bam
[2016-04-25T14:31Z] /usr/local/share/bcbio/galaxy/../anaconda/bin/sambamba index -t 16 /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/prealign/140e5014-bdd6-4663-9404-234c7f9e927d/tx/tmpNtsbra/normal-dedup.bam
[2016-04-25T14:41Z] /usr/local/share/bcbio/galaxy/../anaconda/bin/bammarkduplicates tmpfile=/home/ec2-user/puppy/icgc_data/try1/real_deneme/work/tx/tmpW737sh/tumor-dedup-markdup markthreads=16 I=/home/ec2-user/puppy/icgc_data/try1/real_deneme/work/prealign/a9ec7d9e-b179-4782-a589-43c7d1642be9/tumor.bam O=/home/ec2-user/puppy/icgc_data/try1/real_deneme/work/prealign/a9ec7d9e-b179-4782-a589-43c7d1642be9/tx/tmpy6069a/tumor-dedup.bam
[2016-04-25T15:28Z] /usr/local/share/bcbio/galaxy/../anaconda/bin/bammarkduplicates tmpfile=/home/ec2-user/puppy/icgc_data/try1/real_deneme/work/tx/tmpa0VVU7/tumor-dedup-markdup markthreads=16 I=/home/ec2-user/puppy/icgc_data/try1/real_deneme/work/prealign/a9ec7d9e-b179-4782-a589-43c7d1642be9/tumor.bam O=/home/ec2-user/puppy/icgc_data/try1/real_deneme/work/prealign/a9ec7d9e-b179-4782-a589-43c7d1642be9/tx/tmpUCAvMM/tumor-dedup.bam
[2016-04-25T15:40Z] /usr/local/share/bcbio/galaxy/../anaconda/bin/sambamba view -F 'mapping_quality > 0' -L /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/align/140e5014-bdd6-4663-9404-234c7f9e927d/normal-callable-split/normal-2-callable-coverageregions.bed -f bam -l 1 /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/align/140e5014-bdd6-4663-9404-234c7f9e927d/normal.bam | /usr/local/share/bcbio/galaxy/../anaconda/bin/bedtools genomecov -split -ibam stdin -bga -g /usr/local/share/bcbio/genomes/Hsapiens/GRCh37/seq/GRCh37.fa.fai > /home/ec2-user/puppy/icgc_data/try1/real_deneme/work/align/140e5014-bdd6-4663-9404-234c7f9e927d/normal-callable-split/tx/tmpSOtUoY/normal-2-callable-genomecov.bed
...

chapmanb commented 8 years ago

Tunc; Sorry about the problems. This is a different problem than you saw previously and the informative error is:

[E::hts_open_format] fail to open file '../variation/dbsnp_138.vcf.gz'

For some reason /path/to/bcbio/genomes/Hsapiens/GRCh37/variation/dbsnp_138.vcf.gz is missing from your install. I'm not sure if the install failed or this file was deleted. You could try re-running the data install:

bcbio_nextgen.py upgrade --data

to download and install it. Hope this helps.
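
Before re-running a long analysis, a quick existence check on the reference files can catch this kind of problem early. The following is a minimal sketch, not a bcbio feature: `missing_reference_files` is a hypothetical helper, the genome directory is taken from the paths in the log above, and the `variation/dbsnp_138.vcf.gz` location is an assumption inferred from the relative path in the error message.

```python
import os


def missing_reference_files(genome_dir, relative_paths):
    """Return the relative paths that are absent (or empty) under genome_dir."""
    missing = []
    for rel in relative_paths:
        path = os.path.join(genome_dir, rel)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(rel)
    return missing


# Assumed layout; adjust both values to the actual install.
GENOME_DIR = "/usr/local/share/bcbio/genomes/Hsapiens/GRCh37"
EXPECTED = ["seq/GRCh37.fa.fai", "variation/dbsnp_138.vcf.gz"]
print(missing_reference_files(GENOME_DIR, EXPECTED))
```

An empty list means every listed file is present and non-empty; it does not prove the files are complete or correctly indexed.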

mortunco commented 8 years ago

Brad;

Thank you for the help. You were right, my dbsnp_138 file was missing. In the Docker version, the installation finished successfully most of the time. How should I check whether bcbio is installed correctly? Is there anything I can do other than the "run tests" step in the documentation? Is that step a valid check?

mortunco commented 8 years ago

Dear Brad;

I am getting this error about the mutect directory (based on the most recent run). I believe I understand my problem this time, but I may need your help with the solution. I started the install by specifying the locations of GATK and MuTect 1.1.7, but I also specified their locations with bcbio_nextgen.py upgrade --tools --toolplus mutect=/path/to/mutectANDgatk/jars, and I think this created a conflict. Even after I commented out the resources: lines in the configuration, it does not work and gives the same error that I got before.

Also, the upgrade option of bcbio might have a problem, because it handles the GATK jar well but not the mutect-1.1.7 jar. Is this expected behavior? Should the MuTect jar location be updated when I run the bcbio_nextgen.py upgrade command?

To anticipate a possible answer: I have already run bcbio_nextgen.py upgrade for tools and data many times.

My log files don't contain this error; it comes from the stdout of the bcbio_nextgen.py process. Sorry, I had to share it as an attachment, because otherwise I got an error about exceeding the maximum character limit: bcbio-nextgen.log.txt

Is this the configuration that I should be editing? The GATK path is right, but mutect stays unchanged.

[ec2-user@ip-172-31-55-174 ~]$ cat /usr/local/share/bcbio/galaxy/bcbio_system.yaml
galaxy_config: universe_wsgi.ini
resources:
  bwa:
    cmd: bwa
    cores: 16
  cufflinks:
    cores: 16
    memory: 3g
  default:
    cores: 16
    jvm_opts:
    - -Xms750m
    - -Xmx2000m
    memory: 2G
  dexseq:
    memory: 10g
  express:
    memory: 8g
  gatk:
    dir: /usr/local/share/bcbio/toolplus/gatk/3.5-0-g36282e4
    jvm_opts:
    - -Xms500m
    - -Xmx3500m
  hisat2:
    cores: 16
    memory: 2G
  macs2:
    cores: 1
    memory: 8g
  miraligner:
    jvm_opts:
    - -Xms750m
    - -Xmx4500m
  oncofuse:
    jvm_opts:
    - -Xms750m
    - -Xmx2000m
  picard:
    jvm_opts:
    - -Xms750m
    - -Xmx3500m
  qualimap:
    memory: 4g
  sailfish:
    cores: 16
    memory: 1g
  samtools:
    cores: 16
    memory: 2G
  seqcluster:
    memory: 8g
  snap:
    cores: 16
    memory: 4G
  snpeff:
    jvm_opts:
    - -Xms750m
    - -Xmx6g
  star:
    cores: 16
    memory: 2g
  stringtie:
    cores: 16
    memory: 1g
  vardict:
    jvm_opts:
    - -Xms750m
    - -Xmx3000m
  wham:
    memory: 3500m

chapmanb commented 8 years ago

Tunc; For the install testing, you'll need to run a real pipeline to evaluate the install. The tests use a minimal genome directory because running against a full genome is too intensive. We rely on identifying errors during install as the best way to identify if everything worked correctly.

Regarding the MuTect problem, it doesn't look like the install command worked correctly, as I don't see a mutect section in your input file. MuTect is a separate jar from GATK. What command exactly did you run to install it? From your example above, you want to point at the jar files, not at directories containing jar files:

http://bcbio-nextgen.readthedocs.org/en/latest/contents/installation.html#gatk-and-mutect-mutect2

Hope this helps.
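
The check described here, whether a `mutect` section exists under `resources`, can be sketched in a few lines. `missing_resource_sections` is a hypothetical helper, not part of bcbio, and the dictionary is a trimmed-down stand-in for the bcbio_system.yaml listing earlier in the thread, assumed to have been parsed already.

```python
def missing_resource_sections(system_config, required=("gatk", "mutect")):
    """Return the names in `required` that have no entry under `resources`."""
    resources = system_config.get("resources") or {}
    return [name for name in required if name not in resources]


# Trimmed-down stand-in for the parsed bcbio_system.yaml shown earlier.
system_config = {
    "resources": {
        "default": {"cores": 16, "memory": "2G"},
        "gatk": {"dir": "/usr/local/share/bcbio/toolplus/gatk/3.5-0-g36282e4"},
        "vardict": {"jvm_opts": ["-Xms750m", "-Xmx3000m"]},
    }
}
print(missing_resource_sections(system_config))  # ['mutect']
```

This only confirms that a section is present; it does not verify that the jar it points to exists or is the right version.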

mortunco commented 8 years ago

Brad;

I initiated a cancer-variant example to check if the system is ok.

[ec2-user@ip-172-31-55-174 ~]$ ls
bcbio_nextgen_install.py  GATK  puppy  tmp
[ec2-user@ip-172-31-55-174 ~]$ ls GATK/
GenomeAnalysisTK.jar  mutect-1.1.7.jar

I used the following commands, as stated in the documentation:

bcbio_nextgen.py upgrade --tools --toolplus mutect=/home/ec2-user/GATK/mutect-1.1.7.jar
bcbio_nextgen.py upgrade --tools --toolplus gatk=/home/ec2-user/GATK/GenomeAnalysisTK.jar

Thank you,

T.

chapmanb commented 8 years ago

Tunc; Thanks much for the details, this helps a lot. Apologies, this was a bug in installing these custom jars: if the mutect block was not already present in the original configuration, it would fail to add the newly installed directory. If you update to the latest development version and retry, it should now work correctly:

bcbio_nextgen.py upgrade -u development
bcbio_nextgen.py upgrade --tools --toolplus mutect=/home/ec2-user/GATK/mutect-1.1.7.jar

Thanks much for the report and hope this gets your analysis running.

mortunco commented 8 years ago

Brad;

Thank you very much for your patience. That solved my problem. You made me the happiest man on earth.

One small question:

Since you released the fix in the development version, will I need to select that version specifically when installing on our university HPC, or can we go with the standard install method shown at the end of this comment?

Thank you very very much ! Thank you thank you

Best, T.

wget https://raw.github.com/chapmanb/bcbio-nextgen/master/scripts/bcbio_nextgen_install.py
python bcbio_nextgen_install.py /usr/local/share/bcbio --tooldir=/usr/local \
  --genomes GRCh37 --aligners bwa --aligners bowtie2

chapmanb commented 8 years ago

Tunc; Glad to help. You will need to add -u development to installs or updates to get this for now. We plan to have a new release with these fixes soon. Hope this helps.