bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

getting errors running cancer samples: uninitialized value $sample #1152

Status: Closed (parlar closed this 8 years ago)

parlar commented 8 years ago

Hi, I'm getting errors from a run with both tumor-only and tumor-normal samples:

[2015-12-11T09:56Z] Use of uninitialized value $sample in concatenation (.) or string at /usr/local/share/bcbio/anaconda/bin/var2vcf_paired.pl line 35.

Has anyone seen this? I could be doing something wrong.


Config:

---
upload:
  dir: ../final
details:
  - files: [/home/lindak/Data/LiquidBiopsies/TGFbetaTest/1-08N_R1.fastq.gz, /home/lindak/Data/LiquidBiopsies/TGFbetaTest/1-08N_R2.fastq.gz]
    description: 1-08_normal
    metadata:
      batch: 1-08
      phenotype: normal
    analysis: variant2
    genome_build: GRCh37
    algorithm:
      aligner: bwa
      mark_duplicates: false
      recalibrate: false
      realign: gatk
      variant_regions: /home/lindak/Data/LiquidBiopsies/TGFbetaTest/design30606_merge.bed
      platform: illumina
      quality_format: standard
      variantcaller: [mutect,freebayes,vardict,varscan]
      indelcaller: [scalpel, pindel, sid]
      adapters: [truseq]
  - files: [/home/lindak/Data/LiquidBiopsies/TGFbetaTest/1-08T_R1.fastq.gz, /home/lindak/Data/LiquidBiopsies/TGFbetaTest/1-08T_R2.fastq.gz]
    description: 1-08_tumor
    metadata:
      batch: 1-08
      phenotype: tumor
    analysis: variant2
    genome_build: GRCh37
    algorithm:
      aligner: bwa
      mark_duplicates: false
      recalibrate: false
      realign: gatk
      variant_regions: /home/lindak/Data/LiquidBiopsies/TGFbetaTest/design30606_merge.bed
      platform: illumina
      quality_format: standard
      variantcaller: [mutect,freebayes,vardict,varscan]
      indelcaller: [scalpel, pindel, sid]
      adapters: [truseq]
  - files: [/home/lindak/Data/LiquidBiopsies/TGFbetaTest/1-11N_R1.fastq.gz, /home/lindak/Data/LiquidBiopsies/TGFbetaTest/1-11N_R2.fastq.gz]
    description: 1-11_normal
    metadata:
      batch: 1-11
      phenotype: normal
    analysis: variant2
    genome_build: GRCh37
    algorithm:
      aligner: bwa
      mark_duplicates: false
      recalibrate: false
      realign: gatk
      variant_regions: /home/lindak/Data/LiquidBiopsies/TGFbetaTest/design30606_merge.bed
      platform: illumina
      quality_format: standard
      variantcaller: [mutect,freebayes,vardict,varscan]
      indelcaller: [scalpel, pindel, sid]
      adapters: [truseq]
  - files: [/home/lindak/Data/LiquidBiopsies/TGFbetaTest/1-11T_R1.fastq.gz, /home/lindak/Data/LiquidBiopsies/TGFbetaTest/1-11T_R2.fastq.gz]
    description: 1-11_tumor
    metadata:
      batch: 1-11
      phenotype: tumor
    analysis: variant2
    genome_build: GRCh37
    algorithm:
      aligner: bwa
      mark_duplicates: false
      recalibrate: false
      realign: gatk
      variant_regions: /home/lindak/Data/LiquidBiopsies/TGFbetaTest/design30606_merge.bed
      platform: illumina
      quality_format: standard
      variantcaller: [mutect,freebayes,vardict,varscan]
      indelcaller: [scalpel, pindel, sid]
      adapters: [truseq]
  - files: [/home/lindak/Data/LiquidBiopsies/TGFbetaTest/1-12N_R1.fastq.gz, /home/lindak/Data/LiquidBiopsies/TGFbetaTest/1-12N_R2.fastq.gz]
    description: 1-12_normal
    metadata:
      batch: 1-12
      phenotype: normal
    analysis: variant2
    genome_build: GRCh37
    algorithm:
      aligner: bwa
      mark_duplicates: false
      recalibrate: false
      realign: gatk
      variant_regions: /home/lindak/Data/LiquidBiopsies/TGFbetaTest/design30606_merge.bed
      platform: illumina
      quality_format: standard
      variantcaller: [mutect,freebayes,vardict,varscan]
      indelcaller: [scalpel, pindel, sid]
      adapters: [truseq]
  - files: [/home/lindak/Data/LiquidBiopsies/TGFbetaTest/1-12T_R1.fastq.gz, /home/lindak/Data/LiquidBiopsies/TGFbetaTest/1-12T_R2.fastq.gz]
    description: 1-12_tumor
    metadata:
      batch: 1-12
      phenotype: tumor
    analysis: variant2
    genome_build: GRCh37
    algorithm:
      aligner: bwa
      mark_duplicates: false
      recalibrate: false
      realign: gatk
      variant_regions: /home/lindak/Data/LiquidBiopsies/TGFbetaTest/design30606_merge.bed
      platform: illumina
      quality_format: standard
      variantcaller: [mutect,freebayes,vardict,varscan]
      indelcaller: [scalpel, pindel, sid]
      adapters: [truseq]
chapmanb commented 8 years ago

Pär; Sorry about the issues. This is an unsatisfying approach, but if you re-run, do you get reproducible errors at the same place? VarDict Java can error out due to memory or other Java randomness, resulting in truncated output. The warnings you're seeing from Perl are the post-processing script's way of saying something is wrong with that output. This will often result in a bad VCF and then downstream failures from vcflib and other post-processing steps.

If you can get it to fail consistently with a single thread (-n 1) and post the traceback you see, I can hopefully provide more useful feedback. Hope this helps some.
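Before re-running, one way to tell a damaged VarDict VCF apart from one that is merely empty is to scan the file directly. The Python sketch below is not part of bcbio, and the function name `vcf_problems` is hypothetical. It assumes the raw per-region output is a gzip- or bgzip-compressed VCF (as the `-raw.vcf.gz` names later in this thread suggest) and flags three symptoms: a compressed stream that ends early, a missing `#CHROM` header line, and data lines with fewer than the 8 fixed VCF columns (CHROM through INFO). The column check matters because if the upstream process stops early but the compressor exits cleanly, the file can decompress without error while its content is still cut short.

```python
import gzip
import zlib


def vcf_problems(path):
    """Return a list of problems found in a gzip/bgzip VCF; [] means none found.

    Hypothetical helper for illustration; it checks structure only, not
    whether the calls themselves are correct.
    """
    problems = []
    saw_chrom_header = False
    try:
        with gzip.open(path, "rt") as handle:
            for lineno, line in enumerate(handle, start=1):
                if line.startswith("##"):
                    continue  # meta-information lines
                if line.startswith("#CHROM"):
                    saw_chrom_header = True
                    continue
                # Every VCF data line needs the 8 fixed columns CHROM..INFO.
                if len(line.rstrip("\n").split("\t")) < 8:
                    problems.append(f"line {lineno}: fewer than 8 columns")
    except EOFError:
        # Raised by the gzip module when the stream stops before its end marker.
        problems.append("compressed stream ends early (truncated file)")
    except (OSError, zlib.error, UnicodeDecodeError) as err:
        # Not gzip at all, damaged compressed data, or binary garbage.
        problems.append(f"cannot decompress: {err}")
    if not saw_chrom_header:
        problems.append("no #CHROM header line")
    return problems


if __name__ == "__main__":
    import sys

    for vcf in sys.argv[1:]:
        print(vcf, vcf_problems(vcf) or "no structural problems found")
```

A zero-byte file is reported only as "no #CHROM header line", which separates it from a file that has a valid header and simply no calls.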

parlar commented 8 years ago

I've re-run using a single thread (-n 1), but the problem persists.

I checked one of the problematic regions (1-12-6_0_138188616-raw-regions.bed, if I interpret the output correctly). The corresponding VCF output (1-12-6_0_138188616-raw.vcf.gz) is empty. Is the problem due to a lack of calls?

Output:

[2015-12-14T14:49Z] Timing: structural variation initial
[2015-12-14T14:49Z] Timing: hla typing
[2015-12-14T14:49Z] Resource requests: freebayes, gatk, mutect, picard, vardict, varscan; memory: 2.00, 3.50, 2.50, 3.50, 3.00, 2.00; cores: 16, 1, 1, 1, 1, 1
[2015-12-14T14:49Z] Configuring 1 jobs to run, using 1 cores each with 3.50g of memory reserved for each job
[2015-12-14T14:49Z] Timing: alignment post-processing
[2015-12-14T14:49Z] multiprocessing: piped_bamprep
[2015-12-14T14:49Z] Timing: variant calling
[2015-12-14T14:49Z] multiprocessing: variantcall_sample
[2015-12-14T14:49Z] Annotate with dbSNP
[2015-12-14T14:51Z] Annotate with dbSNP
[2015-12-14T14:53Z] Annotate with dbSNP
[2015-12-14T14:55Z] Annotate with dbSNP
[2015-12-14T14:57Z] Annotate with dbSNP
[2015-12-14T14:59Z] Annotate with dbSNP
[2015-12-14T15:01Z] Annotate with dbSNP
[2015-12-14T15:03Z] Annotate with dbSNP
[2015-12-14T15:05Z] Annotate with dbSNP
[2015-12-14T15:07Z] Annotate with dbSNP
[2015-12-14T15:09Z] Genotyping with VarDict: Inference
[2015-12-14T15:10Z] tabix index 1-12-3_27675063_138375174-raw.vcf.gz
[2015-12-14T15:10Z] Annotate with dbSNP
[2015-12-14T15:12Z] Genotyping with VarDict: Inference
[2015-12-14T15:12Z] tabix index 1-12-3_138376614_169940495-raw.vcf.gz
[2015-12-14T15:12Z] Annotate with dbSNP
[2015-12-14T15:14Z] Genotyping with VarDict: Inference
[2015-12-14T15:14Z] tabix index 1-12-3_169952932_193102603-raw.vcf.gz
[2015-12-14T15:14Z] Annotate with dbSNP
[2015-12-14T15:16Z] Genotyping with VarDict: Inference
[2015-12-14T15:16Z] Use of uninitialized value $sample in concatenation (.) or string at /usr/local/share/bcbio/anaconda/bin/var2vcf_paired.pl line 35.
[2015-12-14T15:16Z] tabix index 1-12-6_0_138188616-raw.vcf.gz
[2015-12-14T15:16Z] Annotate with dbSNP
[2015-12-14T15:18Z] Genotyping with VarDict: Inference
[2015-12-14T15:18Z] tabix index 1-12-6_138197101_171115067-raw.vcf.gz
[2015-12-14T15:18Z] Annotate with dbSNP
[2015-12-14T15:20Z] Genotyping with VarDict: Inference
[2015-12-14T15:20Z] tabix index 1-12-7_0_19157276-raw.vcf.gz
[2015-12-14T15:20Z] Annotate with dbSNP
[2015-12-14T15:22Z] Genotyping with VarDict: Inference
[2015-12-14T15:22Z] Use of uninitialized value $sample in concatenation (.) or string at /usr/local/share/bcbio/anaconda/bin/var2vcf_paired.pl line 35.
[2015-12-14T15:22Z] tabix index 1-12-7_100771760_159138663-raw.vcf.gz
[2015-12-14T15:22Z] Annotate with dbSNP
[2015-12-14T15:24Z] Genotyping with VarDict: Inference
[2015-12-14T15:24Z] Use of uninitialized value $sample in concatenation (.) or string at /usr/local/share/bcbio/anaconda/bin/var2vcf_paired.pl line 35.
[2015-12-14T15:24Z] tabix index 1-12-8_0_49830325-raw.vcf.gz
[2015-12-14T15:24Z] Annotate with dbSNP
[2015-12-14T15:26Z] Genotyping with VarDict: Inference
[2015-12-14T15:26Z] tabix index 1-12-8_49830661_70042762-raw.vcf.gz
[2015-12-14T15:26Z] Annotate with dbSNP
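In the log above, each `$sample` warning is immediately followed by the `tabix index` line for a region VCF, which is how the affected region can be read off. The Python sketch below automates that pairing. It assumes the warning always precedes the `tabix index` line for the same region; that is an inference from this log, not documented bcbio behavior, and `regions_with_sample_warning` is a hypothetical helper name.

```python
import re

# Text of the Perl warning as it appears in the log lines above.
WARNING = "Use of uninitialized value $sample"
TABIX = re.compile(r"tabix index (\S+)")


def regions_with_sample_warning(log_lines):
    """Return VCF names whose 'tabix index' line follows a $sample warning."""
    flagged = []
    pending = False
    for line in log_lines:
        if WARNING in line:
            pending = True
            continue
        match = TABIX.search(line)
        if match:
            if pending:
                flagged.append(match.group(1))
            pending = False  # each warning is attributed to one region only
    return flagged


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py <saved bcbio log file>
    with open(sys.argv[1]) as handle:
        for name in regions_with_sample_warning(handle):
            print(name)
```

Applied to the log above, this would list 1-12-6_0_138188616-raw.vcf.gz, 1-12-7_100771760_159138663-raw.vcf.gz and 1-12-8_0_49830325-raw.vcf.gz.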

parlar commented 8 years ago

Do you need data to reproduce the error?

chapmanb commented 8 years ago

Pär; From the log above, it doesn't look like anything is failing; the Perl script is just triggering warnings. Is that right? It may emit this warning when VarDict produces empty output (no calls in a region), which isn't anything to worry about. Are you seeing failures later on, or finding that VarDict is missing calls? If not, I wouldn't worry too much about these Perl warnings, as the VarDict Perl scripts can be noisy on edge cases. Hope this helps some.
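To check the "no calls in a region" explanation for a specific file such as 1-12-6_0_138188616-raw.vcf.gz, it is enough to count header lines against data records. A file with a header but zero records is consistent with a region that simply had no calls, whereas a file with no header at all points to broken output. A minimal Python sketch, assuming the file is gzip- or bgzip-compressed; `count_vcf_lines` is a hypothetical name, not a bcbio function.

```python
import gzip


def count_vcf_lines(path):
    """Return (header_lines, data_records) for a gzip/bgzip-compressed VCF."""
    header_lines = data_records = 0
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                header_lines += 1  # '##' meta lines and the '#CHROM' line
            elif line.strip():
                data_records += 1  # one variant call per non-blank line
    return header_lines, data_records


if __name__ == "__main__":
    import sys

    for vcf in sys.argv[1:]:
        headers, records = count_vcf_lines(vcf)
        if headers and not records:
            print(f"{vcf}: header only, no calls in this region")
        else:
            print(f"{vcf}: {headers} header lines, {records} records")
```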

lpantano commented 8 years ago

Closing this since it seems to have worked in the end. Let us know if you have further issues.

Thanks so much!