bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Running variant calling without realignment crashes #337

Closed matanhofree closed 10 years ago

matanhofree commented 10 years ago

This is the last command in the log file:

[2014-03-06 11:58] java -Xms750m -Xmx2500m -Djava.io.tmpdir=/mnt/tmp/TCGA-CQ-5330/work_norealign/tmp/tmpmmV6dE -jar /cellar/users/mhofree/projects/cancer_ngs/external/ngs-tools/share/java/gatk/GenomeAnalysisTK.jar -T PrintReads -R /cellar/users/mhofree/projects/cancer_ngs/external/bcbio-nextgen/genomes/Hsapiens/hg19/seq/hg19.fa -I /mnt/tmp/TCGA-CQ-5330/work_norealign/bamclean/TCGA-CQ-5330-01A-01D-1683-08/C495.TCGA-CQ-5330-01A-01D-1683-08.2-reorder-fixrgs.bam --out /mnt/tmp/TCGA-CQ-5330/work_norealign/bamclean/TCGA-CQ-5330-01A-01D-1683-08/tx/tmpKTcvkE/C495.TCGA-CQ-5330-01A-01D-1683-08.2-reorder-fixrgs-gatkfilter.bam --filter_mismatching_base_and_quals -U LENIENT_VCF_PROCESSING --read_filter BadCigar --read_filter NotPrimaryAlignment

Crash trace is very long: http://chianti.ucsd.edu/~mhofree/share/run_error.log

It is possible I misconfigured something in the YAML file. I use the following settings:

details:
- algorithm:
    aligner: false
    bam_clean: picard
    bam_sort: coordinate
    coverage_depth: high
    coverage_interval: exome
    mark_duplicates: picard
    platform: illumina
    quality_format: standard
    realign: false
    recalibrate: false
    variantcaller: [ mutect, varscan, freebayes ]
  analysis: variant2
  description: TCGA-CQ-5330-01A-01D-1683-08
  files: /cancer_ngs/results/2014_02_02_ngs_node1/ramdrive/TCGA-CQ-5330/inputData/d50b34a2-37b9-47d3-a1a4-0b2928e4a266/C495.TCGA-CQ-5330-01A-01D-1683-08.2.bam
  genome_build: hg19
  metadata:
    batch: TCGA-CQ-5330
    phenotype: tumor
- algorithm:
    aligner: false
    bam_clean: picard
    bam_sort: coordinate
    coverage_depth: high
    coverage_interval: exome
    mark_duplicates: picard
    platform: illumina
    quality_format: standard
    realign: false
    recalibrate: false
    variantcaller: [ mutect, varscan, freebayes ]
  analysis: variant2
  description: TCGA-CQ-5330-10A-01D-1683-08
  files: /cancer_ngs/results/2014_02_02_ngs_node1/ramdrive/TCGA-CQ-5330/inputData/0a2f92be-8565-40a6-b377-2107b78af047/C495.TCGA-CQ-5330-10A-01D-1683-08.2.bam
  genome_build: hg19
  metadata:
    batch: TCGA-CQ-5330
    phenotype: normal
fc_date: 140303
fc_name: TCGA-CQ-5330_mutect
upload:
    dir: /cancer_ngs/results/2014_02_02_ngs_node1/TCGA-CQ-5330_final_norealign

chapmanb commented 10 years ago

Matan, thanks for the problem report. It looks like your input alignment is against GRCh37 and you're trying to run an analysis against hg19 without aligning. That won't work, as we don't try to do anything like lift over the chromosome names or coordinates at the alignment step. The bam_clean step can handle out-of-order chromosome names but not swapping coordinates like this.

Your best bet is to align against hg19 if you want to call variants in that coordinate space. Hope this helps.
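
For anyone hitting the same crash, a quick way to spot this kind of mismatch before starting a run is to compare the contig names and lengths in the BAM header against the reference's .fai index. The snippet below is only a minimal sketch and is not part of bcbio-nextgen: the function names are hypothetical, and the header and index text are small illustrative stand-ins for what `samtools view -H input.bam` and the `.fai` file written by `samtools faidx` would contain.

```python
def parse_sam_header_contigs(header_text):
    """Return {contig_name: length} from the @SQ lines of a SAM/BAM header."""
    contigs = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each @SQ field looks like TAG:VALUE, separated by tabs.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def parse_fai_contigs(fai_text):
    """Return {contig_name: length} from the text of a FASTA .fai index."""
    contigs = {}
    for line in fai_text.splitlines():
        if line.strip():
            # The first two tab-separated columns are name and length.
            name, length = line.split("\t")[:2]
            contigs[name] = int(length)
    return contigs


def compare_contigs(bam_contigs, ref_contigs):
    """List BAM contigs absent from the reference, and those whose lengths differ."""
    missing = sorted(n for n in bam_contigs if n not in ref_contigs)
    mismatched = sorted(
        n for n in bam_contigs
        if n in ref_contigs and bam_contigs[n] != ref_contigs[n]
    )
    return missing, mismatched


# Illustrative input only: a GRCh37-style header ("1", "MT") checked against
# an hg19-style index ("chr1", "chrM"). The offset columns are made up.
sam_header = (
    "@HD\tVN:1.4\tSO:coordinate\n"
    "@SQ\tSN:1\tLN:249250621\n"
    "@SQ\tSN:MT\tLN:16569\n"
)
fai = (
    "chrM\t16571\t6\t50\t51\n"
    "chr1\t249250621\t16915\t50\t51\n"
)

missing, mismatched = compare_contigs(
    parse_sam_header_contigs(sam_header), parse_fai_contigs(fai)
)
print("contigs missing from reference:", missing)
print("contigs with different lengths:", mismatched)
```

If either list is non-empty, the BAM was most likely aligned against a different build than the one named in genome_build, and realigning against the target build is the safer route.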