bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

0.7.8a: running multiple paired variant callers - only results from one caller end up copying into final/ folder #340

Closed mjafin closed 10 years ago

mjafin commented 10 years ago

Hi Brad, I combined FreeBayes and MuTect in paired variant calling, and the run was fine but only the FreeBayes vcf ended up being copied into the final folder. Any ideas what might be wrong?

lbeltrame commented 10 years ago

Can you check whether they're in the per-sample directories? I found them there.

chapmanb commented 10 years ago

Miika; I'm not sure. The unit tests seem to do the right thing, putting them in the sample directories as Luca mentions:

├── c-tumor
│   ├── c-tumor-freebayes.vcf.gz
│   ├── c-tumor-freebayes.vcf.gz.tbi
│   ├── c-tumor-mutect.vcf.gz
│   ├── c-tumor-mutect.vcf.gz.tbi
│   ├── c-tumor-varscan.vcf.gz
│   ├── c-tumor-varscan.vcf.gz.tbi
│   └── qc
└── c-tumor2
    ├── c-tumor2-freebayes.vcf.gz
    ├── c-tumor2-freebayes.vcf.gz.tbi
    ├── c-tumor2-mutect.vcf.gz
    ├── c-tumor2-mutect.vcf.gz.tbi
    ├── c-tumor2-varscan.vcf.gz
    ├── c-tumor2-varscan.vcf.gz.tbi
    └── qc

Posting your sample YAML file might help identify how it differs from the test data, so I can reproduce and fix this. Thanks much.

mjafin commented 10 years ago

Sorry about the slight delay in answering this. Here's the config file:

details:
- algorithm:
    aligner: bwa
    background: /ngs/reference_data/genomes/Hsapiens/hg19/variation/refseq_exome_10bp_hg19_300_1kg_normal_panel.hg19.vcf
    coverage_depth: high
    coverage_interval: exome
    mark_duplicates: false
    platform: illumina
    quality_format: Standard
    realign: gatk
    recalibrate: false
    svcaller:
    - cn.mops
    variant_regions: /ngs/public_data/ERP002442/ERP002442-targeted_nonoverlap_hg19.bed
    variantcaller:
    - mutect
    - freebayes
  analysis: variant2
  description: 10-497-N
  files:
  - /ngs/public_data/ERP002442/ERR256785_1.fastq.gz
  - /ngs/public_data/ERP002442/ERR256785_2.fastq.gz
  genome_build: hg19
  metadata:
    batch: 10-497-
    phenotype: normal
- algorithm:
    aligner: bwa
    background: /ngs/reference_data/genomes/Hsapiens/hg19/variation/refseq_exome_10bp_hg19_300_1kg_normal_panel.hg19.vcf
    coverage_depth: high
    coverage_interval: exome
    mark_duplicates: false
    platform: illumina
    quality_format: Standard
    realign: gatk
    recalibrate: false
    svcaller:
    - cn.mops
    variant_regions: /ngs/public_data/ERP002442/ERP002442-targeted_nonoverlap_hg19.bed
    variantcaller:
    - mutect
    - freebayes
  analysis: variant2
  description: 10-497-T
  files:
  - /ngs/public_data/ERP002442/ERR256786_1.fastq.gz
  - /ngs/public_data/ERP002442/ERR256786_2.fastq.gz
  genome_build: hg19
  metadata:
    batch: 10-497-
    phenotype: tumor
fc_date: '2014-02-18'
fc_name: tumor-paired
upload:
  dir: ../final

Here's the output folder structure:

.
├── 10-497-N
│   ├── 10-497-N-ready.bam
│   ├── 10-497-N-ready.bam.bai
│   └── qc
│       ├── bamtools
│       │   ├── bamtools_stats.txt
│       │   └── tx
│       └── fastqc
│           ├── fastqc_data.txt
│           ├── fastqc_report.html
│           ├── Icons
│           │   ├── error.png
│           │   ├── fastqc_icon.png
│           │   ├── tick.png
│           │   └── warning.png
│           ├── Images
│           │   ├── duplication_levels.png
│           │   ├── kmer_profiles.png
│           │   ├── per_base_gc_content.png
│           │   ├── per_base_n_content.png
│           │   ├── per_base_quality.png
│           │   ├── per_base_sequence_content.png
│           │   ├── per_sequence_gc_content.png
│           │   ├── per_sequence_quality.png
│           │   └── sequence_length_distribution.png
│           └── summary.txt
├── 10-497-T
│   ├── 10-497-T-freebayes.vcf.gz
│   ├── 10-497-T-freebayes.vcf.gz.tbi
│   ├── 10-497-T-ready.bam
│   ├── 10-497-T-ready.bam.bai
│   └── qc
│       ├── bamtools
│       │   ├── bamtools_stats.txt
│       │   └── tx
│       └── fastqc
│           ├── fastqc_data.txt
│           ├── fastqc_report.html
│           ├── Icons
│           │   ├── error.png
│           │   ├── fastqc_icon.png
│           │   ├── tick.png
│           │   └── warning.png
│           ├── Images
│           │   ├── duplication_levels.png
│           │   ├── kmer_profiles.png
│           │   ├── per_base_gc_content.png
│           │   ├── per_base_n_content.png
│           │   ├── per_base_quality.png
│           │   ├── per_base_sequence_content.png
│           │   ├── per_sequence_gc_content.png
│           │   ├── per_sequence_quality.png
│           │   └── sequence_length_distribution.png
│           └── summary.txt
└── 2014-02-18_tumor-paired
    ├── programs.txt
    └── project-summary.yaml

mjafin commented 10 years ago

Ah, actually, it looks like something must have gone wrong with mutect+SID variant calling, as the mutect/ folder doesn't have the VCF files in it.

I'll rerun the data.
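
A quick way to confirm which callers are missing from the upload directory is sketched below. It assumes the `<sample>-<caller>.vcf.gz` naming seen in the listings in this thread; the function name `missing_caller_vcfs` is hypothetical and not part of bcbio.

```python
from pathlib import Path


def missing_caller_vcfs(final_dir, sample, callers):
    """Return callers with no <sample>-<caller>.vcf.gz under final_dir/<sample>.

    Assumption: output files follow the naming pattern visible in this
    thread (for example 10-497-T-freebayes.vcf.gz).
    """
    sample_dir = Path(final_dir) / sample
    return [
        caller
        for caller in callers
        if not (sample_dir / f"{sample}-{caller}.vcf.gz").is_file()
    ]


if __name__ == "__main__":
    # Paths and names taken from the config posted above.
    print(missing_caller_vcfs("../final", "10-497-T", ["mutect", "freebayes"]))
```

For the final/ layout listed above, where 10-497-T holds only the freebayes files, this would report mutect as missing.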

mjafin commented 10 years ago

Well, I started a run from scratch and here's what it does before finishing (no mention of mutect):

2014-03-10 15:57:33.475 [IPClusterStop] Stopping cluster [pid=28740] with [signal=2]
[2014-03-10 15:57] ukapdlnx115: Timing: finished
[2014-03-10 15:57] ukapdlnx115: Storing directory in local filesystem: /scratch/ukapd/klrl262/ERP002442/tumor-paired/final/10-497-T/qc
[2014-03-10 15:57] ukapdlnx115: Storing in local filesystem: /scratch/ukapd/klrl262/ERP002442/tumor-paired/final/10-497-T/10-497-T-ready.bam
[2014-03-10 15:57] ukapdlnx115: Storing in local filesystem: /scratch/ukapd/klrl262/ERP002442/tumor-paired/final/10-497-T/10-497-T-ready.bam.bai
[2014-03-10 15:57] ukapdlnx115: Storing in local filesystem: /scratch/ukapd/klrl262/ERP002442/tumor-paired/final/10-497-T/10-497-T-freebayes.vcf.gz
[2014-03-10 15:57] ukapdlnx115: Storing in local filesystem: /scratch/ukapd/klrl262/ERP002442/tumor-paired/final/10-497-T/10-497-T-freebayes.vcf.gz.tbi
[2014-03-10 15:57] ukapdlnx115: Storing in local filesystem: /scratch/ukapd/klrl262/ERP002442/tumor-paired/final/2014-02-18_tumor-paired/programs.txt
[2014-03-10 15:57] ukapdlnx115: Storing in local filesystem: /scratch/ukapd/klrl262/ERP002442/tumor-paired/final/2014-02-18_tumor-paired/project-summary.yaml
[2014-03-10 15:57] ukapdlnx115: Storing directory in local filesystem: /scratch/ukapd/klrl262/ERP002442/tumor-paired/final/10-497-N/qc
[2014-03-10 15:57] ukapdlnx115: Storing in local filesystem: /scratch/ukapd/klrl262/ERP002442/tumor-paired/final/10-497-N/10-497-N-ready.bam
[2014-03-10 15:57] ukapdlnx115: Storing in local filesystem: /scratch/ukapd/klrl262/ERP002442/tumor-paired/final/10-497-N/10-497-N-ready.bam.bai
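
The same check can be made from the log itself. The sketch below scans for the "Storing in local filesystem:" lines shown above and collects the caller names from any `<sample>-<caller>.vcf.gz` paths; the helper name `uploaded_callers` is hypothetical, and the line format is assumed to match this log excerpt.

```python
import re

# Matches upload lines such as:
# "... Storing in local filesystem: /path/final/10-497-T/10-497-T-freebayes.vcf.gz"
# "Storing directory in local filesystem:" lines are deliberately not matched.
STORED = re.compile(r"Storing in local filesystem: (\S+)$")


def uploaded_callers(log_lines, sample):
    """Return sorted caller names whose VCF for `sample` appears in the log."""
    prefix = sample + "-"
    suffix = ".vcf.gz"
    callers = set()
    for line in log_lines:
        match = STORED.search(line.strip())
        if not match:
            continue
        name = match.group(1).rsplit("/", 1)[-1]
        # Index files (.vcf.gz.tbi) and BAMs do not end in .vcf.gz, so they are skipped.
        if name.startswith(prefix) and name.endswith(suffix):
            callers.add(name[len(prefix):-len(suffix)])
    return sorted(callers)
```

Comparing the result against the `variantcaller` list in the config shows which callers never reached the upload step.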

These guys are found in the mutect work folder:

[klrl262@ukapdlnx115: /scratch/ukapd/klrl262/ERP002442/tumor-paired/work ]$ l mutect/
total 1672
drwxr-xr-x 96 klrl262 modeller   8192 Mar 10 15:40 .
drwxr-xr-x 16 klrl262 modeller   4096 Mar 10 15:57 ..
-rw-r--r--  1 klrl262 modeller 217231 Mar 10 15:40 2_2014-02-18_tumor-paired-sort-variants-effects.vcf.gz
-rw-r--r--  1 klrl262 modeller  16311 Mar 10 15:40 2_2014-02-18_tumor-paired-sort-variants-effects.vcf.gz.tbi
-rw-r--r--  1 klrl262 modeller  35554 Mar 10 15:38 2_2014-02-18_tumor-paired-sort-variants.vcf-files.txt
-rw-r--r--  1 klrl262 modeller 130266 Mar 10 15:38 2_2014-02-18_tumor-paired-sort-variants.vcf.gz
-rw-r--r--  1 klrl262 modeller  15933 Mar 10 15:38 2_2014-02-18_tumor-paired-sort-variants.vcf.gz.tbi

These are from the FreeBayes work folder:

-rw-r--r--  1 klrl262 modeller  457679 Mar 10 15:40 2_2014-02-18_tumor-paired-sort-variants-filter-effects.vcf.gz
-rw-r--r--  1 klrl262 modeller   14328 Mar 10 15:40 2_2014-02-18_tumor-paired-sort-variants-filter-effects.vcf.gz.tbi
-rw-r--r--  1 klrl262 modeller  393424 Mar 10 15:40 2_2014-02-18_tumor-paired-sort-variants-filter.vcf.gz
-rw-r--r--  1 klrl262 modeller   36304 Mar 10 15:38 2_2014-02-18_tumor-paired-sort-variants.vcf-files.txt
-rw-r--r--  1 klrl262 modeller 1161712 Mar 10 15:38 2_2014-02-18_tumor-paired-sort-variants.vcf.gz
-rw-r--r--  1 klrl262 modeller   18804 Mar 10 15:38 2_2014-02-18_tumor-paired-sort-variants.vcf.gz.tbi

chapmanb commented 10 years ago

Miika; Ugh, this one is driving me crazy because I can't reproduce it at all with a small test set. Could you try running the multiple caller test in the test suite and see if it behaves correctly for you:

./run_tests.sh cancermulti
tree test_automated_output/upload

I'm not sure why your example is acting differently, and am just trying to isolate whether it's some kind of system problem or something else. Sorry about the issue and hope this sheds some light.

mjafin commented 10 years ago

Thanks Brad, I'll run that first thing tomorrow. Locked out of my account atm so can't work from home (the wife is delighted).

I was running this on the two ERP002442 samples

mjafin commented 10 years ago

The test runs fine. So weird. I'll look into this further by adding some debugging markers in the code that generates the final folder.

chapmanb commented 10 years ago

Miika; Strange, I also can't reproduce with the cancer test dataset, using https://bcbio-nextgen.readthedocs.org/en/latest/contents/testing.html#cancer-tumor-normal with the only change being variantcaller: [freebayes, mutect]. On a fresh run, the output directory looks like:

$ tree -L 2 ../final
../final
├── 2014-01-06_cancer
│   ├── batch1-freebayes.db
│   ├── batch1-mutect.db
│   ├── programs.txt
│   └── project-summary.yaml
├── ERR256785
│   ├── ERR256785-ready.bam
│   ├── ERR256785-ready.bam.bai
│   └── qc
└── ERR256786
    ├── ERR256786-freebayes.vcf.gz
    ├── ERR256786-freebayes.vcf.gz.tbi
    ├── ERR256786-mutect.vcf.gz
    ├── ERR256786-mutect.vcf.gz.tbi
    ├── ERR256786-ready.bam
    ├── ERR256786-ready.bam.bai
    └── qc

Sorry about the issues and all the back and forth. There must be something I'm missing to reproduce but I can't see it right now.

mjafin commented 10 years ago

OK, so I did some further testing. If I specify mutect, freebayes and cn.mops, I only get the freebayes VCF in the final folder. However, if I drop cn.mops, both mutect and freebayes get handled correctly. I don't plan on using cn.mops at this point in time, but something must be going wrong with it.
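
One way to pin down exactly what differs between the two runs is to compare the file sets of their final folders. This is a minimal sketch using only the standard library; `relative_files` and `diff_outputs` are hypothetical helper names, and the two directory paths are whatever the runs with and without cn.mops were uploaded to.

```python
from pathlib import Path


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def diff_outputs(dir_a, dir_b):
    """Return (files only in dir_a, files only in dir_b), each sorted."""
    a, b = relative_files(dir_a), relative_files(dir_b)
    return sorted(a - b), sorted(b - a)
```

Only file names are compared here, not contents, which is enough to spot a caller's VCF being absent from one run.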

chapmanb commented 10 years ago

Miika; To add a final point of confusion to this thread, I can't replicate this even if I have cn.mops added as well. So now I'm officially confused. Since cn.mops is still experimental and needs lots of validation, I'll leave this for now and we can return to it if we see the issue in the future. Sorry again about the issues and my inability to reproduce.