bcbio / bcbio-nextgen-vm

Run bcbio-nextgen genomic sequencing analyses using isolated containers and virtual machines
MIT License

Indel Calling Scalpel-Pindel No #146

Closed mortunco closed 8 years ago

mortunco commented 8 years ago

Hi Brad;

I believe I am one step away from running scalpel plus the variant callers on the ICGC data. I am going to do this! :) On my latest run, I tried to include pindel and scalpel (in separate runs). In both cases a structural directory was created, containing two directories, one for normal and one for tumor. But these directories are empty.

If you look at my configuration file, you will see that I commented out the validation steps, because I thought the indels might not be called if the run got stuck in the validation phase. I get the same empty structural directory with or without these validation regions.

What should I do to get my indels called? Is this problem related to my configuration file? (If so, I apologize for bothering you.) Also, if I want to call only indels, should I just set variantcaller: false and indelcaller: scalpel?

Thank you very much for your help.

Best regards,

Tunc.

My ls of the work directory:

ubuntu@frontend001:/encrypted/project7/work$ ls -la
total 112
drwxrwxr-x 22 ubuntu ubuntu 4096 Apr 12 06:09 .
drwxrwxr-x  4 ubuntu ubuntu 4096 Apr 11 20:13 ..
drwxr-xr-x  4 ubuntu ubuntu 4096 Apr 11 16:01 align
drwxr-xr-x  3 ubuntu ubuntu 4096 Apr 11 14:47 align_prep
-rw-rw-r--  1 ubuntu ubuntu 1697 Apr 11 20:13 bcbio_sample-forvm.yaml
-rw-r--r--  1 ubuntu ubuntu  985 Apr 11 20:14 bcbio_system-forvm-merged.yaml
-rw-rw-r--  1 ubuntu ubuntu  985 Apr 11 20:13 bcbio_system-forvm.yaml
-rw-r--r--  1 ubuntu ubuntu  960 Apr 11 20:14 bcbio_system-merged.yaml
drwxr-xr-x  3 ubuntu ubuntu 4096 Apr 11 20:21 bedprep
drwxr-xr-x  2 ubuntu ubuntu 4096 Apr 12 06:20 checkpoints_parallel
drwxrwxr-x  2 ubuntu ubuntu 4096 Apr 11 20:13 config
drwxr-xr-x  3 ubuntu ubuntu 4096 Apr 12 05:33 ensemble
drwxr-xr-x 87 ubuntu ubuntu 4096 Apr 12 06:00 freebayes
drwxr-xr-x  3 ubuntu ubuntu 4096 Apr 12 05:50 gemini
drwxr-xr-x  5 ubuntu ubuntu 4096 Apr 11 20:16 inputs
drwxrwxr-x  2 ubuntu ubuntu 4096 Apr 11 20:13 log
drwx------  3 ubuntu ubuntu 4096 Apr 12 06:09 multiqc
drwxr-xr-x 87 ubuntu ubuntu 4096 Apr 12 05:28 mutect2
-rw-r--r--  1 ubuntu ubuntu 5565 Apr 12 06:09 project-summary.yaml
drwxr-xr-x  2 ubuntu ubuntu 4096 Apr 11 20:20 provenance
drwxr-xr-x  4 ubuntu ubuntu 4096 Apr 12 06:00 qc
drwxr-xr-x  3 ubuntu ubuntu 4096 Apr 11 20:21 regions
drwxr-xr-x  6 ubuntu ubuntu 4096 Apr 12 06:09 report
drwxr-xr-x  4 ubuntu ubuntu 4096 Apr 12 05:33 structural
drwxr-xr-x  2 ubuntu ubuntu 4096 Apr 12 06:20 tx
drwxr-xr-x 87 ubuntu ubuntu 4096 Apr 12 05:26 vardict
drwxr-xr-x 87 ubuntu ubuntu 4096 Apr 12 05:33 varscan
ubuntu@frontend001:/encrypted/project7/work/structural$ ls *

syn3-normal:
validate

syn3-tumor:
validate

My Configuration file.


[ec2-user@ip-172-31-59-88 tempa]$ cat deneme6.yaml 
#Cancer tumor/normal calling evaluation using synthetic dataset 3
# from the ICGC-TCGA DREAM challenge:
# https://www.synapse.org/#!Synapse:syn312572/wiki/62018

---
details:
- algorithm:
    aligner: bwa
    align_split_size: 5000000
    nomap_split_targets: 100
    mark_duplicates: true
    recalibrate: false
    realign: false
    remove_lcr: true
    platform: illumina
    quality_format: standard
    variantcaller: [mutect2, vardict, varscan, freebayes]
    indelcaller: scalpel
    ensemble: 
      numpass: 2
    variant_regions: s3://tuncproject/bcbiovmrun/input/NGv3.bed
    # svcaller: [cnvkit, lumpy, delly]
    # coverage_interval: amplicon
  analysis: variant2
  description: syn3-normal
  #files: ../input/synthetic.challenge.set3.normal.bam
  files:
    - s3://tuncproject/bcbiovmrun/input/synthetic_challenge_set3_normal_NGv3_1.fq.gz
    - s3://tuncproject/bcbiovmrun/input/synthetic_challenge_set3_normal_NGv3_2.fq.gz
  genome_build: GRCh37
  metadata:
    batch: syn3
    phenotype: normal
- algorithm:
    aligner: bwa
    align_split_size: 5000000
    nomap_split_targets: 100
    mark_duplicates: true
    recalibrate: false
    realign: false
    remove_lcr: true
    platform: illumina
    quality_format: standard
    variantcaller: [mutect2, vardict, varscan, freebayes] 
    indelcaller: scalpel
    ensemble:
      numpass: 2
    #variant_regions: s3://tuncproject/bcbiovmrun/input/NGv3.bed
    #validate_regions: s3://tuncproject/bcbiovmrun/input/synthetic_challenge_set3_tumor_20pctmasked_truth.vcf.gz
    #validate_regions: s3://tuncproject/bcbiovmrun/input/synthetic_challenge_set3_tumor_20pctmasked_truth_regions.bed
    # svcaller: [cnvkit, lumpy, delly]
    # coverage_interval: amplicon
  #   svvalidate:
  #     DEL: ../input/synthetic_challenge_set3_tumor_20pctmasked_truth_sv_DEL.bed
  #     DUP: ../input/synthetic_challenge_set3_tumor_20pctmasked_truth_sv_DUP.bed
  #     INS: ../input/synthetic_challenge_set3_tumor_20pctmasked_truth_sv_INS.bed
  #     INV: ../input/synthetic_challenge_set3_tumor_20pctmasked_truth_sv_INV.bed
  analysis: variant2
  description: syn3-tumor
  #files: ../input/synthetic.challenge.set3.tumor.bam
  files:
    - s3://tuncproject/bcbiovmrun/input/synthetic_challenge_set3_tumor_NGv3_1.fq.gz
    - s3://tuncproject/bcbiovmrun/input/synthetic_challenge_set3_tumor_NGv3_2.fq.gz
  genome_build: GRCh37
  metadata:
    batch: syn3
    phenotype: tumor
fc_date: '2014-08-13'
fc_name: dream-syn3
resources:
  gatk:
    jar: s3://tuncproject/gatktools/GenomeAnalysisTK.jar 
  mutect: 
    jar: s3://tuncproject/gatktools/mutect-1.1.7.jar
upload:
  dir: s3://tuncproject/bcbiovmrun/final/

chapmanb commented 8 years ago

Tunc; The indel calls with scalpel or pindel are only run with MuTect, so are only available if you run that variantcaller. The idea with these was to provide a supplement for SNP-only callers like MuTect, not to provide standalone indel only calling. For your successful runs, the indels called by scalpel or pindel will be merged with the MuTect calls as one final VCF, so look in the final directory for the sample-mutect.vcf.gz file for these calls. Sorry for the confusion and hope this helps.
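
As a minimal sketch, assuming every other key in the posted configuration stays unchanged, the `algorithm` section of each sample would list `mutect` instead of `mutect2`, so that `indelcaller` has a SNP-only caller to supplement:

```yaml
# Sketch of one sample's algorithm block; all other keys as in the posted file.
algorithm:
  variantcaller: [mutect, vardict, varscan, freebayes]  # mutect, not mutect2
  indelcaller: scalpel  # indels are merged into the MuTect calls only
  ensemble:
    numpass: 2
```

The same change would apply to both the normal and the tumor entries.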

mortunco commented 8 years ago

Dear Brad;

I did not know this, I am sorry. Just to clear up the confusion, and with apologies for the ignorant question: if I use mutect2, pindel or scalpel will not run because mutect2 already calls indels itself?

I know that freebayes, varscan and vardict can call both SNPs and indels. Could you advise me on how to choose between mutect+scalpel and mutect2?

I owe you a lot! I appreciate all of your answers which solved my problems.

Thank you very much.

Best regards,

Tunc.

chapmanb commented 8 years ago

Tunc; That's right, all of the other callers include indel calling, so indelcaller only supplements MuTect output. If you want to choose a single caller to use, I'd recommend VarDict. It performs well on both SNPs and indels in our latest validations:

http://bcb.io/2016/04/04/vardict-filtering/

If the choice you have is only mutect/scalpel versus mutect2, I'd pick mutect/scalpel right now. mutect2 is still a bit slow and did not do well on cross-validations in the comparison above so I'm still trying to understand the cases where it does a good job. Hope this helps.
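
For the single-caller route, a minimal sketch of the `algorithm` block, assuming the remaining keys are kept as in the posted configuration, might look like this:

```yaml
# Sketch only: VarDict as the single caller; other algorithm keys unchanged.
algorithm:
  variantcaller: vardict  # calls SNPs and indels, so no indelcaller entry
```

With only one caller listed, the `ensemble` section would presumably be dropped as well, since there is nothing to combine.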

mortunco commented 8 years ago

Dear Brad;

I am planning to use four callers: MuTect + scalpel, varscan, vardict and freebayes, and to use the ensemble option to get concordant results. I am afraid I will have more questions afterwards.

Thanks,

Best,

Tunc.