Closed by tanglingfung 10 years ago
Paul; Right now everything is implemented with tumor/normal or tumor/normal panel support. We could add in tumor only approaches as well, although I don't know of any good best practices to follow.
What caller do you want to use? Are you looking for freebayes --pooled-discrete calling and --min-alternate-fraction support without a matching normal sample?
Thanks Brad. I don't know the best approach for tumor-only either, but I am indeed trying freebayes with the options you mentioned (varscan is not working until samtools 0.2 is released, I guess).
I also noticed another issue: in the snpEff annotation, the cancer option is specified with paired samples (https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/variation/effects.py#L37).
Even though these can be handled with the args option that you suggested, I wonder if there is a set of parameters or a workflow that people usually use (e.g. COSMIC annotation, filtering of common SNPs). I will be happy to put things together for the community.
Just a quick note, mutect can be used in tumor-only mode, i.e. without a normal sample and without the panel of normals. Just specify phenotype: tumor
Thanks @mjafin, yes, mutect probably has the best support here, and I should try it as well (and indeed, I think it's a good idea to have a panel of normals). But I guess the snpEff annotation wouldn't work unless you have a tumor-normal pair.
@mjafin , so, do you think mutect does better than freebayes in tumor-only calling as well?
@tanglingfung I only have anecdotes to offer; haven't done any real comparisons between FreeBayes and MuTect in tumour variant calling.
If you have tumour only samples and want to use FreeBayes, you'll have to run them in bcbio as you would run germline samples but setting min_allele_fraction to a value of your liking (default is 20, i.e. no variants below 20% AF will get reported).
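As a rough illustration of what a percentage threshold like min_allele_fraction means, here is a minimal Python sketch. It is not bcbio's or FreeBayes' actual code: the function name, the read counts, and the variant ids are all made up for the example, and the default of 20 is simply the value quoted in the comment above.

```python
def passes_min_allele_fraction(alt_reads, total_reads, min_allele_fraction=20.0):
    """Return True if the alternate allele fraction, as a percentage,
    meets the threshold. Hypothetical helper, not part of bcbio."""
    if total_reads <= 0:
        # No coverage means no evidence for the variant.
        return False
    return 100.0 * alt_reads / total_reads >= min_allele_fraction


# Toy calls (made-up read counts) to show which ones survive a 20% cutoff.
calls = [
    {"id": "var_a", "alt_reads": 30, "total_reads": 100},  # 30% AF
    {"id": "var_b", "alt_reads": 5, "total_reads": 100},   # 5% AF
    {"id": "var_c", "alt_reads": 20, "total_reads": 100},  # exactly 20% AF
]

kept = [
    c["id"]
    for c in calls
    if passes_min_allele_fraction(c["alt_reads"], c["total_reads"])
]
print(kept)
```

With these toy numbers only var_b (5% AF) is dropped. Lowering the threshold keeps more low-frequency calls, which matters for heterogeneous tumor samples but also lets through more noise.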
FreeBayes has the advantage of reporting InDels (MuTect is for SNPs only), but FreeBayes doesn't currently have built-in panel-of-normals filtering like MuTect does.
Thanks @mjafin , any sharing is good to me.
I was guessing the filtering with a panel of normals wouldn't be hard to implement. Do you think I can take dbsnp_v138 (with >10% in 1kG) minus COSMIC minus ClinVar, for example, for filtering?
I think you can download the Broad panel of normals (SNP) vcf file from the MuTect download page somewhere
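The filtering idea raised in the question above (common dbSNP sites minus anything in COSMIC or ClinVar, then used to filter tumor-only calls) can be sketched with plain set operations. This is only a sketch under stated assumptions: variants are reduced to (chrom, pos, ref, alt) tuples, the function names are hypothetical, and all coordinates are invented toy data rather than real database entries. A real workflow would read these sets from VCF files.

```python
def build_blocklist(common_snps, cosmic, clinvar):
    """Common population variants, minus any that are also in COSMIC or
    ClinVar, so known cancer or clinical variants are never filtered out."""
    return set(common_snps) - set(cosmic) - set(clinvar)


def filter_calls(calls, blocklist):
    """Keep only calls that are not in the blocklist, preserving order."""
    return [c for c in calls if c not in blocklist]


# Toy data: (chrom, pos, ref, alt) keys, invented for illustration.
common_snps = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 300, "G", "A")}
cosmic = {("1", 200, "C", "T")}
clinvar = {("2", 300, "G", "A")}

tumor_calls = [
    ("1", 100, "A", "G"),  # common SNP only: filtered
    ("1", 200, "C", "T"),  # common but in COSMIC: kept
    ("2", 300, "G", "A"),  # common but in ClinVar: kept
    ("3", 400, "T", "C"),  # not in any database: kept
]

blocklist = build_blocklist(common_snps, cosmic, clinvar)
filtered = filter_calls(tumor_calls, blocklist)
print(filtered)
```

The same filter_calls step would work with a panel-of-normals site list in place of the dbSNP-derived blocklist.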
Thanks for the pointer. I think the last thing I need to handle is adding 'somatic event' to the filtered result to make GEMINI/snpEff happy. Do you think it's easier to use NA12878 as a normal pair, and filter the result with the Broad panel of normals vcf file?
Can't offer any advice here but if you end up using NA12878 (or something else) as a normal pair, let us know how it goes and what you learn from it. Are you using the public exome sample and the same capture kit?
I am thinking of using a whole genome dataset instead, to avoid selection bias.
@chapmanb, my primary aim is to get a vcf with the somatic flag added properly. Please let me know if there is a better way to go about it.
@chapmanb, is your background_vcf the same as the Broad's panel of normals? https://github.com/chapmanb/cloudbiolinux/blob/master/cloudbio/biodata/dbsnp.py#L112
Paul; That file isn't related to cancer calling and I wouldn't suggest using it. It's an unfinished experiment on using a population background as input to variant callers.
I don't know exactly what you want with the somatic flag and snpEff, but I'm not sure it's possible without a matched pair, at least with FreeBayes. For the SOMATIC flag, FreeBayes adds this by looking at differences between the tumor and normal, and then Luca's code converts these annotations into the flag:
https://github.com/chapmanb/bcbio-nextgen/blob/master/bcbio/variation/freebayes.py#L135
For snpEff, the -cancer flag is calculating differences using tumor/normal calls:
http://snpeff.sourceforge.net/SnpEff_manual.html#cancer
MuTect may be your best bet using Miika's suggestions. Let us know if you're running into errors or problems downstream.
thanks Brad, let me try different things and see.
Is there a best practice for tumor-only analysis? It appears to me that I can only select a tumor-normal paired analysis, or else the tumor sample will be treated as a normal diploid genome.