Open serge2016 opened 8 years ago
Could you provide a detailed instruction of all steps needed to run fpfilter (VarScan false-positive filter)?
According to Dan's post1 and my own thoughts, I have to run next steps:
samtools mpileup -f ref.fa -B -o tumor.mpileup tumor.bam
samtools mpileup -f ref.fa -B -o normal.mpileup normal.bam
java -jar VarScan.v2.4.2.jar somatic normal.mpileup tumor.mpileup --output-snp snp.vcf --output-indel indel.vcf --min-coverage 3 --min-var-freq 0.08 --p-value 0.10 --somatic-p-value 0.05 --strand-filter 0 --output-vcf 1
java -jar VarScan.v2.4.2.jar processSomatic all.vcf --min-tumor-freq 0.08
awk 'BEGIN {OFS="\t"} {if (!/^#/) { isDel=(length($4) > length($5)) ? 1 : 0; print $1,($2+isDel),($2+isDel); }}' all.Somatic.hc.vcf > all.Somatic.hc.var
and so on.
The Dan's phrase "And if you want multiple positions reported, use multiple lines:" still remains unclear for me. Do I have to gerenate several lines for each insertion and deletion of more than 1 bp?bam-readcount -q 1 -b 20 -l all.Somatic.hc.var -f ref.fa tumor.bam > all.Somatic.hc.readcount
bam-readcount -q 1 -b 20 -l all.Germline.hc.var -f ref.fa normal.bam > all.Germline.hc.readcount
bam-readcount -q 1 -b 20 -l all.LOH.hc.var -f ref.fa normal.bam > all.LOH.hc.readcount
-b 15
may be found here.java -jar VarScan.v2.4.2.jar fpfilter all.Somatic.hc.vcf all.Somatic.hc.readcount --output-file all.Somatic.hc.filtered.vcf --dream3-settings
and so on.
Flags --min-ref-avgrl NN --min-var-avgrl NN
are necessary if (trimmed) read length is less than default 90?Other mentions of var file creation:
perl -ane 'print join("\t",@F[0,1,1])."\n" unless(m/^#/)' snvs.vcf > snvs.bed
, meaning that we use VCF -> BED from site3 for SNVs & InDels:
awk 'BEGIN {OFS="\t"} {if (!/^#/) { InsDelLen=(length($4) > length($5)) ? length($4) : length($5); print $1,($2-1),($2-1+InsDelLen); }}' all.vcf > all.bed
,
where we use
Any news here? Is the workflow suggested by @serge2016 the correct/recommended one?
I didn't think that processSomatic and other subsequent commands would work on VCF's - the original documentation says to use VarScan's file format. Anyone else have updates? I'll be giving this workflow a shot.
@Neato-Nick, I use exactly this workflow. And VarScan works with VCF file format. Do you have any evidence that it doesn't?
Oh, awesome! It's just not mentioned in the documentation that subsequent scripts after the initial calling from Pileups would take VCF input. Good to know you've been using this flow for a while!
On Aug 14, 2017 9:49 PM, "Sergey Mitrfanov" notifications@github.com wrote:
@Neato-Nick https://github.com/neato-nick, I use exactly this workflow. And VarScan works with VCF file format. Do you have any evidence that it doesn't?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dkoboldt/varscan/issues/10#issuecomment-322376653, or mute the thread https://github.com/notifications/unsubscribe-auth/AFlB0j7kk95FfZ5RLL8WR3frmX8LtWuYks5sYSNWgaJpZM4JSFIb .
Is the workflow suggested by @serge2016 correct? Should I used this workflow for my analysis or any new correction need to be done? Kindly give answer to the issue raised by @serge2016 regarding indel positions in point no. 5.
Which one is correct for indels?
Here is one more guide, but right now it describes rather old versions: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/#varscan
The key difference is in the very first step, samtools mpileup
. GDC (TCGA) suggests to process both normal and tumor BAMs together and this provides different results.
Second thing, that I have figured out, is that VarScan's 'defaults' are different in the software description/help and in it's code. So we should specify values far all (even default) parameters.
Third thing is standalone FPFilter tool: https://github.com/ucscCancer/fpfilter-tool. It may be useful, but, as I understood and checked, it does the same as my point 5, 6 and partly 7.
@serge2016 Thanks for prompt reply. Firstly, I ran samtools mpileup using Tumor and Normal together as suggested by you. Second, I need to check for the defaults parameters and parameters in script to better understanding.
Nice post, the tool is lack of documents.
Hello, I have basic question in varscan2 mutation calling. varscan2 divide total SNPs into germline, somatic, LOH and unknown based on allele frequency. My question is if all 3 genomes (i.e. hg19, blood and tumor) have different nucleotide (at same position), where do that SNP should be classify?
For e.g. 10th position of Chromosome 1 have "A" in hg19, Blood sample have "G" and tumor have "T". Coverage is adequate in both blood and tumor sample. Where do I classify this type of mutation?
Hello! What for do we have to use bam-readcount? If I am right, varscan already gets this info from input BAMs and stores it in it's VCF!