dantaki / SV2

Support Vector Structural Variation Genotyper

False positive findings of de novo calling #31

Open WeiCSong opened 4 years ago

WeiCSong commented 4 years ago

Dear Dan, thank you for your excellent tool. I'm analyzing WGS data for a trio and want to identify de novo SVs. I ran sv2 separately for each person, for example:

sv2 -i WOC5_3.final.bam -b cnmops.bed delly.bed manta.bed lumpy.bed -snv WOC5_3.sentieon.snp.vcf.gz -p OC5_3.ped -o WOC5_3 -merge -M

and obtained genotype calls in .vcf files (WOC5_3 is the child's ID). Each person had ~500 SVs, and I wrote a simple script to extract the SVs that appear only in the child. However, this gave me ~250 de novo SVs, which is clearly wrong because de novo SVs are extremely rare (~0.1% of all SVs). I suspect I misunderstood the genotype matrix, and I would like to learn the correct procedure from you.
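For comparison, here is a minimal sketch of a trio-aware check that avoids two common pitfalls of a plain "present only in the child" set difference: counting every VCF record as a call regardless of its GT (if the per-sample VCFs contain records genotyped 0/0 or ./.), and requiring exact breakpoint identity between samples that were genotyped separately. Everything here is an assumption for illustration (the hypothetical `SVCall` record, the 50% reciprocal-overlap threshold `min_ro`, the toy calls), not SV2's own de novo procedure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SVCall:
    """One SV record from a single sample's VCF (hypothetical structure)."""
    chrom: str
    start: int
    end: int
    svtype: str
    gt: str  # GT field from the sample column, e.g. "0/1", "1/1", "0/0", "./."


def is_non_ref(gt: str) -> bool:
    """True if the genotype carries at least one alternate allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def reciprocal_overlap(a: SVCall, b: SVCall) -> float:
    """Overlap length divided by the longer interval (= min of both fractions)."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


def found_in_parent(call: SVCall, parent: List[SVCall], min_ro: float) -> bool:
    """True if the parent has a non-ref SV of the same type that overlaps enough."""
    return any(
        p.svtype == call.svtype
        and is_non_ref(p.gt)
        and reciprocal_overlap(call, p) >= min_ro
        for p in parent
    )


def de_novo_candidates(child: List[SVCall], mother: List[SVCall],
                       father: List[SVCall], min_ro: float = 0.5) -> List[SVCall]:
    """Child non-ref calls with no matching non-ref call in either parent."""
    return [
        c for c in child
        if is_non_ref(c.gt)
        and not found_in_parent(c, mother, min_ro)
        and not found_in_parent(c, father, min_ro)
    ]


# Toy trio (hypothetical coordinates and genotypes).
child = [
    SVCall("chr1", 1000, 5000, "DEL", "0/1"),    # inherited, breakpoints shifted
    SVCall("chr2", 20000, 26000, "DUP", "0/0"),  # genotyped as reference
    SVCall("chr3", 50000, 58000, "DEL", "0/1"),  # absent from both parents
]
mother = [SVCall("chr1", 1100, 5050, "DEL", "0/1")]
father = [SVCall("chr3", 50000, 58000, "DEL", "0/0")]

for c in de_novo_candidates(child, mother, father):
    print(c.chrom, c.start, c.end, c.svtype)
```

With these toy calls, the chr1 deletion is matched to the mother despite shifted breakpoints, and the chr2 record is skipped because it is genotyped 0/0, so only the chr3 deletion remains. A real pipeline would still need to parse GT from each VCF sample column and pick an overlap threshold suited to the callers used.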

A related question concerns the relationship between false-positive de novo calls and the filtering steps. As I understand it, applying strict filtering at the individual level produces more false-positive de novo calls. For example, if a mother transmits an SV to her child but the filter is so strict that the SV is removed from the mother's calls, the child's copy will be classified as de novo, which is a false positive. So I'm puzzled by the strict "DENOVO_FILTER" option in SV2. Could you help me understand the filtering steps? Thanks in advance for your help!
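A toy sketch of the reasoning above: when the same strict quality threshold is applied to each parent independently, a transmitted SV that narrowly fails in the mother drops out of her call set, and the child's copy is counted as de novo. The SV ids, quality scores, thresholds, and the idea of using a lenient threshold when searching for parental support are hypothetical illustrations of one mitigation, not a description of how SV2's DENOVO_FILTER works.

```python
from typing import Dict, Set

# Hypothetical call sets: SV id -> quality score.
# Ids stand in for SVs already matched across the trio (e.g. by overlap).
child_calls = {"sv1": 60.0, "sv2": 55.0, "sv3": 70.0}
mother_calls = {"sv1": 45.0, "sv2": 12.0}  # sv2 is real but weakly supported
father_calls = {"sv1": 50.0}


def passing(calls: Dict[str, float], min_qual: float) -> Set[str]:
    """Return the SV ids whose quality meets the threshold."""
    return {sv for sv, q in calls.items() if q >= min_qual}


def de_novo(child: Dict[str, float], mother: Dict[str, float],
            father: Dict[str, float], child_min: float,
            parent_min: float) -> Set[str]:
    """Child SVs passing child_min with no parental call passing parent_min."""
    parents = passing(mother, parent_min) | passing(father, parent_min)
    return passing(child, child_min) - parents


# Same strict threshold for everyone: sv2 is lost in the mother.
print(sorted(de_novo(child_calls, mother_calls, father_calls, 30, 30)))
# Strict for the child, lenient when looking for parental support.
print(sorted(de_novo(child_calls, mother_calls, father_calls, 30, 5)))
```

The first call reports `['sv2', 'sv3']`, where sv2 is the false positive described above; the second reports only `['sv3']`. The asymmetry reflects that a missed parental call directly creates a spurious de novo call, whereas a weak parental call only needs to be good enough to rule one out.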

Best regards,
Weichen Song