Closed prasundutta87 closed 1 year ago
SVAFotate attempts to match SVs based on SVTYPE. If an SV has SVTYPE "CNV", it will attempt to match it with other SVs with SVTYPE "CNV" in the BED file. If there are no SVs with SVTYPE "CNV" in the BED file, it will not find any matches.
If the CNVs are being recorded as DEL or DUP in the VCF (which I think is the case with sniffles and possibly cuteSV too), then I don't think you should have any problem, as all of the sources in the supplied BED file use these terms. The only source in the BED file that uses MCNV is gnomAD-SV.
There is also the -a mis parameter, which you could include to record overlaps between SVs with different SVTYPEs. For example, with this parameter an overlap between a DEL and an MCNV would be annotated and recorded as a "mismatch". This may be helpful if you are worried about different SVTYPE designations for the same type of event between your VCF and whatever you use as an input BED file.
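To make the matching behaviour described above concrete, here is a minimal Python sketch. It is not SVAFotate's actual code: the `SV` class, the `overlaps` and `classify_overlaps` functions, and the `report_mismatches` flag (standing in for -a mis) are all hypothetical names, and the sketch uses a plain interval overlap, ignoring any overlap-fraction requirement the real tool may apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record: coordinates plus SVTYPE."""
    chrom: str
    start: int
    end: int
    svtype: str


def overlaps(a: SV, b: SV) -> bool:
    """True if the two SVs share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_overlaps(query, references, report_mismatches=False):
    """Split overlapping reference SVs into same-SVTYPE matches and,
    only when report_mismatches is set (an assumed stand-in for -a mis),
    different-SVTYPE mismatches."""
    matches, mismatches = [], []
    for ref in references:
        if not overlaps(query, ref):
            continue
        if ref.svtype == query.svtype:
            matches.append(ref)
        elif report_mismatches:
            mismatches.append(ref)
    return matches, mismatches


# Illustrative data only; these coordinates are made up.
refs = [
    SV("chr1", 1200, 4800, "DEL"),
    SV("chr1", 900, 5100, "MCNV"),
    SV("chr2", 1000, 5000, "DEL"),
]
del_call = SV("chr1", 1000, 5000, "DEL")
cnv_call = SV("chr1", 1000, 5000, "CNV")

strict = classify_overlaps(del_call, refs)
with_mis = classify_overlaps(del_call, refs, report_mismatches=True)
cnv_strict = classify_overlaps(cnv_call, refs)
```

In this sketch, a call typed "CNV" finds no matches because no reference record carries that SVTYPE, while a "DEL" call matches the reference DEL and, with mismatches enabled, also notes the overlapping MCNV.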
Hi @fakedrtom,
Thank you so much for the quick response. I think that makes sense. I am not interested in MCNVs anyway, as they will be common across populations (per a discussion with the gnomAD-SV authors), but I guess the -a mis parameter could be helpful. I could also just set the AF values to 1 for MCNVs. I am using the BED file provided on GitHub, which contains the CCDG, gnomAD-SV, and 1000 Genomes databases. Are there any pitfalls I need to consider if I am using all three databases?
Regards, Prasun
My apologies, I failed to see your question until now. I routinely use all three of those databases without any trouble. In fact, I would recommend using all three. If you do have any trouble or questions, please let me know.
Hi @fakedrtom,
I had a query about the CNV and MCNV annotations. In long-read SV analysis (usually using cuteSV or sniffles), CNVs are reported as deletions and duplications. Is that taken care of in SVAFotate somehow?
Regards, Prasun