brentp / slivar

genetic variant expressions, annotation, and filtering for great good.
MIT License

Using SLIVAR for filtering variants #126

Closed dr-ashu-geno closed 2 years ago

dr-ashu-geno commented 2 years ago

Hello.

Thank you for developing slivar.

I used smoove to call SVs in 11,600 samples with the option -d. After genotyping all samples for the union of SVs detected by smoove, I used smoove --annotate to annotate the variants. The total number of SVs in all my samples was 356,428. Then, I used slivar to filter variants based on duphold information; however:

1) In the slivar log file, I got these warnings:

[slivar tsv] warning! didn't find ANN in header in $annotate_file trying other fields
[slivar tsv] warning! didn't find CSQ in header in $annotate_file trying other fields
[slivar tsv] warning! didn't find BCSQ in header in $annotate_file trying other fields

Do these warnings mean that I mistakenly skipped or forgot a step during SV calling, genotyping, or annotation?

2) After running slivar, no variants were excluded. Surprisingly, the slivar output is also larger than the annotation output! (I assume slivar is not supposed to add any information or characters to its input file; is this correct?)

The command I used for slivar is below. Do I need to add any other options to it?

$SLIVAR expr \
  --info "variant.call_rate > 0.5 && ((INFO.SVTYPE == 'DEL') || (INFO.SVTYPE == 'DUP'))" \
  --sample-expr "LQHET:sample.alts == 1 && (((sample.DHFFC > 0.75) && (INFO.SVTYPE == 'DEL')) || ((sample.DHFFC < 1.25) && INFO.SVTYPE == 'DUP'))" \
  --sample-expr "HQHET:sample.alts == 1 && (((sample.DHFFC < 0.75) && (INFO.SVTYPE == 'DEL')) || ((sample.DHFFC > 1.25) && INFO.SVTYPE == 'DUP'))" \
  --sample-expr "HQHA:sample.alts == 2 && (((sample.DHFFC < 0.5) && (INFO.SVTYPE == 'DEL')) || ((sample.DHFFC > 1.5) && INFO.SVTYPE == 'DUP'))" \
  -o $OUTDIR/All-samples-SVs-by-smoove-filtered-by-slivar.vcf \
  --vcf $annotate_file

I also have a question about using MSHQ for filtering variants. The smoove GitHub page says that "As a first pass, users can look for variants with MSHQ > 3". After running slivar, I filtered SVs with MSHQ < 4. Only 181,996 SVs remained! Is the filtering I used correct?

Thank you in advance, Best regards,

brentp commented 2 years ago

Hello, those are just warnings. I should remove them, but they just indicate that slivar can't find the CSQ information. For your other question: by default slivar only annotates variants. If you want it to output only variants that pass your expressions, you can add --pass-only. I would suggest using only the duphold annotations and not MSHQ.
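For reference, a minimal sketch of what that suggestion could look like applied to the command from the question. Only `--pass-only` is new; the expressions are copied unchanged. `SLIVAR`, `annotate_file`, and `OUTDIR` are the poster's variables, and the fallback values given here are placeholders. The `count_records` helper and the `cmd_preview` variable are hypothetical additions for checking whether any records were actually dropped; they are not part of slivar.

```shell
#!/bin/sh
# Sketch only: the poster's `slivar expr` call with --pass-only added.
# SLIVAR, annotate_file and OUTDIR come from the original post; the
# fallback values below are placeholders, not real paths.
SLIVAR="${SLIVAR:-slivar}"
annotate_file="${annotate_file:-annotated.vcf.gz}"
OUTDIR="${OUTDIR:-.}"
out="$OUTDIR/All-samples-SVs-by-smoove-filtered-by-slivar.vcf"

# Build the argument list once so it can be either printed or executed.
set -- expr --pass-only \
  --info "variant.call_rate > 0.5 && ((INFO.SVTYPE == 'DEL') || (INFO.SVTYPE == 'DUP'))" \
  --sample-expr "LQHET:sample.alts == 1 && (((sample.DHFFC > 0.75) && (INFO.SVTYPE == 'DEL')) || ((sample.DHFFC < 1.25) && INFO.SVTYPE == 'DUP'))" \
  --sample-expr "HQHET:sample.alts == 1 && (((sample.DHFFC < 0.75) && (INFO.SVTYPE == 'DEL')) || ((sample.DHFFC > 1.25) && INFO.SVTYPE == 'DUP'))" \
  --sample-expr "HQHA:sample.alts == 2 && (((sample.DHFFC < 0.5) && (INFO.SVTYPE == 'DEL')) || ((sample.DHFFC > 1.5) && INFO.SVTYPE == 'DUP'))" \
  -o "$out" \
  --vcf "$annotate_file"
cmd_preview="$SLIVAR $*"

# Hypothetical helper: count non-header records in a plain or gzipped VCF.
count_records() {
  case "$1" in
    *.gz) gzip -dc "$1" ;;
    *)    cat "$1" ;;
  esac | grep -vc '^#'
}

if command -v "$SLIVAR" >/dev/null 2>&1 && [ -f "$annotate_file" ]; then
  "$SLIVAR" "$@"
  # Compare record counts to see whether anything was filtered out.
  echo "input records:  $(count_records "$annotate_file")"
  echo "output records: $(count_records "$out")"
else
  # slivar or the input file is missing: show the command instead of running it.
  echo "$cmd_preview"
fi
```

Comparing the two record counts is a more direct check than comparing file sizes, since an annotated output can be larger than its input even when records have been removed. One thing to verify against `slivar expr --help` for your version: if `--pass-only` keeps any variant that passes at least one expression, then variants whose only matching label is LQHET would still be written, and that expression may need to be dropped or handled downstream.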