Slivar tsv functionality w/o family structure

brentp / slivar

genetic variant expressions, annotation, and filtering for great good.

MIT License


Slivar tsv functionality w/o family structure #127

Open nbalanda23 opened 2 years ago

nbalanda23 commented 2 years ago

Hey Brent,

Long-time gemini user here, setting up slivar for my group. We focus largely on singletons and used gemini filters like num_het and num_hom_alt to capture variants that are present in multiple singletons in a cohort. I'm looking for similar functionality in slivar. I see that I can use groups, naming all samples "singletons", and then use singletons.alts=1, singletons.AB > 0.2, etc. to perform some filtration. After doing so, I'd love to be able to make a tsv containing the variants that passed filtration, along with some additional annotations, for clinicians to manipulate in Excel. What would it take to make this possible? I could do something similar in bash/python, but if it's possible with a small modification to the tsv script I'd love to figure that out, as I like the HPO, pLI, and clinvar functions of the tsv script too. Thanks for all your work.

brentp commented 2 years ago

Hi Nick, you can use, for example, variant.num_het (see https://github.com/brentp/slivar#attributes), but that is not available in the tsv output. You can also simply use --sample-expr "singleton:variant.num_het == 1 && sample.het && sample.GQ > 20". Currently, slivar tsv is very much tuned for trios and works a bit for duos, but not for singletons. That said, I think it might work OK for singletons with small changes, so I'm happy to add that functionality. Let me know what's working and what's needed and we can proceed from there.
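For readers who want to see what that expression selects, here is a minimal Python sketch of the same logic. It is not slivar's implementation: the function names and the input layout (a dict of sample name to GT string and GQ) are hypothetical, and it assumes biallelic, diploid calls.

```python
def genotype_class(gt):
    """Classify a diploid VCF GT string (assumes biallelic records)."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return "unknown"
    if alleles[0] == alleles[1]:
        return "hom_ref" if alleles[0] == "0" else "hom_alt"
    return "het"


def passing_samples(genotypes, min_gq=20):
    """Mimic: variant.num_het == 1 && sample.het && sample.GQ > 20.

    genotypes: hypothetical dict of sample name -> (GT string, GQ int).
    Returns the samples that pass for this variant.
    """
    classes = {name: genotype_class(gt) for name, (gt, _gq) in genotypes.items()}
    # Cohort-level condition: exactly one heterozygous sample at this site.
    num_het = sum(1 for c in classes.values() if c == "het")
    if num_het != 1:
        return []
    # Sample-level conditions: the sample is het and its GQ exceeds the cutoff.
    return [
        name
        for name, (_gt, gq) in genotypes.items()
        if classes[name] == "het" and gq > min_gq
    ]
```

The cohort-level count is computed once per variant and the per-sample checks are applied afterwards, which is why a second het carrier anywhere in the cohort removes the variant for everyone.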

nbalanda23 commented 2 years ago

Hey Brent,

Wanted to follow up on this. Would it be possible to add the tsv functionality for singletons? Either that, or maybe you have a suggestion for another way to accomplish our goals. I'm currently working with cohort vcfs containing all singletons, both germline and somatic (via GATK), and I'm trying to query/filter these. I've added cadd to the vcf and annotated with VEP. I previously added gnomad, but with slivar I'm using the gnotate functionality. My goal is to query using annotations (e.g. cadd > 20), pLI (which I don't currently have in the vcf), and cohort data (num het, etc.). Ultimately I'm hoping to take the filtered variant sets and convert them to a friendlier tsv for clinicians to access. I was thinking of combining the slivar filtering functions with the tsv function (to get the nice output, and for pLI). Do you have any suggestions? Thanks so much! Nick

brentp commented 2 years ago

Hi Nick, currently, the best way would be to use slivar for filtering and then use a custom script or another tool to convert to CSV. It would be quite simple to build slivar tsv for singletons (if trios weren't supported). I will look into this in the coming months.
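As a rough illustration of the "custom script" route, the sketch below flattens an already-filtered, uncompressed VCF into a TSV using only the Python standard library. The function name `vcf_to_tsv`, the output column names, and the INFO keys in the usage example are assumptions made for illustration. It does not add pLI, HPO, or clinvar columns the way slivar tsv does.

```python
import csv
import io


def vcf_to_tsv(vcf_lines, out, info_keys):
    """Write one TSV row per VCF record: site, chosen INFO fields, het samples."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information headers
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            writer.writerow(["chrom", "pos", "ref", "alt", *info_keys, "het_samples"])
            continue
        chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
        # Parse INFO into a dict; flag-style entries get an empty value.
        info_map = {}
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_map[key] = value
        fmt = fields[8].split(":") if len(fields) > 8 else []
        hets = []
        if "GT" in fmt:
            gt_idx = fmt.index("GT")
            for name, col in zip(samples, fields[9:]):
                gt = col.split(":")[gt_idx].replace("|", "/")
                if gt in ("0/1", "1/0"):  # biallelic het calls only
                    hets.append(name)
        writer.writerow(
            [chrom, pos, ref, alt, *[info_map.get(k, "") for k in info_keys], ",".join(hets)]
        )


# Usage with a tiny in-memory VCF (values are made up for illustration).
example_vcf = (
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\n"
    "1\t100\t.\tA\tG\t50\tPASS\tgnomad_popmax_af=0.001;CADD=25.1\tGT:GQ\t0/1:40\t0/0:50\n"
)
buffer = io.StringIO()
vcf_to_tsv(example_vcf.splitlines(), buffer, ["CADD", "gnomad_popmax_af"])
print(buffer.getvalue())
```

Listing the het carriers in a single comma-separated column keeps one row per variant, which tends to be easier to sort and filter in a spreadsheet than one row per sample.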

nbalanda23 commented 2 years ago

Sounds good. Thanks for the input!