Closed: JakeHagen closed this issue 3 years ago
Can you check here and let me know what you'd like me to clarify? Also see there's a section for singularity if you want to build somewhere that docker is not allowed.
I am also curious why you don't want a missing value.
Thank you for that. I will try to use that document when I get back from lunch.
Your asking why made me stop and consider my motivations. I think the only real answer I have is space saving. I had thought the -1 would disallow some types of filtering, but I guess you can do something like field < 0.5 && field != -1. And it actually makes it much more convenient for more common filtering like gnomAD_AF < 0.01
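To make the two filter styles concrete, here is a minimal Python sketch of how a -1 missing value behaves in each. It only mimics the comparison logic; it is not slivar's expression engine, and the function names and sample values are made up for illustration.

```python
MISSING = -1.0  # the missing-value sentinel discussed in this thread

def is_rare(af, cutoff=0.01):
    """Mimics `gnomAD_AF < 0.01`: a missing (-1) value passes automatically."""
    return af < cutoff

def is_observed_below(value, cutoff=0.5):
    """Mimics `field < 0.5 && field != -1`: the value must also be present."""
    return value < cutoff and value != MISSING

# Hypothetical annotation values; the third variant was not found.
values = [0.0004, 0.2, MISSING, 0.7]

rare = [is_rare(v) for v in values]
observed_low = [is_observed_below(v) for v in values]

print(rare)          # the missing variant counts as rare
print(observed_low)  # the missing variant is excluded
```

The first form is the convenient one for population-frequency filters, since a variant absent from the resource is treated as rare without any extra clause.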
I was considering using gnotate for all my numeric annotation fields. I am not yet sure that is a good idea, but it makes space saving a little more important. I still need to test the performance when using many fields, not just a couple. slivar seems to be much faster than vcfanno when comparing on 4 gnomad3 fields (vcfanno used the gnomad3 vcf file reduced to the same 4 fields), but will this performance advantage hold when using a lot more fields?
Ah, I see. Yes, the performance will hold but it will use more memory. It uses 32 bits per variant for each field and pulls a whole chromosome into memory. Usually, that's small enough that it's not a problem, but as you add more and more fields (or use denser and denser data), it starts to make a difference.
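As a rough back-of-the-envelope check of that scaling, the sketch below multiplies 32 bits (4 bytes) by the number of variants and fields. The variant count is a hypothetical round number, not a measured gnomAD figure, and the estimate covers only the field values, not whatever slivar needs for positions, alleles, or other bookkeeping.

```python
BYTES_PER_VALUE = 4  # 32 bits per variant for each field, as stated above

def estimate_field_bytes(n_variants, n_fields):
    """Bytes needed for the field values of one chromosome held in memory."""
    return n_variants * n_fields * BYTES_PER_VALUE

# Hypothetical: 50 million variants on one chromosome.
n_variants = 50_000_000

for n_fields in (4, 40):
    gb = estimate_field_bytes(n_variants, n_fields) / 1e9
    print(f"{n_fields} fields: about {gb:.1f} GB for field values")
```

Under these assumptions the cost grows linearly with the number of fields, so going from a handful of fields to dozens is where memory starts to matter.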
It won't save space to not use -1. That missing value is stored in the gnotate file exactly once so that slivar knows what to put when it doesn't find the variant.
Ah interesting, good to know. I have access to more memory than I have time, so I'll take it.
I meant in the final VCF, not the gnotate file itself.
If you have your vcf (b)gzipped or as BCF, the entire file size will be barely different with/without the -1 for missing value.
Gotcha. Well thank you for all the help. I will close this now, but I might comment later if I have trouble building. I might still try to mess around with the slivar code to learn nim.
I was able to statically build slivar, you made it very easy with that binary.
I am also considering not annotating missing again because it allows splitting gnotate zips. Very large gnotates have problems (https://github.com/brentp/slivar/issues/86). Splitting them seems to avoid this, but if each pass annotates missing, it will overwrite real annotations from a previous pass.
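Here is a toy Python model of that overwrite problem, assuming the split files are applied one after another to the same INFO field. It is not slivar's code; the chunk contents, the key format, and the `write_missing` switch are all hypothetical and only illustrate the behavior I am describing.

```python
MISSING = -1.0

def annotate(info, chunk, key, field="gnomad_af", write_missing=True):
    """Annotate one variant's INFO dict from one chunk (a dict of key -> value)."""
    if key in chunk:
        info[field] = chunk[key]
    elif write_missing:
        # The variant is not in this chunk, so the missing value is written,
        # replacing anything an earlier chunk may have set.
        info[field] = MISSING
    return info

# Two hypothetical pieces of a split annotation resource.
chunk_a = {"chr1:1000:A:T": 0.003}
chunk_b = {"chr2:500:G:C": 0.12}

def run(write_missing):
    info = {}
    for chunk in (chunk_a, chunk_b):
        annotate(info, chunk, "chr1:1000:A:T", write_missing=write_missing)
    return info

overwritten = run(write_missing=True)
preserved = run(write_missing=False)

print(overwritten)  # value from chunk_a is replaced by the missing value
print(preserved)    # value from chunk_a survives the second pass
```

In this model the real value only survives when the second pass leaves the field alone for variants it does not find.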
Glad to hear you got it building.
I wanted to test gnotate not populating the info field with a missing value (-1). It looks decently straightforward, but since I don't really know nim, can you tell me the command you use to build the static binary?
Thanks