brentp / slivar

genetic variant expressions, annotation, and filtering for great good.
MIT License

build command #96

Closed JakeHagen closed 3 years ago

JakeHagen commented 3 years ago

I wanted to test gnotate not populating the info field with a missing value (-1). It looks decently straightforward, but since I don't really know Nim, can you tell me the command you use to build the static binary?

Thanks

brentp commented 3 years ago

Can you check here and let me know what you'd like me to clarify? Also see there's a section for singularity if you want to build somewhere that docker is not allowed.

I am also curious why you don't want a missing value.

JakeHagen commented 3 years ago

Thank you for that. I will try to use that document when I get back from lunch.

Your asking why made me stop and consider my motivations. I think the only real answer I have is space saving. I had thought the -1 would disallow some types of filtering, but I guess you can do something like field < 0.5 && field != -1. And it actually makes it much more convenient for more common filtering like gnomAD_AF < 0.01.
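
To make the two filters above concrete, here is a minimal Python sketch of how a -1 missing value behaves under each comparison. It is a toy model, not slivar's expression engine; the function names and the sample values are hypothetical.

```python
# Toy model of filtering on a field that uses -1 as its missing value.
# This is NOT slivar code; names and sample values are hypothetical.
MISSING = -1.0


def is_rare(af, threshold=0.01):
    """Rare-variant filter: a missing (-1) value passes because -1 < threshold."""
    return af < threshold


def below_excluding_missing(value, upper=0.5, missing=MISSING):
    """Upper-bound filter that explicitly rejects the missing sentinel."""
    return value < upper and value != missing


values = [0.2, MISSING, 0.003, 0.7]
rare = [is_rare(v) for v in values]
bounded = [below_excluding_missing(v) for v in values]
```

In this sketch a variant absent from the annotation source is kept by the rare-variant filter without any special casing, while the second filter needs the extra `!= -1` term to drop it.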

I was considering using gnotate for all my numeric annotation fields. I am not yet sure that is a good idea, but it makes space saving a little more important. I still need to test the performance when using many fields, not just a couple. slivar seems to be much faster than vcfanno when comparing on 4 gnomad3 fields (vcfanno used the gnomad3 vcf file reduced to the same 4 fields), but will this performance advantage hold when using a lot more fields?

brentp commented 3 years ago

Ah, I see. Yes, the performance will hold but it will use more memory. It uses 32 bits per variant for each field and pulls a whole chromosome into memory. Usually, that's small enough that it's not a problem, but as you add more and more fields (or use denser and denser data), it starts to make a difference.
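
As a rough back-of-the-envelope check of the figure above, the sketch below multiplies 32 bits per variant by the number of fields for one chromosome. It only counts the field values themselves, not whatever else is held in memory, and the variant count used is a hypothetical placeholder rather than a measured number.

```python
# Rough estimate of memory for field values only: 32 bits per variant per field,
# for one chromosome held in memory at a time (as described in the thread).
# The variant count below is a hypothetical placeholder, not a measurement.

def estimate_value_bytes(n_variants, n_fields, bits_per_value=32):
    """Bytes needed to hold n_fields values for each of n_variants."""
    return n_variants * n_fields * bits_per_value // 8


hypothetical_variants_on_chrom = 50_000_000

few_fields = estimate_value_bytes(hypothetical_variants_on_chrom, 4)
many_fields = estimate_value_bytes(hypothetical_variants_on_chrom, 40)

print(f"4 fields:  {few_fields / 1e9:.1f} GB")
print(f"40 fields: {many_fields / 1e9:.1f} GB")
```

Under these assumptions the cost grows linearly with the number of fields, which is why a handful of fields is usually unnoticeable while dozens can start to matter.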

It won't save space to not use -1. That missing value is stored in the gnotate file exactly once so that slivar knows what to put when it doesn't find the variant.

JakeHagen commented 3 years ago

Ah interesting, good to know. I have access to more memory than I have time, so I'll take it.

I meant in the final VCF, not the gnotate file itself.

brentp commented 3 years ago

If you have your vcf (b)gzipped or as BCF, the entire file size will be barely different with/without the -1 for missing value.

JakeHagen commented 3 years ago

Gotcha. Well thank you for all the help. I will close this now, but I might comment later if I have trouble building. I might still try to mess around with the slivar code to learn nim.

JakeHagen commented 3 years ago

I was able to statically build slivar, you made it very easy with that binary.

I am also considering not annotating missing again because it allows splitting gnotate zips. Very large gnotates have problems (https://github.com/brentp/slivar/issues/86). Splitting them seems to avoid this, but if it annotates missing, it will overwrite real previous annotations.
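
The overwrite concern can be illustrated with a small Python sketch. It is a toy model assuming each split is applied in turn to the same field; it is not how slivar is implemented, and the `annotate` helper, the field name, and the variant keys are all hypothetical.

```python
# Toy model: annotate one variant from two splits of an annotation source.
# NOT slivar's implementation; helper, field name, and keys are hypothetical.

def annotate(info, chunk, variant, field, missing=None):
    """Set info[field] from chunk; if absent and `missing` is given, write it."""
    if variant in chunk:
        info[field] = chunk[variant]
    elif missing is not None:
        info[field] = missing  # this is the step that can clobber an earlier value
    return info


chunk_a = {("chr1", 100, "A", "T"): 0.02}  # first split contains the variant
chunk_b = {("chr2", 200, "G", "C"): 0.3}   # second split does not
variant = ("chr1", 100, "A", "T")

with_fill = {}
for chunk in (chunk_a, chunk_b):
    annotate(with_fill, chunk, variant, "gnomad_af", missing=-1.0)

without_fill = {}
for chunk in (chunk_a, chunk_b):
    annotate(without_fill, chunk, variant, "gnomad_af")
```

In this model, filling missing values leaves the variant with -1 after the second split, while skipping the fill keeps the real 0.02 from the first split.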

brentp commented 3 years ago

Glad to hear you got it building.