brentp / slivar

genetic variant expressions, annotation, and filtering for great good.
MIT License
248 stars 23 forks source link

gnotate in v0.2.8 can fail to annotate indels >1 bp #149

Closed seboyden closed 1 year ago

seboyden commented 1 year ago

Running e.g. slivar_v0.2.8 expr --vcf $BCF --pass-only -g $GNOMAD311 --info "INFO.gnomad311_AF < 0.02" ...

I get variants in my output that turn out to have >0.02 AF in gnomAD. They are all indels. Their frequencies in INFO are set to -1, so gnotate appears to not be finding them in my make-gnotate zip file. The unannotated indels appear to all be >1 bp in length. I have seen a 15 bp DEL that was correctly annotated, and a 17 bp DEL (present in gnomAD 3.1.1) with the incorrect gnomad311_AF=-1, but I haven't thoroughly checked if this size cutoff always holds.

If I run the identical script except I revert to slivar v0.2.7, the indels are correctly annotated by gnotate.

For example, this variant

#CHROM  POS ID  REF                                         ALT
chr4    443460  .   CCACATGTGTAGGGTTTCTCTCCAGTATGAATTCTCCTATGTACATAAAGGTTTGCGGACTGTCTAAAGGCTTTGCCACATACTT   C

is present in gnomAD v3

slivar_v0.2.7 gnotate correctly produces gnomad311_AF=1.43507e-05 in INFO slivar_v0.2.8 gnotate incorrectly produces gnomad311_AF=-1 in INFO

brentp commented 1 year ago

thanks for reporting. I'll have a look. I changed the zip library and I think it must be related to those changes: https://github.com/brentp/slivar/compare/v0.2.7...v0.2.8

Are you using one of the .zip files from the releases? If so, which one. If not, can you point me to the location or shared how it was created?

I'll get back to you by next week.

seboyden commented 1 year ago

For gnomAD v2 LiftOver to GRCh38, I'm using the gnomad.hg38.v2.zip that you provide, which I downloaded in 2019 and haven't changed since. For gnomAD v3.1.1 (in my example above), I'm using my own gnomad311.zip, made in 2021 with slivar_v0.2.1 as follows:

slivar_v0.2.1 make-gnotate \
--prefix gnomad311 \
--field AF:gnomad311_AF \
--field AF_afr:gnomad311_AF_afr \
--field AF_ami:gnomad311_AF_ami \
--field AF_amr:gnomad311_AF_amr \
--field AF_asj:gnomad311_AF_asj \
--field AF_eas:gnomad311_AF_eas \
--field AF_fin:gnomad311_AF_fin \
--field AF_mid:gnomad311_AF_mid \
--field AF_nfe:gnomad311_AF_nfe \
--field AF_oth:gnomad311_AF_oth \
--field AF_sas:gnomad311_AF_sas \
--field AF_controls_and_biobanks:gnomad311_AF_controls \
--field AF_non_cancer:gnomad311_AF_non_cancer \
--field AF_non_neuro:gnomad311_AF_non_neuro \
--field AF_non_topmed:gnomad311_AF_non_topmed \
--field AF_non_v2:gnomad311_AF_non_v2 \
--field nhomalt:gnomad311_nhomalt \
--field nhomalt_controls_and_biobanks:gnomad311_nhomalt_controls \
--field AC_XY:gnomad311_AC_XY \
--field AC_controls_and_biobanks_XY:gnomad311_AC_controls_XY \
gnomad.genomes.v3.1.1.sites.vcf.gz

I'll email you the location of that file. I'm using the same zip files for either slivar_v0.2.7 or slivar_v0.2.8. Let me know if I need to remake my zip files using current make-gnotate.

brentp commented 1 year ago

This is fixed in v0.2.9 thanks to test-case from @seboyden