Illumina / Nirvana

The nimble & robust variant annotator
https://illumina.github.io/NirvanaDocumentation/
GNU General Public License v3.0

Incorrect transcript hgvsc annotations #50

Closed michaelsykes closed 3 years ago

michaelsykes commented 3 years ago

The latest Nirvana versions generate incorrect hgvsc annotations for a large subset of variants, though not all of them. The error is present in both 3.14.0 and 3.13.0 (.zip binary downloads, not compiled from source) but does not appear in 3.11.1. I did not test the versions released in between.

Tests were performed using the sample dataset HiSeq.10000.vcf.gz, and the results were compared to the provided annotation file HiSeq.10000.json.gz.

For example, consider the following variant from the VCF file (record truncated for clarity):

chr1 30569 rs62101646 G A

hgvsc from downloaded JSON: ENST00000473358.1:n.492G>A
hgvsc from 3.11: ENST00000473358.1:n.492G>A
hgvsc from 3.14: ENST00000473358.1:n.492T>A

A second type of error results in an "=" annotation. For example, take this variant from the VCF:

chr1 5150948 rs11583754 A G

hgvsc from downloaded JSON: ENST00000443270.1:n.12T>C
hgvsc from 3.11: ENST00000443270.1:n.12T>C
hgvsc from 3.14: ENST00000443270.1:n.12=
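
A comparison like the one described above can be scripted. The following is a minimal sketch, not part of Nirvana. It assumes each annotated JSON file has a top-level `positions` array whose `variants` entries carry a `vid` and a `transcripts` list with `transcript` and `hgvsc` fields. That layout and the helper names (`collect_hgvsc`, `load_hgvsc`, `diff_hgvsc`) are assumptions for illustration and should be checked against the JSON produced by the version in use.

```python
import gzip
import json


def collect_hgvsc(doc):
    """Map (variant id, transcript id) -> hgvsc string.

    ASSUMPTION: `doc` looks like
    {"positions": [{"variants": [{"vid": ..., "transcripts":
        [{"transcript": ..., "hgvsc": ...}]}]}]}
    """
    result = {}
    for position in doc.get("positions", []):
        for variant in position.get("variants", []):
            for tx in variant.get("transcripts", []):
                if "hgvsc" in tx:  # skip transcripts without an hgvsc value
                    key = (variant.get("vid", ""), tx.get("transcript", ""))
                    result[key] = tx["hgvsc"]
    return result


def load_hgvsc(path):
    """Read one gzip-compressed annotation file and collect its hgvsc values."""
    with gzip.open(path, "rt") as handle:
        return collect_hgvsc(json.load(handle))


def diff_hgvsc(old, new):
    """List (key, old value, new value) for keys present in both maps that differ."""
    shared = sorted(old.keys() & new.keys())
    return [(key, old[key], new[key]) for key in shared if old[key] != new[key]]
```

For example, `diff_hgvsc(load_hgvsc("HiSeq.10000.json.gz"), load_hgvsc("annotated_3.14.json.gz"))` would list every transcript whose hgvsc string changed between the two files. The second file name is hypothetical.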

michaelsykes commented 3 years ago

Forgot to mention that both the 3.11.1 and 3.14.0 annotations were performed using the same set of reference files and databases, so the issue appears to be with Nirvana itself, and not the associated databases.

MichaelStromberg commented 3 years ago

Thanks Mike, I'll have the team take a look at this tomorrow.

rajatshuvro commented 3 years ago

Hello Mike, I was investigating the differences and came to a strange conclusion. I checked the following variants and their HGVS c. notations:

| Variant | 3.14 HGVS c. | 3.11 HGVS c. |
| --- | --- | --- |
| 1-30569-G-A | ENST00000473358.1:n.492T>A | ENST00000473358.1:n.492G>A |
| 1-52640-A-T | ENST00000606857.1:n.168A>T | ENST00000606857.1:n.168G>T |
| 1-63704-C-T | ENST00000492842.1:n.757= | ENST00000492842.1:n.757C>T |

I checked each of these genomic positions in the UCSC Genome Browser and found that in each case the RefAllele in the VCF was wrong. Nirvana 3.11 did not check the variant's ref allele, but 3.14 does, which is why the ref alleles have been corrected and the HGVS c. notations modified accordingly. So we believe the changes you see in 3.14 are correct.
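
The effect of the ref allele check described above can be illustrated with a small sketch. This is not Nirvana's implementation. The function names are hypothetical, and the genome bases used in the examples are assumptions inferred from the 3.14 notations in this thread, not values looked up in a reference genome. The sketch only handles single-base substitutions: it builds the allele part of the notation from a given reference base, complementing both alleles for a transcript on the reverse strand, and emits "=" when the corrected reference base equals the alt allele.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def ref_matches(genome_base, vcf_ref):
    """Return True when the VCF REF allele agrees with the genome base."""
    return genome_base == vcf_ref


def snv_notation(ref_base, alt_base, on_reverse_strand=False):
    """Build the allele part of an SNV notation, e.g. "G>A" or "=".

    `ref_base` is whichever reference allele the annotator trusts:
    the VCF REF column (no check) or the genome base (with check).
    """
    if on_reverse_strand:
        # Transcript is on the minus strand: report complemented bases.
        ref_base, alt_base = COMPLEMENT[ref_base], COMPLEMENT[alt_base]
    return "=" if ref_base == alt_base else f"{ref_base}>{alt_base}"


# 1-30569-G-A: VCF REF is G; genome base ASSUMED to be T (from n.492T>A).
assert not ref_matches("T", "G")
assert snv_notation("G", "A") == "G>A"   # trusting the VCF REF, as in 3.11
assert snv_notation("T", "A") == "T>A"   # using the genome base, as in 3.14

# 1-63704-C-T: genome base ASSUMED to be T, so ref equals alt.
assert snv_notation("C", "T") == "C>T"
assert snv_notation("T", "T") == "="

# chr1 5150948 A G on an ASSUMED reverse-strand transcript, genome base ASSUMED G.
assert snv_notation("A", "G", on_reverse_strand=True) == "T>C"
assert snv_notation("G", "G", on_reverse_strand=True) == "="
```

Under these assumptions, both error types reported in this issue (a changed reference base and an "=" annotation) follow from the same cause: a REF allele in the VCF that disagrees with the genome.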

We realize that the VCF provided is not a good example. We will soon replace it with a validated VCF.

Thanks, Rajat