I noticed that at least in one instance, the canonical transcript annotation for Ensembl is incorrect in v3.19.
Nirvana marks an Ensembl transcript as canonical that does not correspond to the RefSeq transcript. Ensembl itself lists another transcript as canonical, which does match the RefSeq canonical.
This was not an issue in the past, and it appears to be fixed in 3.20. I need you to investigate the extent of this issue (how many other genes are affected?) so that we can decide whether and how to notify customers to re-analyze their samples. Note that this gene (TP53) is part of TSO500, so several products might be affected. In ICA Cohorts, we may need to re-analyze some 10K WGS samples, depending on the extent of this issue.
Env: US Prod Lambda Service.
JSON header:
{"header":{"annotator":"Nirvana 3.19.0","creationTime":"2023-05-03 22:54:06","genomeAssembly":"GRCh38","schemaVersion":6,"dataVersion":"91.27.67","dataSources":[{"name":"VEP","version":"91","description":"BothRefSeqAndEnsembl","releaseDate":"2017-12-18"}
ENST00000269305 and NM_000546 are the canonical transcripts according to Ensembl. They were also marked as canonical in previous Nirvana outputs. I am not sure when this issue was introduced in Nirvana. I have some older Nirvana results from last summer that are not affected by it: ENST00000269305 is marked as canonical there. (These are in a DB without Nirvana header/version info, though.)
Gene: TP53 (NCBI 7157)
Canonical RefSeq transcript: NM_000546
Canonical Ensembl transcript: ENST00000269305
Nirvana-reported canonical Ensembl transcript: ENST00000610292
Source: http://useast.ensembl.org/Homo_sapiens/Gene/Summary?db=core;g=ENSG00000141510;r=17:7661779-7687538
Input: DRAGEN re-sequenced 1000 Genomes sample HG00097, GRCh38
gzip -c -d DRAGEN-1KGP-3202-HG00097.hard-filtered.vcf.gz | grep 17 | grep 7676154
chr17 7676154 . G C 50.00 PASS AC=1;AF=0.500;AN=2;DP=32;FS=0.000;MQ=250.00;MQRankSum=4.719;QD=1.56;ReadPosRankSum=2.878;SOR=0.681;FractionInformativeReads=1.000;R2_5P_bias=-2.088 GT:AD:AF:DP:F1R2:F2R1:GQ:PL:GP:PRI:SB:MB 0/1:13,19:0.594:32:7,7:6,12:47:85,0,48:5.0000e+01,8.1800e-05,5.0542e+01:0.00,34.77,37.77:6,7,10,9:6,7,12,7
JSON output (trimmed):
{**"transcript":"ENST00000610292.4"**,"source":"Ensembl","bioType":"protein_coding","codons":"cCc/cGc","aminoAcids":"P/R","cdnaPos":"465","cdsPos":"98","exons":"3/10","proteinPos":"33","geneId":"ENSG00000141510","hgnc":"TP53","consequence":["missense_variant"],"hgvsc":"ENST00000610292.4:c.98C>G","hgvsp":"ENSP00000478219.1:p.(Pro33Arg)",**"isCanonical":true**,"polyPhenScore":0.045,"polyPhenPrediction":"benign","proteinId":"ENSP00000478219.1","siftScore":0.26,"siftPrediction":"tolerated"},
{**"transcript":"ENST00000269305.8"**,"source":"Ensembl","bioType":"protein_coding","codons":"cCc/cGc","aminoAcids":"P/R","cdnaPos":"405","cdsPos":"215","exons":"4/11","proteinPos":"72","geneId":"ENSG00000141510","hgnc":"TP53","consequence":["missense_variant"],"hgvsc":"ENST00000269305.8:c.215C>G","hgvsp":"ENSP00000269305.4:p.(Pro72Arg)","polyPhenScore":0.045,"polyPhenPrediction":"benign","proteinId":"ENSP00000269305.4","siftScore":0.57,"siftPrediction":"tolerated"},
{**"transcript":"NM_000546.5"**,"source":"RefSeq","bioType":"protein_coding","codons":"cCc/cGc","aminoAcids":"P/R","cdnaPos":"417","cdsPos":"215","exons":"4/11","proteinPos":"72","geneId":"7157","hgnc":"TP53","consequence":["missense_variant"],"hgvsc":"NM_000546.5:c.215C>G","hgvsp":"NP_000537.3:p.(Pro72Arg)"**,"isCanonical":true**,"polyPhenScore":0.045,"polyPhenPrediction":"benign","proteinId":"NP_000537.3","siftScore":0.57,"siftPrediction":"tolerated"},
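To size the issue across genes, one option is to collect the Ensembl transcripts flagged `isCanonical` per gene from two Nirvana outputs (for example 3.19 and 3.20 on the same sample) and list the genes where they differ. The following is a minimal sketch, not a tested tool. It assumes the JSON has been loaded into Python dicts, that transcripts sit under `positions[].variants[].transcripts[]`, and that the field names match those in the trimmed output above. The function names `canonical_by_gene` and `changed_genes` are hypothetical, and the "fixed" demo record is made up for illustration, not taken from a real 3.20 run.

```python
from collections import defaultdict


def canonical_by_gene(positions, source="Ensembl"):
    """Map gene symbol -> set of unversioned transcript IDs flagged isCanonical."""
    canon = defaultdict(set)
    for position in positions:
        for variant in position.get("variants", []):
            for tx in variant.get("transcripts", []):
                if tx.get("source") == source and tx.get("isCanonical"):
                    gene = tx.get("hgnc") or tx.get("geneId")
                    # Drop the version suffix so ENST00000269305.8 and
                    # ENST00000269305.9 compare as the same transcript.
                    canon[gene].add(tx["transcript"].split(".")[0])
    return dict(canon)


def changed_genes(old, new):
    """Genes present in both maps whose canonical transcript sets differ."""
    shared = old.keys() & new.keys()
    return {g: (old[g], new[g]) for g in sorted(shared) if old[g] != new[g]}


# Demo records: the first mirrors the 3.19 output in this report; the second
# is an illustrative stand-in for a corrected annotation (assumption).
demo_319 = [{"variants": [{"transcripts": [
    {"transcript": "ENST00000610292.4", "source": "Ensembl", "hgnc": "TP53", "isCanonical": True},
    {"transcript": "ENST00000269305.8", "source": "Ensembl", "hgnc": "TP53"},
    {"transcript": "NM_000546.5", "source": "RefSeq", "hgnc": "TP53", "isCanonical": True},
]}]}]
demo_fixed = [{"variants": [{"transcripts": [
    {"transcript": "ENST00000610292.4", "source": "Ensembl", "hgnc": "TP53"},
    {"transcript": "ENST00000269305.8", "source": "Ensembl", "hgnc": "TP53", "isCanonical": True},
]}]}]

diff = changed_genes(canonical_by_gene(demo_319), canonical_by_gene(demo_fixed))
for gene, (before, after) in diff.items():
    print(gene, sorted(before), "->", sorted(after))
```

This only covers genes that have at least one annotated variant in both outputs, so a whole-genome sample such as HG00097 gives a lower bound on the number of affected genes rather than a full count.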