broadinstitute / gnomad-browser

Explore gnomAD datasets on the web
https://gnomad.broadinstitute.org
MIT License

Wrong HGVS consequence for ClinVar-reported variants in gnomAD v4 #1453

Status: Open. thedrakesng opened this issue 6 months ago

thedrakesng commented 6 months ago

What you did:

I was checking a variant in the ClinVar Variants section of gnomAD v4. The HGVS consequence shown there differs from the one reported by ClinVar and by gnomAD v2.1. For example, the consequence of ClinVar Variation ID 914550 is displayed as p.Gln334Ter in gnomAD v4 (https://gnomad.broadinstitute.org/gene/ENSG00000156466?dataset=gnomad_r4), but as NM_001001557.4(GDF6):c.1251C>T (p.Pro417=) in ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/variation/914550/) and in gnomAD v2.1 (https://gnomad.broadinstitute.org/gene/ENSG00000156466?dataset=gnomad_r2_1). I found the same problem for other types of variants in other genes.

What you expected to see after you did that:


What you actually saw after you did that:

- gnomAD v4.0, GDF6, 8-96144680-G-A, ClinVar Variation ID 914550, HGVS consequence: p.Gln334Ter, VEP annotation: stop gained (https://gnomad.broadinstitute.org/gene/ENSG00000156466?dataset=gnomad_r4)



While writing this bug report, I found a few more variants whose HGVS consequence differs from ClinVar's, regardless of gnomAD version. For example:

- gnomAD version: gnomAD v2.1.1, gene: P4HB, variant: 17-79817175-C-G, ClinVar Variation ID: 1507457, HGVS consequence: c.233+1G>C, VEP annotation: splice donor

- gnomAD version: gnomAD v4.0, gene: P4HB, variant: 17-81859299-C-G, ClinVar Variation ID: 1507457, HGVS consequence: c.233+1G>C, VEP annotation: splice donor

ClinVar reports this variant as NM_000918.4(P4HB):c.234G>C (p.Arg78Ser).

rileyhgrant commented 5 months ago

Hiya @thedrakesng

Thanks for bringing this to our attention. I dug into this this afternoon, and it appears to be a discrepancy in which of the transcript consequences is displayed.

I'll ask our product owner what we want to display here, as I don't think I'm in a position to make this decision unilaterally. The discrepancy is that we're showing the most severe of all the possible transcript consequences, which is the stop gained p.Gln334Ter you see, instead of the synonymous p.Pro417Pro. The synonymous change is the consequence for the MANE Select transcript, and my hunch is that it's what we actually want to display.
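To make the two candidate rules concrete, here is a minimal sketch in plain Python (not the browser's actual code). It assumes each transcript consequence is a dict with hypothetical `major_consequence` and `is_mane_select` fields, and it ranks consequences with an abbreviated subset of VEP's severity ordering. The rows are three of the GDF6 transcript consequences from the table further down; the c.794-3C>T row is left out because its consequence term isn't shown, and the consequence terms and MANE Select flags are assumptions inferred from this thread.

```python
from typing import Optional

# Abbreviated subset of VEP's consequence ordering, most severe first.
# Only the relative order of these terms matters here.
CSQ_ORDER = [
    "splice_acceptor_variant",
    "splice_donor_variant",
    "stop_gained",
    "frameshift_variant",
    "missense_variant",
    "splice_region_variant",
    "synonymous_variant",
    "intron_variant",
]
CSQ_RANK = {term: rank for rank, term in enumerate(CSQ_ORDER)}

# Three GDF6 rows from the thread's table. The consequence terms and
# MANE Select flags are assumptions; the table does not show them.
GDF6_CSQS = [
    {"transcript_id": "ENST00000621429", "hgvsp": "p.Gln334Ter",
     "major_consequence": "stop_gained", "is_mane_select": False},
    {"transcript_id": "ENST00000287020", "hgvsp": "p.Pro417Pro",
     "major_consequence": "synonymous_variant", "is_mane_select": True},
    {"transcript_id": "NM_001001557.4", "hgvsp": "p.Pro417Pro",
     "major_consequence": "synonymous_variant", "is_mane_select": True},
]


def most_severe(csqs: list[dict]) -> dict:
    """Consequence with the lowest VEP rank, i.e. the most severe one."""
    return min(csqs, key=lambda c: CSQ_RANK[c["major_consequence"]])


def first_mane_select(csqs: list[dict]) -> Optional[dict]:
    """First consequence on a MANE Select transcript, or None if there is none."""
    return next((c for c in csqs if c["is_mane_select"]), None)


print(most_severe(GDF6_CSQS)["hgvsp"])        # p.Gln334Ter
print(first_mane_select(GDF6_CSQS)["hgvsp"])  # p.Pro417Pro
```

Both rules see the same list of consequences; only the selection differs, which is enough for v4's ClinVar plot to disagree with ClinVar's website.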


Information from my digging session for future reference

For v4, when we annotate the transcript consequences with VEP 105, we get four transcript consequences, including the synonymous one reported in v2 (which uses a different version of VEP) and on ClinVar's website.

Here are a few columns of the transcript consequences table:

+---------------------------------------+---------------------------------------------+-----------------------------------------+
| transcript_consequences.transcript_id | transcript_consequences.polyphen_prediction | transcript_consequences.sift_prediction |
+---------------------------------------+---------------------------------------------+-----------------------------------------+
| str                                   | str                                         | str                                     |
+---------------------------------------+---------------------------------------------+-----------------------------------------+
| "ENST00000621429"                     | NA                                          | NA                                      |
| "ENST00000287020"                     | NA                                          | NA                                      |
| "NM_001001557.4"                      | NA                                          | NA                                      |
| "ENST00000620978"                     | NA                                          | NA                                      |
+---------------------------------------+---------------------------------------------+-----------------------------------------+
+-------------------------------------+-------------------------------+-------------------------------+--------------------------------------+
| transcript_consequences.gene_symbol | transcript_consequences.hgvsc | transcript_consequences.hgvsp | transcript_consequences.is_canonical |
+-------------------------------------+-------------------------------+-------------------------------+--------------------------------------+
| str                                 | str                           | str                           |                                 bool |
+-------------------------------------+-------------------------------+-------------------------------+--------------------------------------+
| "GDF6"                              | "c.1000C>T"                   | "p.Gln334Ter"                 |                                   NA |
| "GDF6"                              | "c.1251C>T"                   | "p.Pro417Pro"                 |                                 True |
| "GDF6"                              | "c.1251C>T"                   | "p.Pro417Pro"                 |                                 True |
| "GDF6"                              | "c.794-3C>T"                  | NA                            |                                   NA |
+-------------------------------------+-------------------------------+-------------------------------+--------------------------------------+

Our logic sorts this list so that the most severe consequence comes before the MANE Select transcript's consequence. As a result, on a page such as the gene page above, we display the consequence for what may be a different transcript, even though the page references the MANE Select transcript. Here, that produces a discrepancy between v4's ClinVar plot (which shows a stop gained for this variant) and both v2's ClinVar plot and ClinVar's website (which show a synonymous variant).

If we want to prioritize the MANE Select consequence, it should be as straightforward as modifying lines 130-138 in annotate_transcript_consequences.py. However, because this function is also used in our variant pipelines, it may be better to add a parameter that determines the sort order of the transcripts.
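A minimal sketch of that parameterized sort order, written in plain Python rather than the Hail expressions the pipeline actually uses, so it models the logic rather than patching lines 130-138. `sort_transcript_consequences` and `prioritize_mane_select` are hypothetical names, the dict fields are assumptions, and the severity list is an abbreviated subset of VEP's ordering. With the default, the most severe consequence sorts first (using MANE Select only as a tie-breaker, which is also an assumption); with the flag set, MANE Select sorts first.

```python
# Abbreviated subset of VEP's consequence ordering, most severe first.
CSQ_ORDER = [
    "splice_acceptor_variant",
    "splice_donor_variant",
    "stop_gained",
    "missense_variant",
    "splice_region_variant",
    "synonymous_variant",
    "intron_variant",
]
CSQ_RANK = {term: rank for rank, term in enumerate(CSQ_ORDER)}


def sort_transcript_consequences(
    csqs: list[dict], prioritize_mane_select: bool = False
) -> list[dict]:
    """Return transcript consequences in display order (hypothetical helper).

    prioritize_mane_select=False: most severe first, MANE Select breaks ties.
    prioritize_mane_select=True:  MANE Select first, severity breaks ties.
    """

    def sort_key(csq: dict) -> tuple:
        severity = CSQ_RANK[csq["major_consequence"]]
        not_mane = not csq["is_mane_select"]  # False sorts before True
        if prioritize_mane_select:
            return (not_mane, severity)
        return (severity, not_mane)

    return sorted(csqs, key=sort_key)


# Two GDF6 consequences from the table above; terms and flags are assumed.
csqs = [
    {"transcript_id": "ENST00000621429", "hgvsp": "p.Gln334Ter",
     "major_consequence": "stop_gained", "is_mane_select": False},
    {"transcript_id": "NM_001001557.4", "hgvsp": "p.Pro417Pro",
     "major_consequence": "synonymous_variant", "is_mane_select": True},
]

print(sort_transcript_consequences(csqs)[0]["hgvsp"])  # p.Gln334Ter
print(sort_transcript_consequences(csqs, prioritize_mane_select=True)[0]["hgvsp"])  # p.Pro417Pro
```

Defaulting the flag to the current order would leave the variant pipelines unchanged while letting the ClinVar path opt in to MANE Select first.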

We could also modify transcriptConsequence.ts to determine which consequence it selects in the context of a gene. Currently it takes the first one, which for ClinVar variants is the most severe consequence.
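On the frontend side, the idea is to choose the consequence by transcript rather than by position in the list. The sketch below models that in Python rather than TypeScript, so it is not a drop-in change to transcriptConsequence.ts; `consequence_for_gene_page` and its `gene_transcript_id` argument (for example, the ID of the transcript the gene page references) are hypothetical.

```python
from typing import Optional


def consequence_for_gene_page(
    csqs: list[dict], gene_transcript_id: Optional[str]
) -> Optional[dict]:
    """Pick the transcript consequence to show in the context of a gene.

    Prefer the consequence on the transcript the gene page references;
    otherwise fall back to the first entry (the current behaviour).
    """
    if not csqs:
        return None
    if gene_transcript_id is not None:
        for csq in csqs:
            if csq["transcript_id"] == gene_transcript_id:
                return csq
    return csqs[0]


# Consequences sorted most severe first. IDs and HGVS strings come from the
# GDF6 table in this thread; treating ENST00000287020 as the transcript the
# gene page references is an assumption.
csqs = [
    {"transcript_id": "ENST00000621429", "hgvsp": "p.Gln334Ter"},
    {"transcript_id": "ENST00000287020", "hgvsp": "p.Pro417Pro"},
]

print(consequence_for_gene_page(csqs, None)["hgvsp"])               # p.Gln334Ter
print(consequence_for_gene_page(csqs, "ENST00000287020")["hgvsp"])  # p.Pro417Pro
```

The fallback keeps today's behaviour for variants that have no consequence on the referenced transcript.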

thedrakesng commented 5 months ago

Thank you, @rileyhgrant, for such a thorough explanation.

rileyhgrant commented 4 months ago

From some discussion: there's a separate issue here, in that we don't even display some of the transcripts that VEP gives us consequences for.