ClinGen / gene-and-variant-curation-tools

ClinGen's gene and variant curation interfaces (GCI & VCI). Developed by Stanford ClinGen team.
https://curation.clinicalgenome.org/
MIT License

ClinVar Primary Transcript fields missing in VCI #368

Open cgpreston opened 6 months ago

cgpreston commented 6 months ago

Reported by a curator for ClinVar Variation ID: 98860.

In the VCI, the Basic Info tab is missing the "ClinVar Primary Transcript" fields (see screenshot).

[Screenshot: Screen Shot 2024-05-10 at 2 33 42 PM]

After developer assessment, the cause is a data mismatch in the XML returned by ClinVar’s E-utilities: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=clinvar&rettype=vcv&is_variationid&from_esearch=true&old_xml=T&id=98860

We look for a sequenceAccessionVersion attribute on a NucleotideExpression element (a child of an HGVS element with attribute Type="coding") whose value can also be found in the VariationName attribute of the VariationArchive element.
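To make the lookup described above concrete, here is a minimal Python sketch of that kind of check. It is not the VCI's actual code. The XML fragment is a hypothetical, heavily trimmed stand-in that uses only the element and attribute names mentioned in this issue, and the function name `find_primary_transcript` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down fragment. The real ClinVar response is much
# larger and nests these elements more deeply; only the names quoted in
# this issue are used here.
SAMPLE_XML = """
<VariationArchive VariationName="NM_000329.2(RPE65):c.292_311del (p.Ile98Hisfs)">
  <HGVS Type="coding">
    <NucleotideExpression sequenceAccessionVersion="NM_000329.3"/>
  </HGVS>
</VariationArchive>
"""


def find_primary_transcript(xml_text):
    """Return the coding accession.version that appears in VariationName, or None."""
    root = ET.fromstring(xml_text)
    # Accept the VariationArchive element as the root or anywhere below it.
    if root.tag == "VariationArchive":
        archive = root
    else:
        archive = root.find(".//VariationArchive")
    if archive is None:
        return None

    variation_name = archive.get("VariationName", "")
    for hgvs in archive.iter("HGVS"):
        if hgvs.get("Type") != "coding":
            continue
        for expr in hgvs.findall("NucleotideExpression"):
            accession = expr.get("sequenceAccessionVersion")
            # Strict check: the full accession.version must occur in the name.
            if accession and accession in variation_name:
                return accession
    return None


print(find_primary_transcript(SAMPLE_XML))
```

With the values reported for Variation ID 98860, the strict substring test finds no match, which would be consistent with the fields coming up empty.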

The one sequenceAccessionVersion attribute matching that description has a value of NM_000329.3, which doesn’t match the VariationName attribute, whose value is NM_000329.2(RPE65):c.292_311del (p.Ile98Hisfs).

It appears a code check is catching the difference in transcript version (NM_000329.3 vs. NM_000329.2). We should discuss whether checking down to the transcript version level is useful or necessary, or whether the check should compare only the transcript accession and ignore the version.
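For discussion, a version-insensitive comparison could look roughly like the Python sketch below. The helper names `strip_version` and `matches_ignoring_version` are hypothetical. The sketch also assumes that VariationName begins with the transcript accession followed by "(", as in the example above, which may not hold for every ClinVar record.

```python
def strip_version(accession):
    """Drop the trailing ".N" version, e.g. NM_000329.3 -> NM_000329."""
    base, _, _ = accession.partition(".")
    return base


def matches_ignoring_version(accession_version, variation_name):
    """True if variation_name starts with the same accession, at any version."""
    # Assumption: the accession is everything before the first "(".
    name_accession = variation_name.split("(", 1)[0].strip()
    return strip_version(accession_version) == strip_version(name_accession)


name = "NM_000329.2(RPE65):c.292_311del (p.Ile98Hisfs)"
print(matches_ignoring_version("NM_000329.3", name))
```

The trade-off is that a relaxed check like this would also accept cases where the two versions describe the variant differently, so it might be worth surfacing the version mismatch to the curator rather than silently ignoring it.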

Recommendation of next steps:

  1. Assess the frequency of this type of error by seeing whether we get more reports of this type of bug.
  2. If yes to 1: discuss whether to change this check from the level of the transcript version to the level of the transcript (ignoring the version).