Ensembl / ensembl-rest

Language agnostic RESTful data access to Ensembl data over HTTP
Apache License 2.0

ClinVar - Variation ID order #623

Closed ibosdet closed 1 month ago

ibosdet commented 7 months ago

For a specific variant, the ClinVar field in the REST response provides multiple variant IDs corresponding to all variants located at the same position. Is there a logical order to the Variation IDs that are returned that I can use to select the correct one? I'm attempting to retrieve the specific ClinVar ID for the submitted variant, so I can build a direct web link to the ClinVar record.

Example variant: BRCA1:c.5123C>A, p.Ala1708Glu

REST query: https://rest.ensembl.org/vep/human/hgvs/BRCA1:c.5123C>A?content-type=application/json;uniprot=1;mane=1;SpliceAI=1;CADD=1;canonical=1;ccds=1;gencode_basic=1;hgvs=1;numbers=1;clinvar=1;civic=1

ClinVar field from output JSON:

"ClinVar": [
    "RCV001862642",
    "RCV001076417",
    "VCV000867673",
    "RCV003149709",
    "RCV002250480",
    "RCV001353479",
    "RCV001579293",
    "RCV000212194",
    "RCV000167826",
    "RCV000413608",
    "RCV000589633",
    "RCV000457403",
    "RCV000048803",
    "RCV000048802",
    "VCV000055407",
    "RCV000031221",
    "VCV000037640",
    "RCV000077599",
    "RCV000148393",
    "RCV000131831",
    "RCV000131166"
]

Three "VCV" records are returned. VCV000055407 is the correct one for this variant, while the other two correspond to the C>T and C>G variants at this position. Is there any way to predict, from this field alone, which is the right ID?

nakib103 commented 7 months ago

Hello @ibosdet,

Thanks for your query!

The ClinVar IDs are stored on, and reported from, the co-located known variant. Unfortunately, in your case the co-located variant has several other alleles, so the IDs for all of them are returned together.

There is no direct way to identify the correct identifier for the allele in question. What you can do is query every VCV ID you get against the ClinVar API and match the alleles to determine the relevant VCV identifier.
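A minimal Python sketch of that matching step, under several assumptions that are not confirmed anywhere in this thread: that the NCBI E-utilities `esummary` endpoint is queried with `db=clinvar` and the numeric Variation ID taken from the VCV accession, that its JSON summary exposes a `title` field containing the HGVS-style variant name, and that a plain substring test on the coding change (`c.5123C>A`) is enough to tell the alleles apart. The helper names are hypothetical.

```python
import json
import re
import urllib.parse
import urllib.request

ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def vcv_accessions(clinvar_ids):
    """Keep only the variation-level (VCV) accessions, dropping RCV records."""
    return [acc for acc in clinvar_ids if acc.startswith("VCV")]


def variation_id(vcv):
    """Strip the prefix and zero padding: 'VCV000055407' -> '55407'."""
    match = re.match(r"VCV0*(\d+)", vcv)
    if match is None:
        raise ValueError(f"not a VCV accession: {vcv!r}")
    return match.group(1)


def clinvar_link(vcv):
    """Build a web link to the ClinVar variation page."""
    return f"https://www.ncbi.nlm.nih.gov/clinvar/variation/{variation_id(vcv)}/"


def fetch_titles(vcvs):
    """Look up each record's title via E-utilities (performs a network call).

    Assumption: the db=clinvar JSON summary carries a 'title' field that
    holds the HGVS-style name of the variant.
    """
    query = urllib.parse.urlencode({
        "db": "clinvar",
        "id": ",".join(variation_id(v) for v in vcvs),
        "retmode": "json",
    })
    with urllib.request.urlopen(f"{ESUMMARY_URL}?{query}", timeout=30) as resp:
        result = json.load(resp)["result"]
    return {v: result.get(variation_id(v), {}).get("title", "") for v in vcvs}


def pick_vcv(titles, hgvs_c):
    """Return the VCV accessions whose title contains the submitted change."""
    return [vcv for vcv, title in titles.items() if hgvs_c in title]
```

With the `ClinVar` list from the VEP response, the intended flow would be `pick_vcv(fetch_titles(vcv_accessions(clinvar_ids)), "c.5123C>A")`, followed by `clinvar_link(...)` on the surviving accession. If the title format differs from what is assumed here, the comparison in `pick_vcv` is the part to adapt.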

Hope that answers the question.

Best regards, Nakib