NCATSTranslator / Feedback

A repo for tracking gaps in Translator data and finding ways to fill them.

Biolink prefix confusion, inconsistencies: how do we represent specific kinds of variants, and which identifier namespaces are preferred. #54

Closed: karafecho closed this issue 7 months ago

karafecho commented 1 year ago

The QotM raised a number of issues related to Biolink prefixes, several of which may be worth further investigation.

Variant IDs (and any cross-mapping)

It looks like both CLINVAR and ClinVarVariant are listed as valid prefixes in Biolink, but only CLINVAR is in the context.jsonld, so we probably need to remove ClinVarVariant. This is probably a good Biolink issue.

Biolink considers rs ids a poor match for variants - they are really more of a locus id - though everybody uses them like this.

Keep "CA" b/c then the url resolves correctly. FWIW, the CAID is really meant to function like a node normalizer for variants, providing all the other names for that entity (in this case including variations across different versioning of the underlying sequences).
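To illustrate why keeping the "CA" matters, here is a minimal sketch of CURIE expansion by simple concatenation. The URL prefix below is an assumption about how the CAID prefix is mapped (check context.jsonld for the actual value), and `expand_curie`, `PREFIX_MAP`, and the example ID are hypothetical.

```python
# Hypothetical prefix map. The CAID URL prefix is an assumption,
# not copied from the Biolink context.jsonld.
PREFIX_MAP = {
    "CAID": "http://reg.clinicalgenome.org/redmine/projects/registry/genboree_registry/by_caid?caid=",
}


def expand_curie(curie, prefix_map=PREFIX_MAP):
    """Expand 'PREFIX:local_id' into a URL by concatenating the mapped URL prefix and the local ID."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in prefix_map:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefix_map[prefix] + local_id


# With "CA" kept in the local ID, the expanded URL ends in caid=CA123456 (made-up ID).
print(expand_curie("CAID:CA123456"))
```

If the "CA" were dropped from the local ID, the same concatenation would produce `caid=123456`, which is presumably what would fail to resolve.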

For small deletions CAIDs should work; for bigger deletions and for CNVs, IMO there is no perfect approach.

Disease IDs:

Looks like it should be ORPHANET, but I am confused as well about why we have both in different places.

Chemicals:

gglusman commented 1 year ago

"Biolink considers rs ids a poor match for variants - they are really more of a locus id" I'm not sure I understand the issue here. A variant refers to variation at a locus, and rsids are excellent for identifying variants. Is there a mixup with allele - the specific version observed for that variant?

sierra-moxon commented 1 year ago

@gglusman - the plan is to rework allele/variant in Biolink in 2023. rs ids are certainly a good identifier for polymorphisms.

karafecho commented 1 year ago

Also see:

https://github.com/biolink/biolink-model/issues/1119 https://github.com/biolink/biolink-model/issues/1121 https://github.com/biolink/biolink-model/issues/1122

sstemann commented 7 months ago

@sierra-moxon can you review this ticket? I'm not sure if it's still open.

sierra-moxon commented 7 months ago

@karafecho - I reviewed all the related Biolink tickets tagged here and I believe I've answered all the questions with what Biolink (and bioregistry by extension) thinks the prefixes for these IDs should be. I do not have a way to validate that all the KPs are using the correct prefixes in bulk, but I do believe the reasoner-validator will warn of incorrect prefix use.

Can we close this ticket?
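For anyone who wants a quick bulk check in the meantime, something along these lines could work as a rough sketch. This is not the reasoner-validator: `ALLOWED` and `find_bad_prefixes` are hypothetical, the allowed prefixes are limited to the ones named in this thread, and the node IDs are made up.

```python
# Hypothetical table of allowed CURIE prefixes per Biolink category.
# Only prefixes mentioned in this thread are listed; the real lists are longer.
ALLOWED = {
    "biolink:SequenceVariant": {"CAID", "CLINVAR"},
    "biolink:Disease": {"ORPHANET"},
}


def find_bad_prefixes(nodes):
    """Return the (node_id, category) pairs whose CURIE prefix is not allowed for that category."""
    bad = []
    for node_id, category in nodes:
        prefix = node_id.split(":", 1)[0]  # case-sensitive on purpose
        allowed = ALLOWED.get(category)
        # Categories missing from the table are skipped rather than flagged.
        if allowed is not None and prefix not in allowed:
            bad.append((node_id, category))
    return bad


# Made-up node IDs for illustration only.
nodes = [
    ("CLINVAR:12345", "biolink:SequenceVariant"),
    ("ClinVarVariant:12345", "biolink:SequenceVariant"),
    ("Orphanet:000", "biolink:Disease"),
]
print(find_bad_prefixes(nodes))
```

The comparison is case-sensitive, so `Orphanet` is flagged when only `ORPHANET` is allowed, which is the kind of inconsistency this ticket was about.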

karafecho commented 7 months ago

Wow! Thanks, Sierra. Yes, the Reasoner Validator should catch incorrect prefix use, so I'll go ahead and close this ticket.