Closed: karafecho closed this issue 7 months ago.
"Biolink considers rs ids a poor match for variants - they are really more of a locus id" I'm not sure I understand the issue here. A variant refers to variation at a locus, and rsids are excellent for identifying variants. Is there a mix-up with allele, i.e., the specific version observed for that variant?
@gglusman, the plan is definitely to rework allele/variant in Biolink in 2023. rs IDs are certainly a good identifier for polymorphisms.
@sierra-moxon, can you review this ticket? I'm not sure whether it's still open.
@karafecho, I reviewed all the related Biolink tickets tagged here, and I believe I've answered all the questions with the prefixes Biolink (and, by extension, Bioregistry) thinks these IDs should use. I don't have a way to validate in bulk that all the KPs are using the correct prefixes, but I do believe the reasoner-validator will warn about incorrect prefix use.
Can we close this ticket?
Wow! Thanks, Sierra. Yes, the Reasoner Validator should catch incorrect prefix use, so I'll go ahead and close this ticket.
The QotM raised a number of issues related to Biolink prefixes, several of which may be worth further investigation.
Variant IDs (and any cross-mapping)
It looks like both are listed as valid prefixes in Biolink, but only CLINVAR is in the context.jsonld, so should we remove ClinVarVariant? This is probably a good Biolink issue.
Biolink considers rs IDs a poor match for variants (they are really more of a locus ID), though everybody uses them this way.
Keep "CA" because then the URL resolves correctly. FWIW, the CAID is really meant to function like a node normalizer for variants, providing all the other names for that entity (in this case including variations across different versions of the underlying sequences).
For small deletions, CAIDs should work; for larger deletions and for CNVs, IMO there is no perfect approach.
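The CLINVAR/ClinVarVariant mismatch described in this section is the kind of check that can be scripted: take the prefixes a class lists as valid and report any that have no entry in the JSON-LD context. Below is a minimal Python sketch, assuming the context file has a top-level `@context` object. The function names, the sample context, and its placeholder IRIs are hypothetical; they are not taken from the real Biolink files, and this is not how the reasoner-validator performs its check.

```python
import json


def context_prefixes(jsonld_text: str) -> set[str]:
    """Collect prefix names declared in a JSON-LD @context.

    Simplification (assumption): a key counts as a prefix if its value is a
    plain IRI string, or a term definition object with "@prefix": true.
    JSON-LD keywords such as "@vocab" are skipped.
    """
    context = json.loads(jsonld_text).get("@context", {})
    return {
        key
        for key, value in context.items()
        if not key.startswith("@")
        and (isinstance(value, str)
             or (isinstance(value, dict) and value.get("@prefix") is True))
    }


def missing_from_context(listed: list[str], declared: set[str]) -> list[str]:
    """Return prefixes listed as valid that have no @context entry."""
    return [prefix for prefix in listed if prefix not in declared]


# Hypothetical, trimmed-down inputs with placeholder IRIs (not the real files).
SAMPLE_CONTEXT = json.dumps({
    "@context": {
        "@vocab": "https://example.org/vocab/",
        "CLINVAR": "https://example.org/clinvar/",
    }
})
LISTED_VARIANT_PREFIXES = ["CLINVAR", "ClinVarVariant"]

if __name__ == "__main__":
    declared = context_prefixes(SAMPLE_CONTEXT)
    print(missing_from_context(LISTED_VARIANT_PREFIXES, declared))
```

With these sample inputs the script reports `ClinVarVariant` as listed but undeclared; run against the real Biolink model and context.jsonld, the same comparison would surface any other such mismatches.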
Disease IDs:
It looks like it should be ORPHANET, but I'm also confused about why we have both in different places.
Chemicals: