biolink / biolink-model

Schema and generated objects for biolink data model and upper ontology
https://biolink.github.io/biolink-model/

Prefix for sequence variants in other species? #1042

Closed: cbizon closed this issue 2 years ago

cbizon commented 2 years ago

Is your feature request related to a problem? Please describe. We want to ingest sequence variants for various plants, such as Arabidopsis, but we don't know how to represent them: none of the id_prefixes for sequence variant are appropriate.

Describe the solution you'd like. I'm not certain. Is there a standard way to write HGVS-like identifiers?

What working group (or team) did this request originate from? ROBOKOP

Tag relevant members for discussion @shalsh23

sierra-moxon commented 2 years ago

I know of a couple of variant standards. One is HGVS; another is SPDI: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7523648/. But maybe you are asking for a service that resolves any HGVS name onto a genome browser? I don't know of a service like that.

For the ID prefixes question: do you have HGVS names (IDs) from TAIR? I found this service, which converts between HGVS, SPDI, and other formats: https://uswest.ensembl.org/info/docs/tools/vep/recoder/index.html. Ensembl also documents the resources that generate identifiers for variants: https://uswest.ensembl.org/info/genome/variation/species/sources_documentation.html
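SPDI and HGVS count positions differently, which matters when converting between them: SPDI uses a 0-based interbase position (the number of nucleotides before the deleted sequence), while HGVS genomic (g.) positions are 1-based. Below is a minimal sketch for the simplest case only, a single-nucleotide substitution on a genomic reference sequence. The function name is hypothetical, and real conversions (indels, normalization, c./p. notation) are better left to a tool such as the Ensembl Recoder.

```python
def spdi_snv_to_hgvs(spdi: str) -> str:
    """Convert a single-nucleotide SPDI substitution to an HGVS g. string.

    Assumes a genomic reference sequence (hence "g."). SPDI positions are
    0-based, HGVS genomic positions are 1-based, so the position gets +1.
    Anything other than a one-base substitution raises ValueError.
    """
    # Split from the right: the last three fields are position, deleted, inserted
    seq_id, position, deleted, inserted = spdi.rsplit(":", 3)
    if len(deleted) != 1 or len(inserted) != 1 or deleted == inserted:
        raise ValueError(f"not a single-nucleotide substitution: {spdi}")
    return f"{seq_id}:g.{int(position) + 1}{deleted}>{inserted}"


print(spdi_snv_to_hgvs("NC_000070.7:101672389:A:C"))
# NC_000070.7:g.101672390A>C
```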

Chris M pointed out https://mutalyzer.nl/ as a resource for checking the syntax of variant nomenclature.

The Alliance of Genome Resources and marrvel.org both allow searching by HGVS, but unfortunately neither supports plant species. Example: https://www.alliancegenome.org/search?q=NC_000070.7%3Ag.101672390A%3EC
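The Alliance search link above is just the HGVS name percent-encoded into the `q` query parameter. A small sketch of building such a link, assuming that the search page accepts any HGVS string in this form; the function name is hypothetical.

```python
from urllib.parse import quote


def alliance_search_url(hgvs: str) -> str:
    # safe="" percent-encodes ':' and '>' too, matching the link above
    return "https://www.alliancegenome.org/search?q=" + quote(hgvs, safe="")


print(alliance_search_url("NC_000070.7:g.101672390A>C"))
# https://www.alliancegenome.org/search?q=NC_000070.7%3Ag.101672390A%3EC
```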

If TAIR or another plant database does curate variants, and has their own prefix for their resource, we can certainly add that prefix to the sequence variant class.

cbizon commented 2 years ago

As far as I can tell (?), most of the plant DBs use identifiers that are probably similar to, or can be transformed into, SPDI. So I think we could make that work. Any thoughts on how we turn an SPDI into an identifier? Something like "SPDI:NG_012345.1:4:G:T"?
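A minimal sketch of what handling the proposed "SPDI:" form could look like, assuming the local part is the four colon-separated SPDI fields (sequence, 0-based position, deleted sequence, inserted sequence) and that the deleted and inserted parts are plain nucleotide strings rather than lengths. The `SPDI:` prefix, the `Spdi` type, and both function names are hypothetical, not part of biolink-model.

```python
import re
from typing import NamedTuple

# Hypothetical CURIE shape: SPDI:<sequence>:<position>:<deleted>:<inserted>
# Deleted/inserted restricted to A, C, G, T, N (an assumption); either may be empty.
SPDI_CURIE = re.compile(
    r"SPDI:(?P<sequence>[^:]+):(?P<position>\d+):"
    r"(?P<deleted>[ACGTN]*):(?P<inserted>[ACGTN]*)"
)


class Spdi(NamedTuple):
    sequence: str  # reference sequence accession.version, e.g. NG_012345.1
    position: int  # 0-based interbase position
    deleted: str   # empty for a pure insertion
    inserted: str  # empty for a pure deletion


def parse_spdi_curie(curie: str) -> Spdi:
    m = SPDI_CURIE.fullmatch(curie)
    if m is None:
        raise ValueError(f"not an SPDI-style CURIE: {curie!r}")
    return Spdi(m["sequence"], int(m["position"]), m["deleted"], m["inserted"])


def to_spdi_curie(v: Spdi) -> str:
    return f"SPDI:{v.sequence}:{v.position}:{v.deleted}:{v.inserted}"


v = parse_spdi_curie("SPDI:NG_012345.1:4:G:T")
print(v)
# Spdi(sequence='NG_012345.1', position=4, deleted='G', inserted='T')
```

Round-tripping through `to_spdi_curie` gives back the same string, so the CURIE stays a lossless encoding of the four SPDI fields.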

sierra-moxon commented 2 years ago

I'm not sure. Since I don't know of a service that will resolve an arbitrary identifier in SPDI or HGVS format, we probably can't make a CURIE for it? If NCBI resolved them, then I could see something like ncbi.spdi:NG_012345.1:4:G:T, or if alliancegenome.org resolved them, then agrkb:NG_012345.1:4:G:T ?

cmungall commented 2 years ago

Shall we float an issue on the Bioregistry tracker? It may be deemed out of scope, but it would be good to get feedback from others there.

(Sent by email on Mon, Jul 18, 2022, in reply to Sierra Moxon's comment: https://github.com/biolink/biolink-model/issues/1042#issuecomment-1188459416)

cbizon commented 2 years ago

It's preferable that the identifier be resolvable, but is it strictly necessary?

cmungall commented 2 years ago

No strong opinion, but I do think it's important that the prefix be registered somehow.

(Sent by email on Tue, Jul 19, 2022, in reply to cbizon's comment: https://github.com/biolink/biolink-model/issues/1042#issuecomment-1189048167)

sierra-moxon commented 2 years ago

OK, I've opened the issue on the Bioregistry tracker; some discussion will likely happen there. For now, I could add spdi as a prefix directly to biolink-model, with its URL pointing to the SPDI API at https://api.ncbi.nlm.nih.gov/variation/v0/. Will that fix your use case, @cbizon? The IDs won't really resolve, except to give a more structured JSON response to a query by SPDI ID.
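A minimal sketch of how a local `spdi` prefix could be expanded into a URL, assuming a prefix map whose base is the NCBI Variation API URL mentioned above with a `spdi/` path segment appended. That path segment, `PREFIX_MAP`, and `expand_curie` are assumptions for illustration, not a confirmed biolink-model or Bioregistry entry; as the comment says, such a URL would at best return JSON rather than a human-readable page.

```python
from collections.abc import Mapping

# Hypothetical prefix map; the "spdi/" path segment is an assumption.
PREFIX_MAP: Mapping[str, str] = {
    "spdi": "https://api.ncbi.nlm.nih.gov/variation/v0/spdi/",
}


def expand_curie(curie: str, prefix_map: Mapping[str, str] = PREFIX_MAP) -> str:
    # Split on the first ':' only, because SPDI local IDs contain ':' themselves
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefix_map[prefix] + local_id


print(expand_curie("spdi:NC_000070.7:101672389:A:C"))
# https://api.ncbi.nlm.nih.gov/variation/v0/spdi/NC_000070.7:101672389:A:C
```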

cbizon commented 2 years ago

Yes, I think that will meet our current use case, thanks!