biolink / biolink-model

Schema and generated objects for biolink data model and upper ontology
https://biolink.github.io/biolink-model/

add 'url' property with a description that was difficult to write in … #1492

Closed sierra-moxon closed 1 month ago

sierra-moxon commented 2 months ago

…the context of the existing xref property

for your review @codewarrior2000

fixes https://github.com/biolink/biolink-model/issues/1486

codewarrior2000 commented 2 months ago

Thank you @sierra-moxon. It's nice and simple, and it works for me. The cautionary note contrasting the "url" property with the "xref" property is very helpful.

sierra-moxon commented 2 months ago

@codewarrior2000 - thanks Larry! The thing I am slightly worried about is that for your reactome example, to fully represent both reactome links on a single node, you'd have one xref property filled out (a CURIE) on the node and one url property filled out (the full expanded URL) on the node. Should the guidance be that when url is present on a node, any xref annotation should also be expanded fully into a URL and stored in a url property? I'm thinking of downstream consumers, like the UI, that will need special code to display one or the other.

I'm also a little worried that the definition of xref currently allows URIs. So, technically, someone could use xref for any URL/URI that is related to the node or edge it is used on. Should we restrict the definition of xref to just be a CURIE? If we do that, we may be out of sync with other databases and/or have refactoring to do.

Per our discussion, this property is also single-valued (assuming that in TRAPI, many such attributes will be submitted in the TRAPI message when necessary). How will that be stored in the data store? Perhaps this is unnecessary for me to know, but I am trying to imagine how a non-Translator user would use this slot if they need to represent more than one of these kinds of URLs for a particular node.

codewarrior2000 commented 2 months ago

@sierra-moxon, thank you. I have been wondering why the meeting discussions had been concerned about two reactome links. The original intention was just to pass along the one URL that we found in the Reactome database, which we had called the "reaction_url", which links to the Reactome Pathway Browser. (e.g., https://reactome.org/PathwayBrowser/#/R-MMU-5655466)

Is there a Biolink Model requirement that I am not aware of that requires both links?

codewarrior2000 commented 2 months ago

@sierra-moxon Sorry, Sierra, I had to talk it over with Vlado. The url property is single-valued, yet there is still a need to handle multiple URLs. Any user with multiple URLs for a node should present each URL as an individual node attribute. The technical requirement will be imposed on ARAs to recognize that there can be multiple node attributes of the same type (URL).
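To make the "one attribute per URL" idea concrete, here is a minimal sketch. It assumes the `attribute_type_id` / `value` keys of a TRAPI attribute; the helper names and the second URL are hypothetical and only for illustration, not part of any agreed implementation.

```python
# Sketch only: one attribute per URL, all sharing the same attribute type.
# The "attribute_type_id"/"value" keys follow the TRAPI attribute shape;
# the helper names and the second URL below are hypothetical.

URL_TYPE = "biolink:url"


def url_attributes(urls):
    """Build one single-valued attribute dict per URL."""
    return [{"attribute_type_id": URL_TYPE, "value": url} for url in urls]


def collect_urls(node):
    """Gather every biolink:url value on a node, as a consumer such as an ARA might."""
    return [
        attr["value"]
        for attr in node.get("attributes", [])
        if attr.get("attribute_type_id") == URL_TYPE
    ]


node = {
    "attributes": url_attributes(
        [
            "https://reactome.org/PathwayBrowser/#/R-MMU-5655466",
            "https://example.org/another-view",  # placeholder URL
        ]
    )
}

print(collect_urls(node))
```

Each attribute stays single-valued, and the consumer is the one that has to expect repeats of the same type.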

codewarrior2000 commented 2 months ago

> I'm also a little worried that the definition of xref currently allows URIs

@sierra-moxon Question. As the most conservative approach, what if we continue to let the xref property be used for both URL/URI and for CURIE? Has that broken anything in Translator yet?

sierra-moxon commented 2 months ago

thanks @codewarrior2000 and @vdancik!

TL;DR:

  1. I think the trouble I am having with biolink:url being single-valued is that it does not make sense outside of Translator technical architecture.
  2. I am trying to make this property generic so that if the concept of an alternative url is different in Biolink from an xref, we have it available for use in other contexts besides reactome.
  3. Without changing the definition of xref, we could end up with nodes that use either xref or url to represent the same thing. That will be inconsistent for the UI, and hard to choose between for folks using Biolink outside of Translator.

  1. I think the trouble I am having with biolink:url being single-valued is that it does not make sense outside of Translator technical architecture.

Sort of self-explanatory, but Biolink is used for KGs other than those currently in Translator, and I think that without TRAPI it's difficult for a user of Biolink to use our proposed biolink:url unless it is multivalued.

  2. I am trying to make this property generic so that if the concept of an alternative url is different in Biolink from an xref, we have it available for use in other contexts besides reactome.

For example, the same CURIE that represents a mouse gene can be used in many URLs to see different views of that gene at MGI:

https://www.informatics.jax.org/marker/MGI:97486 <-- the full gene page at MGI, the default URI expansion of the curie: MGI:97486
https://www.informatics.jax.org/gxd/marker/MGI:97486 <-- the gene expression information at MGI
http://www.informatics.jax.org/gxd/marker/MGI:97486?tab=imagestab <-- just the images of the gene expression at MGI

This use case is very similar to the reactome use case. If we chose the biolink:id for a mouse gene node to be the NCBIGene identifier (NCBIGene:18504), then we could include an xref property on that node, MGI:97486. Its default URI expansion would be: http://identifiers.org/mgi/MGI:97486 and this URL could be used to redirect the user to https://www.informatics.jax.org/marker/MGI:97486
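As a rough illustration of the default expansion described here, the sketch below expands a CURIE with a prefix map. The single map entry is taken from the MGI example in this comment; it is an assumption for illustration, not the official Biolink prefix map, and `expand_curie` is a hypothetical helper.

```python
# Sketch only: expand a CURIE into its default URI using a prefix map.
# PREFIX_MAP holds just the MGI entry implied by the example above;
# it is an assumption, not the official Biolink prefix map.

PREFIX_MAP = {"MGI": "http://identifiers.org/mgi/MGI:"}


def expand_curie(curie, prefix_map=PREFIX_MAP):
    """Return the default URI expansion of a CURIE such as 'MGI:97486'."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        raise ValueError(f"cannot expand {curie!r}: unknown or missing prefix")
    return prefix_map[prefix] + local_id


print(expand_curie("MGI:97486"))  # http://identifiers.org/mgi/MGI:97486
```

The point is that a CURIE has exactly one default expansion, while the other MGI views have no CURIE form at all.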

But those other two MGI links are also valid, and take the user to a different view of the data. So similarly, we'd argue in this PR, that those two other MGI links are not biolink:xrefs, they are biolink:urls. But, technically, someone could provide those three MGI URLs in the biolink:xref field because we're allowing URIs as well (some handwaving here between URI and URL, I do know that the URI is technically the default expansion of this CURIE, but I'm not confident that users will take the time to disambiguate).

  3. And I'd like to disambiguate xref from url so that new users not privy to this PR can decide which property to use to provide links to other sources with different views of the "same" data. It could be that we need a better description here: it was hard to clarify the distinction.

biolink:xref has these properties:

biolink:url or biolink:alternative_url has these properties:

Perhaps we should require that if an xref is provided on a node as a CURIE, it is expanded and added into a url property as well. Similarly, if the xref provided is a URI, then it should be duplicated into the url property. This helps us with consistency for downstream consumption.
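A minimal sketch of what that rule could look like, under some loud assumptions: xref and url are treated as lists here (the PR itself keeps url single-valued), the prefix map contains only the MGI entry from the example above, and `sync_urls` is a hypothetical helper rather than anything in Biolink tooling.

```python
# Sketch only: copy every xref into url, expanding CURIEs on the way.
# Assumptions: xref and url are lists on the node (the PR keeps url
# single-valued), and PREFIX_MAP is a stand-in for a real prefix map.

PREFIX_MAP = {"MGI": "http://identifiers.org/mgi/MGI:"}


def sync_urls(node, prefix_map=PREFIX_MAP):
    """Return a copy of node whose url list also covers every xref."""
    urls = list(node.get("url", []))
    for xref in node.get("xref", []):
        if "://" in xref:
            expanded = xref  # already a URI: duplicate it as-is
        else:
            prefix, _, local_id = xref.partition(":")
            if prefix not in prefix_map:
                continue  # unknown prefix: leave this xref unexpanded
            expanded = prefix_map[prefix] + local_id
        if expanded not in urls:
            urls.append(expanded)
    return {**node, "url": urls}


node = {
    "id": "NCBIGene:18504",
    "xref": [
        "MGI:97486",
        "https://www.informatics.jax.org/gxd/marker/MGI:97486",
    ],
}

print(sync_urls(node)["url"])
```

With this, a downstream consumer like the UI would only need to read url, at the cost of storing the same link twice.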

At a minimum I think we need text descriptions that help disambiguate and I would welcome help here, of course. :)

codewarrior2000 commented 2 months ago

@sierra-moxon We appreciate the TL;DR. I will need some time to digest the implications of how xref and url properties coexist and interplay.