biolink / biolink-model

Schema and generated objects for biolink data model and upper ontology
https://biolink.github.io/biolink-model/

Decision on Translator CURIEs #175

Closed edeutsch closed 4 years ago

edeutsch commented 5 years ago

As far as I'm aware we still do not have consensus on what the official CURIE prefixes are. @cmungall says "use BioLink", but there seem to be TWO different documents in BioLink with CURIEs defined in them and they are not fully aligned. @michel says "use identifiers.org", also a worthy source. But the two sources differ in many cases. Even ignoring the issue of CapiTaliZation, one says "pubmed" and the other says "PMID". Which will we choose? The most relevant issues are described here: https://docs.google.com/spreadsheets/d/1BCtJWyz9WpwI-myN3HwWkwsjxvUNr1OrdGU60QCs2Uw/edit

I strongly advocate that we gather the relevant stakeholders here and make a concrete decision and document it in the above Google doc. Perhaps Monday?

TomConlin commented 5 years ago

For the chembl curies, which dipper does not yet have, I would suggest going with what chembl is already using

ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBL-RDF/latest/

(which does not include "chembl.compound" as far as I have seen)

@prefix chembl: http://rdf.ebi.ac.uk/resource/chembl/ .
@prefix chembl_activity: http://rdf.ebi.ac.uk/resource/chembl/activity/ .
@prefix chembl_assay: http://rdf.ebi.ac.uk/resource/chembl/assay/ .
@prefix chembl_binding_site: http://rdf.ebi.ac.uk/resource/chembl/binding_site/ .
@prefix chembl_bio_cmpt: http://rdf.ebi.ac.uk/resource/chembl/biocomponent/ .
@prefix chembl_cell_line: http://rdf.ebi.ac.uk/resource/chembl/cell_line/ .
@prefix chembl_document: http://rdf.ebi.ac.uk/resource/chembl/document/ .
@prefix chembl_indication: http://rdf.ebi.ac.uk/resource/chembl/drug_indication/ .
@prefix chembl_journal: http://rdf.ebi.ac.uk/resource/chembl/journal/ .
@prefix chembl_moa: http://rdf.ebi.ac.uk/resource/chembl/drug_mechanism/ .
@prefix chembl_molecule: http://rdf.ebi.ac.uk/resource/chembl/molecule/ .
@prefix chembl_protclass: http://rdf.ebi.ac.uk/resource/chembl/protclass/ .
@prefix chembl_source: http://rdf.ebi.ac.uk/resource/chembl/source/ .
@prefix chembl_target: http://rdf.ebi.ac.uk/resource/chembl/target/ .
@prefix chembl_target_cmpt: http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/ .

jmcmurry commented 5 years ago

+1 to the chembl repertoire. PMID for pubmed, as per their own convention.

edeutsch commented 5 years ago

Are you suggesting chembl_target instead of CHEMBL.TARGET as the official Translator CURIE?

TomConlin commented 5 years ago

I am not concerned with capitalization; if we have a consistent policy, then follow that. I do suggest that a dot does not belong in a CURIE prefix.
Prefixes must agree with RDF CURIE rules, are better constrained to XML QName rules, and should agree with identifier hygiene / best practices, including that the dot is best reserved as the version separator after the local ID.

Whenever possible it is respectful to propagate the identifier as the original minter would have, but they may not have anticipated every scenario.

edeutsch commented 5 years ago

Ah, an interesting new wrinkle! Even biolink-model and identifiers.org agreed on CHEMBL.TARGET/chembl.target with a dot.

jmcmurry commented 5 years ago

All prefix delimiters suck for one reason or another. Underscores are pretty nasty since prevailing use is that they're interchangeable with the colon. GO:0000123 to GO_0000123 is fine, but CHEMBL_TARGET:0000123 should never go to CHEMBL:TARGET:0000123. As long as you are mindful of the lookaheads necessary to safely convert, it is fine. Ish.

Don't forget that no matter what prefix is used, the identifiers.org URL can still be used. There are situations in which the native URI is preferred for the semantic rigor it affords. E.g. certain classes of chembl IDs are best referenced in their RDF.

Julie
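
A minimal sketch of the "lookahead" conversion described above, written in Python. It assumes that local IDs never contain an underscore, so only the last underscore in an identifier is treated as the delimiter; the function names are hypothetical and this is not code from any Translator tool.

```python
import re

def underscore_to_colon(identifier: str) -> str:
    """Turn PREFIX_LOCALID into PREFIX:LOCALID.

    Assumption: the local ID contains no underscore, so only the LAST
    underscore is the delimiter. The lookahead (?=[^_]*$) matches an
    underscore only when no further underscore follows it.
    """
    if ":" in identifier:
        # Already a CURIE; underscores inside the prefix must stay as they are.
        return identifier
    return re.sub(r"_(?=[^_]*$)", ":", identifier, count=1)

def colon_to_underscore(curie: str) -> str:
    """Turn PREFIX:LOCALID into PREFIX_LOCALID, replacing only the first colon."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie!r}")
    return f"{prefix}_{local_id}"

print(underscore_to_colon("GO_0000123"))
print(underscore_to_colon("CHEMBL_TARGET_0000123"))
print(underscore_to_colon("CHEMBL_TARGET:0000123"))
```

A blanket replacement of every underscore would produce CHEMBL:TARGET:0000123, which is the failure case called out above.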

cmungall commented 5 years ago

@cmungall and @hsolbrig to follow up why Reactome is not in jsonld context file (it is listed in the yaml for reactions)

cmungall commented 5 years ago

@cmungall will coordinate with id.org to include uniprotkb as a prefix

cmungall commented 5 years ago

Action: look at spreadsheet and ensure all ???s are filled in within the jsonld context

micheldumontier commented 5 years ago

Hi, I've added the entries and their links to identifiers.org. There are a couple of resources for which I do not know what they refer to (hence the problem with prefixes that do not contain a registrar). There are also some questions about the base URI that is being proposed.

hsolbrig commented 5 years ago

@micheldumontier - which base uri are you referring to?

micheldumontier commented 5 years ago

@hsolbrig the first entry: OMIM. I didn't go further than that.

At the end of the day, here's what we need:

1) a URI template that, when filled, gives us a machine-readable representation of the entity. This representation should be accessible using content-type negotiation as per HTTP standards.
2) a link to an HTML representation of the resource, for human users.

However, many original data sources do NOT provide machine-readable (structured) representations of their records through content negotiation or any other format. We should chronicle which representations are available through each URI pattern.
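
As a rough illustration of the two requirements above, the Python sketch below fills a URI template and prepares two requests that differ only in their `Accept` header. The template, the local ID `12345`, and the media types are assumptions chosen for the example; whether a given source actually honors content negotiation is exactly what would need to be chronicled. No request is sent here.

```python
from urllib.request import Request

# Assumed URI template, for illustration only.
URI_TEMPLATE = "https://identifiers.org/{prefix}:{local_id}"

def build_request(prefix: str, local_id: str, media_type: str) -> Request:
    """Fill the URI template and ask for a representation via the Accept header."""
    url = URI_TEMPLATE.format(prefix=prefix, local_id=local_id)
    return Request(url, headers={"Accept": media_type})

# Same URI, two requested representations (placeholder local ID).
machine_req = build_request("pubmed", "12345", "application/ld+json")
human_req = build_request("pubmed", "12345", "text/html")

print(machine_req.full_url, machine_req.get_header("Accept"))
print(human_req.full_url, human_req.get_header("Accept"))
```

Passing either object to `urllib.request.urlopen` would perform the request; the response's `Content-Type` header would then show which representation the server chose to return.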

cmungall commented 5 years ago

It may help to separate requirements here, as we have many stakeholders.

  1. Some groups working primarily with neo4j or conventional database stacks require agreement on CURIEs such that things connect together
  2. Some groups working in the semantic web stack need agreement on the URIs
  3. Additionally others may have the requirement that the URIs resolve to a machine-readable representation of the entity

I think most in translator fall into 1. This includes reasoner teams working with neo4j, alpha and medikanren, and people building workflows where the output IDs of one module need to be passed in as inputs to another module.

Some of us are also working with rdf tooling, where the URI is common currency, so 2 is important for this. The JSON LD contexts provide a way of bridging 1+2.

For 3, I think this is a laudable goal and vision for a larger self-describing linked data web, but I don't think there are translator requirements for this at this time. This is not to say we shouldn't try and be compatible with this vision, but just to move forwards we can separate this requirement - maybe make a separate ticket for this - and prioritize what is required to make the Translator a workable system in the short term?
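
To make the bridging of 1 and 2 concrete, here is a small Python sketch of how a prefix map in the spirit of a JSON-LD context lets CURIEs and URIs be converted into each other. The prefixes and `example.org` base URIs are placeholders, not the actual entries of the biolink context file, and the function names are hypothetical.

```python
# Hypothetical prefix map; the base URIs are placeholders, not real registry values.
CONTEXT = {
    "PMID": "https://example.org/pubmed/",
    "CHEMBL.TARGET": "https://example.org/chembl.target/",
}

def expand_curie(curie: str, context: dict = CONTEXT) -> str:
    """Expand PREFIX:LOCALID to a full URI using the prefix map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in context:
        raise ValueError(f"unknown or malformed CURIE: {curie!r}")
    return context[prefix] + local_id

def contract_uri(uri: str, context: dict = CONTEXT) -> str:
    """Contract a URI back to a CURIE, preferring the longest matching base."""
    for prefix, base in sorted(context.items(), key=lambda kv: len(kv[1]), reverse=True):
        if uri.startswith(base):
            return f"{prefix}:{uri[len(base):]}"
    raise ValueError(f"no known prefix for URI: {uri!r}")

uri = expand_curie("PMID:12345")
print(uri)
print(contract_uri(uri))
```

Groups in category 1 only need to agree on the keys of such a map, while groups in category 2 also need to agree on the values.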

edeutsch commented 5 years ago

Agreed. Well said!

micheldumontier commented 5 years ago

The vision for the Translator is a distributed system that makes use of indexed (self) descriptions in order to identify relevant resources to bring together just in time to execute reasoning tasks. The requirement that URIs resolve to machine-readable descriptions is an essential part of that vision; otherwise you are just programming a system with no ability to uncover and reuse resources - the exact opposite of the very successful world wide web, in which HTML was the agreed language and hyperlinks the mechanism to link documents together. I strongly recommend that you reconsider the importance of machine-readable, standardized descriptions. We are seeking to develop an infrastructure to solve the long-standing problem of the discovery and reuse of digital resources, from datasets to databases, to analytical and reasoning services. Anything short of this is just another project with limited impact.

nlharris commented 4 years ago

Does this relate to #22 ?