W3C-HCLSIG / HCLSDatasetDescriptions

idot:AccessPattern #100

Open micheldumontier opened 9 years ago

micheldumontier commented 9 years ago

On the call today we discussed whether idot:AccessPattern is suitably formulated. In particular, we are concerned that appending the idot:identifierPattern to the idot:accessPattern is underspecified and could lead to errors.
Let's take the Gene Ontology (http://identifiers.org/go/) as an example. The idot:identifierPattern is ^GO:\d{7}$

This identifier pattern does not work for the original ontology URI, which is of the form http://purl.obolibrary.org/obo/GO_\d+$

This identifier pattern is also not correct for Bio2RDF, where 'GO' should be lowercase 'go'. The correct access pattern there should really be a regex of the form http://bio2rdf.org/go:\d{7}$

I propose that each instance of an access pattern carry a predicate that specifies the regex pattern.
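
The mismatch described above can be checked mechanically. Below is a minimal sketch in Python, not part of the idot vocabulary: `check_local_parts` and the `LOCAL_PARTS` table are hypothetical names, and the OBO and Bio2RDF spellings are simply derived from the URI forms quoted above for one example term.

```python
import re

# Identifier pattern quoted above for Gene Ontology.
GO_IDENTIFIER_PATTERN = re.compile(r"^GO:\d{7}$")

# The same example term as it would appear in the local part of each
# provider's URI (illustrative values derived from the forms quoted above).
LOCAL_PARTS = {
    "identifiers.org": "GO:0006915",
    "OBO PURL": "GO_0006915",
    "Bio2RDF": "go:0006915",
}


def check_local_parts(pattern, local_parts):
    """Return {provider: bool} saying whether the pattern matches each local part."""
    return {name: bool(pattern.match(value)) for name, value in local_parts.items()}


if __name__ == "__main__":
    for provider, ok in check_local_parts(GO_IDENTIFIER_PATTERN, LOCAL_PARTS).items():
        print(f"{provider}: {'matches' if ok else 'does not match'}")
```

Only the identifiers.org spelling satisfies the pattern, which is why a single identifier pattern cannot be reused across all access patterns.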

micheldumontier commented 9 years ago

Sent an email to the identifiers.org group on Jan 26. No response as of yet.

perkeo commented 9 years ago

Hi,

Sorry about the delay in response, and thank you for the reminder!

Firstly, going back to the example of Gene Ontology that you gave: Gene Ontology defines their identifier as being equivalent to the 'GlobalID', which consists of a 'GO' prefix and a numerical 'LocalID', separated by a colon [1]. This identifier is used by official Gene Ontology Resources [2], by both BioPortal and OLS, and is by far the most prevalent form found in publications and cross-references.

The OBO Foundry has a policy for the creation of URIs [3], which dictates the transformation of the colon into an underscore. While this policy is not enforced, it is recommended (when using URIs). Hence there is a mixture of ontologies that do and do not implement it.

Anyway, for the Identifiers.org registry, our aim is to store the regular expression reflecting the identifiers assigned by the data provider. If no documentation is available describing identifier strategies, we make an informed decision based on existing identifiers and common practice within the user community.

We originally captured this pattern for our own use, for example to provide users information on potentially malformed URIs. Of course, if we can extend this feature to be more useful to the community at large, then we would encourage them to give us feedback. If there is a clear and demonstrable need from our users to store identifier patterns at the level of individual resources and identification schemes, then we can add it to our roadmap for future development.

However, as far as I understand, none of this should affect the dataset description document, as long as the definitions of the idot terms you wish to use are clear and cover your needs.

Cheers,

[1] http://wiki.geneontology.org/index.php/Identifiers
[2] http://amigo.geneontology.org/amigo/term/GO:0006915 (official GO resource)
[3] http://www.obofoundry.org/id-policy.shtml

AlasdairGray commented 9 years ago

I don't think that we should be focusing on the GO example here. What we are really looking for is a property which allows for the specification of the complete URI pattern where the regex is used to capture the identifier part.

As I understand it, identifiers.org make use of two properties – idot:accessPattern and idot:identifierPattern – to construct the URI. However, we have no formal way of specifying that the two properties need to be spliced together.
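
To make the splicing problem concrete, here is a rough sketch in Python of one naive way the two properties could be combined. The thread does not say how identifiers.org actually does this, so `splice` is a hypothetical helper: it treats the access pattern as a literal URI prefix and appends the identifier regex with its `^` and `$` anchors removed.

```python
import re


def splice(access_prefix: str, identifier_pattern: str) -> str:
    """Hypothetical splice: literal URI prefix followed by the identifier regex.

    The identifier pattern's own ^ and $ anchors are dropped so that the
    combined expression is anchored once, around the whole URI.
    """
    body = identifier_pattern
    if body.startswith("^"):
        body = body[1:]
    if body.endswith("$"):
        body = body[:-1]
    return "^" + re.escape(access_prefix) + body + "$"


# Identifier pattern for Gene Ontology, as quoted earlier in the thread.
GO_IDENTIFIER_PATTERN = r"^GO:\d{7}$"

if __name__ == "__main__":
    cases = [
        ("http://identifiers.org/go/", "http://identifiers.org/go/GO:0006915"),
        ("http://bio2rdf.org/", "http://bio2rdf.org/go:0006915"),
        ("http://purl.obolibrary.org/obo/", "http://purl.obolibrary.org/obo/GO_0006915"),
    ]
    for prefix, uri in cases:
        ok = re.match(splice(prefix, GO_IDENTIFIER_PATTERN), uri) is not None
        print(f"{uri}: {'accepted' if ok else 'rejected'}")
```

Under this assumption only the identifiers.org URI is accepted. The Bio2RDF and OBO URIs are rejected even though they are valid for those providers, because the splice has no way to express the per-provider spelling of the identifier.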

I think that what we are looking for is a single property that would allow for the specification of the whole pattern as a regex; something like

:chembl xxx:accessIdentifierPattern "^http://rdf.ebi.ac.uk/resource/chembl/CHEMBL\\d+" .

VoID's void:uriRegexPattern doesn't quite meet our needs since

  1. It entails that the data must be in RDF and we might be linking to a web page
  2. It focuses on the access pattern part rather than the identifier part
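
As an illustration of how a consumer might use such a whole-URI pattern, here is a small Python sketch. The first constant is the pattern from the triple above (the `\\d` in the Turtle string is the regex `\d`). The second is my own variant and an assumption: it escapes the dots, adds a trailing `$`, and wraps the identifier part in a capture group so it can be extracted. `extract_identifier` and the sample URIs are hypothetical.

```python
import re

# Pattern as written in the proposed triple, after Turtle string unescaping.
ACCESS_IDENTIFIER_PATTERN = r"^http://rdf.ebi.ac.uk/resource/chembl/CHEMBL\d+"

# Assumed variant: dots escaped, end anchored, identifier part captured.
WITH_GROUP = re.compile(r"^http://rdf\.ebi\.ac\.uk/resource/chembl/(CHEMBL\d+)$")


def extract_identifier(uri):
    """Return the identifier part of a URI, or None if the URI does not fit."""
    match = WITH_GROUP.match(uri)
    return match.group(1) if match else None


if __name__ == "__main__":
    for uri in (
        "http://rdf.ebi.ac.uk/resource/chembl/CHEMBL25",
        "http://identifiers.org/chembl.compound/CHEMBL25",
    ):
        print(uri, "->", extract_identifier(uri))
```

The pattern itself makes no claim about RDF, so the same approach would work for a property whose values describe web page URLs.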

micheldumontier commented 9 years ago

+1

micheldumontier commented 9 years ago

@perkeo We discussed the issue and have proposed idot:accessIdentifierPattern as an attribute of an instance of idot:AccessPattern. See commit c30d6c98ea138d5cdf0fadc29fa412749807edda.

micheldumontier commented 9 years ago

@perkeo would you be able to add an entry to the identifiers.org ontology document?

AlasdairGray commented 9 years ago

@micheldumontier Do we need to have the following in the example?

<http://www.ebi.ac.uk/chembl/compound/inspect/>
    idot:primarySource true ;
    dct:format "text/html" ;
    dct:publisher <http://www.ebi.ac.uk> ;
    idot:accessIdentifierPattern "^http://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL\\d+" ;
    a idot:AccessPattern .

<http://identifiers.org/chembl.compound/>
    dct:format "text/html" ;
    idot:accessIdentifierPattern "^http://identifiers.org/chembl.compound/CHEMBL\\d+" ;
    a idot:AccessPattern .

<http://bio2rdf.org/chembl:>
    dct:format "application/rdf+xml" ;
    dct:publisher <http://bio2rdf.org> ;
    idot:accessIdentifierPattern "^http://bio2rdf.org/chembl:CHEMBL\\d+" ;
    a idot:AccessPattern .

<http://linkedchemistry.info/chembl/chemblid>
    dct:format "application/rdf+xml" ;
    idot:accessIdentifierPattern "^http://linkedchemistry.info/chembl/CHEMBL\\d+" ;
    a idot:AccessPattern .
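
For what it's worth, here is a sketch in Python of how a client might use these four descriptions to work out which access pattern a given URI belongs to. The table is copied by hand from the Turtle above. `describe` is a hypothetical helper, and the use of `re.fullmatch` is my own choice: the patterns have no trailing `$`, so a plain `re.match` would also accept URIs with extra characters after the identifier.

```python
import re

# (access pattern node, dct:format, idot:accessIdentifierPattern)
ACCESS_PATTERNS = [
    ("http://www.ebi.ac.uk/chembl/compound/inspect/", "text/html",
     r"^http://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL\d+"),
    ("http://identifiers.org/chembl.compound/", "text/html",
     r"^http://identifiers.org/chembl.compound/CHEMBL\d+"),
    ("http://bio2rdf.org/chembl:", "application/rdf+xml",
     r"^http://bio2rdf.org/chembl:CHEMBL\d+"),
    ("http://linkedchemistry.info/chembl/chemblid", "application/rdf+xml",
     r"^http://linkedchemistry.info/chembl/CHEMBL\d+"),
]


def describe(uri):
    """Return (node, format) of the first pattern matching the whole URI, else None."""
    for node, fmt, pattern in ACCESS_PATTERNS:
        if re.fullmatch(pattern, uri):
            return node, fmt
    return None


if __name__ == "__main__":
    print(describe("http://bio2rdf.org/chembl:CHEMBL25"))
    print(describe("http://bio2rdf.org/chembl:CHEMBL25/extra"))
```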

micheldumontier commented 9 years ago

For completeness, yes, we should include it.

Michel Dumontier Associate Professor of Medicine (Biomedical Informatics), Stanford University Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group http://dumontierlab.com
