MaastrichtU-IDS / d2s-sparql-operations

✨️ Execute SPARQL queries from string, URL or multiple files using the RDF4J framework.
https://maastrichtu-ids.github.io/d2s-sparql-operations/
MIT License

PrefixCommons: implement expand/resolve URIs #21

Open vemonet opened 5 years ago

vemonet commented 5 years ago

This will be implemented as a standalone "expand" operation in data2services-sparql-operations, using the PrefixCommons registry. For now it is included in the split operation.

Like split, it will take a class and a property (or a list of them?) whose value should be resolved (e.g. bl:Drug bl:id).

This operation will insert all the statements of the class again, using the preferred URI as subject. A parameter will allow deleting the previous statements.
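A minimal in-memory sketch of what "expand" could do, assuming triples are plain (subject, predicate, object) tuples. PREFIX_MAP is a hypothetical stand-in for the PrefixCommons registry, and expand() / expand_curie() are illustrative names, not the data2services-sparql-operations API.

```python
"""Minimal sketch of the proposed "expand" operation over an in-memory triple list.

Hypothetical throughout: PREFIX_MAP stands in for the PrefixCommons registry,
and expand() is not the data2services-sparql-operations API.
"""
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical registry excerpt: CURIE prefix -> URI namespace
PREFIX_MAP: Dict[str, str] = {"drugbank": "http://identifiers.org/drugbank/"}


def expand_curie(curie: str, prefix_map: Dict[str, str]) -> Optional[str]:
    """Expand 'prefix:local' to a full URI; None if the prefix is unknown."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefix_map:
        return None
    return prefix_map[prefix] + local


def expand(triples: List[Triple], cls: str, prop: str,
           prefix_map: Dict[str, str], delete_previous: bool = False) -> List[Triple]:
    """Re-emit every statement about instances of cls with the expanded prop value as subject."""
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    preferred: Dict[str, str] = {}
    for s, p, o in triples:
        if s in instances and p == prop:
            uri = expand_curie(o, prefix_map)
            if uri is not None:
                preferred[s] = uri
    result: List[Triple] = []
    for s, p, o in triples:
        if s in preferred:
            result.append((preferred[s], p, o))  # statement with the preferred URI as subject
            if not delete_previous:
                result.append((s, p, o))  # keep the original statement too
        else:
            result.append((s, p, o))
    return result
```

With bl:Drug and bl:id expanded to their full biolink URIs, a drug whose bl:id is "drugbank:DB00001" would be re-emitted under http://identifiers.org/drugbank/DB00001; delete_previous=True plays the role of the proposed "delete previous statements" parameter. This sketch only rewrites subjects; triples that point at the old URI as an object are left untouched.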

Bonus: add a wrapper around BridgeDB to integrate BridgeDB identifier resolution into data2services-sparql-operations, in the same way as PrefixCommons.

Notes:

vemonet commented 5 years ago

Concept for a SPARQL service, similar to Wikidata or identifiers.org, built from the Life Science Registry spreadsheet.

Identifiers resolution SPARQL Service specifications

Implementation details

We will build a SPARQL Service that can be used to resolve identifiers and URIs to a canonical URI. This Identifiers SPARQL Service will provide a framework and an ontology enabling users to resolve URIs efficiently through federated SPARQL querying.

Method

The Identifiers SPARQL Service will enable various identifier resolver services (BridgeDB, identifiers.org) to be queried through a public SPARQL Service. Each resolver can be accessed through its own graph in the same SPARQL Service, so users can choose which resolver to use: a single one, a subset, or all of them.
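To illustrate the one-graph-per-resolver design, here is a small in-memory sketch. Only the LifeScienceRegistry graph URI appears in this issue; the BridgeDb graph URI, the chosen reference URI, and the resolve() helper are hypothetical.

```python
"""Sketch: one named graph per resolver, queried individually, as a subset, or all together.

The BridgeDb graph URI and the reference URI below are hypothetical placeholders.
"""
from typing import Dict, Iterable, List, Optional, Set, Tuple

LSR = "https://w3id.org/data2services/identifiers/LifeScienceRegistry"
BRIDGEDB = "https://w3id.org/data2services/identifiers/BridgeDb"  # hypothetical
REF = "http://identifiers.org/uniprot/P00734"  # hypothetical reference URI

# graph URI -> set of (reference URI, alternative identifier) pairs
GRAPHS: Dict[str, Set[Tuple[str, str]]] = {
    LSR: {(REF, "uniprot:P00734")},
    BRIDGEDB: {(REF, "uniprot:P00734")},
}


def resolve(identifier: str, graphs: Optional[Iterable[str]] = None) -> List[Tuple[str, str]]:
    """Return (reference URI, source graph) pairs; graphs=None means every resolver."""
    selected = GRAPHS.keys() if graphs is None else graphs
    results = []
    for g in selected:
        for ref, alt in GRAPHS.get(g, set()):
            if alt == identifier:
                results.append((ref, g))
    return sorted(results)
```

Passing graphs=None corresponds to `GRAPH ?source { ... }` in the queries below; passing a single graph corresponds to naming that graph explicitly.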

The resolvers can be connected to the SPARQL Service through various methods

Resolvers to implement

The following identifier resolvers will be implemented first:

Ontology

We will build an ontology that defines standard relations between identifiers and URIs; new properties can still be used to define new relations.

PREFIX idot: <http://identifiers.org/idot/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
?ref idot:preferredPrefix "chembl" ;
  idot:alternatePrefix "chembldb" ;
  idot:identifierPattern "CHEMBL\\d+"^^xsd:string ;
  idot:exampleIdentifier "CHEMBL25"^^xsd:string ;
  idot:accessPattern "http://bio2rdf.org/chembl", 
    "http://identifiers.org/chembl.compound/", 
    "http://www.ebi.ac.uk/chembl/compound/inspect/" .

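A minimal sketch of how a client could use these idot properties: check a local identifier against idot:identifierPattern and rewrite an alternate prefix to the preferred one. The dictionary layout and normalise() are hypothetical; only the property values come from the example above. Building URIs from idot:accessPattern is left out because the example patterns do not share one separator convention (some end with "/", one does not).

```python
"""Sketch: use idot-style metadata to validate an identifier and normalise its prefix.

The REGISTRY layout and normalise() are hypothetical; values mirror the chembl example.
"""
import re
from typing import Dict, List, Optional

REGISTRY: List[Dict] = [
    {
        "preferredPrefix": "chembl",
        "alternatePrefix": ["chembldb"],
        "identifierPattern": r"CHEMBL\d+",
        "exampleIdentifier": "CHEMBL25",
    },
]


def normalise(curie: str) -> Optional[str]:
    """Return 'preferredPrefix:id' if the prefix is known and the id matches the pattern."""
    prefix, sep, local = curie.partition(":")
    if not sep:
        return None
    for entry in REGISTRY:
        prefixes = [entry["preferredPrefix"], *entry["alternatePrefix"]]
        if prefix.lower() in prefixes and re.fullmatch(entry["identifierPattern"], local):
            return f"{entry['preferredPrefix']}:{local}"
    return None
```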
SPARQL query examples for the Life Science Registry

Resolve common URI syntax variants of the same entity using the Bio2RDF Life Science Registry spreadsheet

Get reference URI

From any prefix:id or URI, get the canonical reference (URI).

We use dct:alternative in the LifeScienceRegistry graph to resolve all URI variants of the same identifier:

https://purl.uniprot.org/uniprot/P00734 , http://purl.uniprot.org/uniprot/P00734 , https://identifiers.org/uniprot/P00734 , http://identifiers.org/uniprot/P00734 , https://identifiers.org/uniprot:P00734 , "uniprot:P00734"

PREFIX dct: <http://purl.org/dc/terms/>
PREFIX bl: <https://w3id.org/biolink/vocab/>
SELECT ?s ?ref ?source WHERE {
  ?s a bl:Drug ;
    dct:identifier ?id .
  SERVICE <https://unimaas.nl/sparql/identifiers> {
    GRAPH ?source {
      ?ref dct:alternative ?id .
    }
  }
}

# Get the reference URI only from the LifeScienceRegistry graph (uses dct:alternative)
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX bl: <https://w3id.org/biolink/vocab/>
SELECT ?s ?ref ?source WHERE {
  ?s a bl:Drug ;
    dct:identifier ?id .
  SERVICE <https://unimaas.nl/sparql/identifiers> {
    GRAPH <https://w3id.org/data2services/identifiers/LifeScienceRegistry> {
      ?ref dct:alternative ?id .
    }
  }
}
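The variant list above follows a few regular rules (http or https, "/" or ":" after the prefix, provider namespace or identifiers.org, bare CURIE). This sketch generates such variants and indexes them back to one reference URI, which is roughly what the dct:alternative statements encode. The provider namespace and the choice of reference URI are hypothetical inputs, not something the registry prescribes.

```python
"""Sketch: index URI/CURIE variants so any of them resolves to one reference URI.

The variant rules are read off the UniProt example above; the provider namespace
and the reference URI are hypothetical inputs.
"""
from typing import Dict, Iterable, List


def variants(prefix: str, local_id: str, provider_namespaces: Iterable[str] = ()) -> List[str]:
    """Return common syntax variants of prefix:local_id."""
    hosts = [f"identifiers.org/{prefix}/", *provider_namespaces]
    # http and https for each namespace-style URI
    out = [f"{scheme}://{host}{local_id}" for host in hosts for scheme in ("http", "https")]
    # identifiers.org also accepts the prefix:id form in the path
    out += [f"{scheme}://identifiers.org/{prefix}:{local_id}" for scheme in ("http", "https")]
    out.append(f"{prefix}:{local_id}")  # plain CURIE, stored as a literal
    return out


def build_index(reference: str, alternatives: Iterable[str]) -> Dict[str, str]:
    """Map every alternative (the dct:alternative objects) back to its reference URI."""
    return {alt: reference for alt in alternatives}
```

For example, build_index(ref, variants("uniprot", "P00734", ["purl.uniprot.org/uniprot/"])) covers every variant listed above plus http://identifiers.org/uniprot:P00734.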

Get alternative URIs

From a canonical reference, get all the commonly accepted URIs (using the dct:alternative property).

PREFIX dct: <http://purl.org/dc/terms/>
PREFIX bl: <https://w3id.org/biolink/vocab/>
SELECT ?ref ?ids ?source WHERE {
  ?ref a bl:Drug .
  SERVICE <https://unimaas.nl/sparql/identifiers> {
    GRAPH ?source {
      ?ref dct:alternative ?ids .
    }
  }
}
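In memory, this query is a join on ?ref between the local bl:Drug instances and the dct:alternative rows of the identifiers service. A small sketch, assuming the service rows are (graph, ref, alternative) tuples; alternatives_by_ref() is a hypothetical helper.

```python
"""Sketch of the "get alternative URIs" join, in memory.

Assumptions: drug_refs holds the ?ref values typed bl:Drug locally; service_rows are
(graph, ref, alternative) tuples standing in for GRAPH ?source { ?ref dct:alternative ?ids }.
"""
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def alternatives_by_ref(drug_refs: Set[str],
                        service_rows: Iterable[Tuple[str, str, str]]) -> Dict[str, List[Tuple[str, str]]]:
    """Return ref -> sorted [(alternative, source graph)] for refs typed as bl:Drug."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for source, ref, alt in service_rows:
        if ref in drug_refs:  # join on ?ref
            grouped[ref].append((alt, source))
    return {ref: sorted(rows) for ref, rows in grouped.items()}
```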

Get alternative IDs

From a canonical reference, get all the available variant IDs of the entity in other databases (with their data sources, the relation used, metadata, ???).

PREFIX dct: <http://purl.org/dc/terms/>
PREFIX bl: <https://w3id.org/biolink/vocab/>
SELECT ?ref ?p ?ids ?source WHERE {
  ?ref a bl:Drug .
  SERVICE <https://unimaas.nl/sparql/identifiers> {
    GRAPH ?source {
      ?ref ?p ?ids .
      # We could add a filter on ?p to take only predicates about alternative IDs
    }
  }
}
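The comment in the query suggests filtering ?p. A minimal sketch of that filter, assuming a whitelist of predicates that express alternative identifiers; the ALT_ID_PREDICATES set is an assumption, not part of the proposal. In SPARQL 1.1 the same restriction could be written as FILTER (?p IN (dct:alternative, owl:sameAs, skos:exactMatch)).

```python
"""Sketch: keep only predicates that express alternative identifiers (the FILTER on ?p).

The ALT_ID_PREDICATES whitelist is an assumption, not part of the proposal.
"""
from typing import Iterable, List, Tuple

ALT_ID_PREDICATES = {
    "http://purl.org/dc/terms/alternative",
    "http://www.w3.org/2002/07/owl#sameAs",
    "http://www.w3.org/2004/02/skos/core#exactMatch",
}


def alternative_ids(ref: str, rows: Iterable[Tuple[str, str, str, str]]) -> List[Tuple[str, str, str]]:
    """rows are (source graph, subject, predicate, object); return (predicate, object, source)."""
    return sorted(
        (p, o, g) for g, s, p, o in rows if s == ref and p in ALT_ID_PREDICATES
    )
```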

Combination: get reference URI, then all possible alternatives

This requires the service to support subqueries.

PREFIX dct: <http://purl.org/dc/terms/>
PREFIX bl: <https://w3id.org/biolink/vocab/>
SELECT ?s ?ref ?alternatives ?sourceRef ?sourceAlt WHERE {
  ?s a bl:Drug ;
    dct:identifier ?id .
  SERVICE <https://unimaas.nl/sparql/identifiers> {
    # A subquery must sit in its own { } group when combined with other patterns.
    # Project ?id and ?sourceRef so they join with the outer query.
    {
      SELECT ?ref ?id ?sourceRef WHERE {
        GRAPH ?sourceRef {
          ?ref dct:alternative ?id .
        }
      }
    }
    GRAPH ?sourceAlt {
      ?ref dct:alternative ?alternatives .
    }
  }
}
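The two steps of this query (identifier to reference, then reference to all alternatives) can be sketched in memory as below, assuming (graph, ref, alternative) tuples for the dct:alternative statements. ref_then_alternatives() is a hypothetical helper, and ?s is omitted because it comes from the local data.

```python
"""Sketch of the two-step lookup: identifier -> reference (subquery), then reference -> alternatives.

rows are (graph, ref, alternative) tuples standing in for dct:alternative statements;
names and data are hypothetical.
"""
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str]


def ref_then_alternatives(identifier: str, rows: Iterable[Row]) -> List[Tuple[str, str, str, str]]:
    """Return (ref, alternative, sourceRef, sourceAlt) tuples, mirroring the query's projection."""
    rows = list(rows)  # iterated twice
    refs = {(ref, g) for g, ref, alt in rows if alt == identifier}  # step 1: subquery
    return sorted(
        (ref, alt, source_ref, g)
        for ref, source_ref in refs
        for g, r, alt in rows
        if r == ref  # step 2: join on ?ref
    )
```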

OPTIONAL: get reference URIs from id without prefix

From an id, get all the possible canonical references (URIs) using a SPARQL FILTER.

PREFIX dct: <http://purl.org/dc/terms/>
PREFIX bl: <https://w3id.org/biolink/vocab/>
# Exact match
SELECT ?s ?ref ?source WHERE {
  ?s a bl:Drug ;
    dct:identifier ?id .
  SERVICE <https://unimaas.nl/sparql/identifiers> {
    GRAPH ?source {
      ?ref dct:identifier ?id .
    }
  }
}
# Regex FILTER to get all identifiers.org URIs (http or https)
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?ref ?id ?source WHERE {
  SERVICE <https://unimaas.nl/sparql/identifiers> {
    GRAPH ?source {
      ?ref dct:identifier ?id .
      FILTER regex(str(?id), "^https?://identifiers\\.org/")
    }
  }
}
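A sketch of the prefix-less lookup in Python, reusing the regex from the FILTER (once unescaped from the SPARQL string literal). The rule that the bare id must follow a "/" or ":" separator is an assumption; it also accepts ids that themselves contain a colon, such as GO:0005892.

```python
"""Sketch: keep identifiers.org URIs (http or https) that end with a bare id.

The regex mirrors the FILTER above; the suffix-matching rule is an assumption.
"""
import re
from typing import Iterable, List

IDENTIFIERS_ORG = re.compile(r"^https?://identifiers\.org/")


def refs_for_bare_id(bare_id: str, identifiers: Iterable[str]) -> List[str]:
    matches = []
    for uri in identifiers:
        if not IDENTIFIERS_ORG.match(uri):
            continue
        # Accept both 'namespace/ID' and 'namespace:ID' forms
        if uri.endswith(("/" + bare_id, ":" + bare_id)):
            matches.append(uri)
    return sorted(matches)
```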

See https://github.com/Wikidata for the label service.


SPARQL service: https://github.com/JervenBolleman/sparql-identifiers/blob/master/src/main/java/ch/isbsib/sparql/identifiers/SesameSparqlService.java

EBI SPARQL service code: https://github.com/EBISPOT/lodestar/blob/master/lode-core-api/src/main/java/uk/ac/ebi/fgpt/lode/service/SparqlService.java

Example of use

PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?go WHERE {
  SERVICE <http://identifiers.org/services/sparql> {
    <http://identifiers.org/go/GO:0005892> owl:sameAs ?go .
  }
}
LIMIT 10
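For completeness, a sketch of sending such a query with the SPARQL 1.1 Protocol (HTTP GET with a URL-encoded query parameter, JSON results requested via the Accept header). The endpoint is a parameter: because the query uses SERVICE, it has to go to an endpoint that supports federated queries, and whether the identifiers.org service is still online is not assumed. run() needs network access; build_request() does not.

```python
"""Sketch: send the query above using the SPARQL 1.1 Protocol (HTTP GET, `query` parameter).

The target endpoint is a parameter; it must support SPARQL 1.1 federated SERVICE.
"""
import json
import urllib.parse
import urllib.request

QUERY = """PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?go WHERE {
  SERVICE <http://identifiers.org/services/sparql> {
    <http://identifiers.org/go/GO:0005892> owl:sameAs ?go .
  }
}
LIMIT 10"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a GET request asking for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def run(endpoint: str, query: str) -> list:
    """Execute the request and return the ?go bindings (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as resp:
        results = json.load(resp)
    return [b["go"]["value"] for b in results["results"]["bindings"]]
```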

It looks like it generates the store on the fly from an IdentifiersOrg store: https://github.com/JervenBolleman/sparql-identifiers/blob/master/src/main/java/ch/isbsib/sparql/identifiers/SesameSparqlService.java#L48

vemonet commented 5 years ago

This will be implemented as a standalone "expand" operation in data2services-sparql-operations, using the PrefixCommons registry.

It should be separated from the split operation.