Open vemonet opened 5 years ago
Concept for a SPARQL service like WikiData or identifiers.org from the Life Science Registry spreadsheet
We will build a SPARQL Service that can be used to resolve identifiers and URIs to a canonical URI. This Identifiers SPARQL Service will propose a framework and ontology that let users resolve URIs efficiently through federated SPARQL querying.
The Identifiers SPARQL Service will enable various identifier resolver services (BridgeDB, identifiers.org) to be queried through a public SPARQL Service. Each resolver can be accessed through its own graph in the same SPARQL Service, so users can choose which resolver they want to use: a single one, a subset, or all of them.
The resolvers can be connected to the SPARQL Service through various methods
Given the pattern ?ref dct:alternative ?id: if the ?ref is provided, we execute a method using regex to generate all possible URIs. If the ?id is provided then we execute
The following identifiers resolvers will be implemented to start:
We will build an ontology to define standard relations between identifiers and URIs, but new properties can be used to define new relations.
Reference
dct:alternative
owl:sameAs
idot ?
PREFIX idot: <http://identifiers.org/idot/>
?ref idot:preferredPrefix "chembl" ;
idot:alternatePrefix "chembldb" ;
idot:identifierPattern "CHEMBL\\d+"^^xsd:string ;
idot:exampleIdentifier "CHEMBL25"^^xsd:string ;
idot:accessPattern "http://bio2rdf.org/chembl",
"http://identifiers.org/chembl.compound/",
"http://www.ebi.ac.uk/chembl/compound/inspect/" .
skos:prefLabel and skos:altLabel ?
Resolve common URI syntax variants for the same entity using the Bio2RDF Life Science Registry spreadsheet
From any prefix:id or URI, get the canonical reference (URI).
We use dct:alternative: the LifeScienceRegistry graph resolves all URI variants of the same identifier:
https://purl.uniprot.org/uniprot/P00734 , http://purl.uniprot.org/uniprot/P00734 , https://identifiers.org/uniprot/P00734 , http://identifiers.org/uniprot/P00734 , https://identifiers.org/uniprot:P00734 , "uniprot:P00734"
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX bl: <https://w3id.org/biolink/vocab/>
SELECT ?s ?ref ?source WHERE {
?s a bl:Drug ;
dct:identifier ?id .
SERVICE <https://unimaas.nl/sparql/identifiers> {
GRAPH ?source {
?ref dct:alternative ?id .
}
}
}
# Get identifier only from the LifeScienceRegistry service (uses dct:alternative)
SELECT ?s ?ref ?source WHERE {
?s a bl:Drug ;
dct:identifier ?id .
SERVICE <https://unimaas.nl/sparql/identifiers> {
GRAPH <https://w3id.org/data2services/identifiers/LifeScienceRegistry> {
?ref dct:alternative ?id .
}
}
}
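To make the intended behaviour of the two queries above concrete, here is a minimal Python sketch of the same lookup, done in memory instead of through the SPARQL service. The dictionary stands in for the LifeScienceRegistry graph and is hypothetical. Treating https://identifiers.org/uniprot:P00734 as the canonical reference is also an assumption, made by analogy with the ensembl example later in this issue.

```python
# Hypothetical in-memory stand-in for the LifeScienceRegistry graph:
# each canonical reference maps to its dct:alternative values.
# The choice of canonical URI here is an assumption, not a decision of this issue.
ALTERNATIVES = {
    "https://identifiers.org/uniprot:P00734": [
        "https://purl.uniprot.org/uniprot/P00734",
        "http://purl.uniprot.org/uniprot/P00734",
        "https://identifiers.org/uniprot/P00734",
        "http://identifiers.org/uniprot/P00734",
        "https://identifiers.org/uniprot:P00734",
        "uniprot:P00734",
    ],
}


def build_index(alternatives):
    """Invert ref -> [ids] into id -> ref, like matching ?ref dct:alternative ?id."""
    index = {}
    for ref, ids in alternatives.items():
        for identifier in ids:
            index[identifier] = ref
    return index


def resolve(identifier, index):
    """Return the canonical reference for a prefix:id or URI, or None if unknown."""
    return index.get(identifier.strip())


INDEX = build_index(ALTERNATIVES)
```

For example, resolve("uniprot:P00734", INDEX) and resolve("http://purl.uniprot.org/uniprot/P00734", INDEX) both return the same canonical reference, while an unknown identifier returns None.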
From a canonical reference, get all the commonly accepted URIs (we use the dct:alternative property)
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX bl: <https://w3id.org/biolink/vocab/>
SELECT ?ref ?ids ?source WHERE {
?ref a bl:Drug .
SERVICE <https://unimaas.nl/sparql/identifiers> {
GRAPH ?source {
?ref dct:alternative ?ids .
}
}
}
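The alternatives returned by this query could be generated from an idot-style registry entry. The sketch below is only an illustration: it reuses the chembl values quoted earlier in this issue, but the dict layout and the expand function are hypothetical. It also assumes that an access pattern without a trailing slash is joined to the identifier with ":" (the Bio2RDF convention), which this issue does not specify.

```python
import re

# Values copied from the idot example for chembl in this issue;
# the dict layout itself is a hypothetical representation.
CHEMBL = {
    "preferredPrefix": "chembl",
    "identifierPattern": r"CHEMBL\d+",
    "accessPattern": [
        "http://bio2rdf.org/chembl",
        "http://identifiers.org/chembl.compound/",
        "http://www.ebi.ac.uk/chembl/compound/inspect/",
    ],
}


def expand(local_id, entry):
    """Return every accepted URI for local_id, or [] if it fails the pattern."""
    if not re.fullmatch(entry["identifierPattern"], local_id):
        return []
    uris = []
    for base in entry["accessPattern"]:
        # Assumption: bases without a trailing "/" use ":" as separator.
        separator = "" if base.endswith("/") else ":"
        uris.append(base + separator + local_id)
    return uris
```

Calling expand("CHEMBL25", CHEMBL) yields one URI per access pattern. An identifier that does not match idot:identifierPattern yields an empty list instead of a malformed URI.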
From a canonical reference, get all the available variant IDs of the entity in other databases (with data sources, which relation, metadata, ???).
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX bl: <https://w3id.org/biolink/vocab/>
SELECT ?ref ?p ?ids ?source WHERE {
?ref a bl:Drug .
SERVICE <https://unimaas.nl/sparql/identifiers> {
GRAPH ?source {
?ref ?p ?ids .
# We could add a filter on ?p to take only predicates about alternative IDs
}
}
}
Support subqueries
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX bl: <https://w3id.org/biolink/vocab/>
SELECT ?s ?ref ?alternatives ?sourceRef ?sourceAlt WHERE {
?s a bl:Drug ;
dct:identifier ?id .
SERVICE <https://unimaas.nl/sparql/identifiers> {
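{
# A subquery must be wrapped in braces, and must project ?id so it joins with the outer pattern
SELECT ?ref ?id ?sourceRef WHERE {
GRAPH ?sourceRef {
?ref dct:alternative ?id .
}
}
}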
GRAPH ?sourceAlt {
?ref dct:alternative ?alternatives .
}
}
}
From an id, get all the possible canonical references (URIs) using a SPARQL filter.
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX bl: <https://w3id.org/biolink/vocab/>
# Exact match
SELECT ?s ?ref ?source WHERE {
?s a bl:Drug ;
dct:identifier ?id .
SERVICE <https://unimaas.nl/sparql/identifiers> {
GRAPH ?source {
?ref dct:identifier ?id .
}
}
}
# Regex FILTER to get all URIs starting with http://identifiers.org/ or https://identifiers.org/
SELECT ?ref ?id ?source WHERE {
SERVICE <https://unimaas.nl/sparql/identifiers> {
GRAPH ?source {
?ref dct:identifier ?id .
FILTER regex( str(?id), "^https?://identifiers\\.org/" )
}
}
}
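To check which identifiers the regex filter is meant to keep, the same test can be tried outside the endpoint. This is a small sketch: the pattern follows the intent of the FILTER above (anchored at the start, dot escaped), and the helper name is hypothetical.

```python
import re

# Pattern with the same intent as the SPARQL FILTER, as a Python raw string.
IDENTIFIERS_ORG = re.compile(r"^https?://identifiers\.org/")


def is_identifiers_org(value):
    """True if value is an http or https URI in the identifiers.org namespace."""
    return IDENTIFIERS_ORG.search(value) is not None
```

With this pattern, both http://identifiers.org/uniprot/P00734 and https://identifiers.org/uniprot:P00734 pass, while https://purl.uniprot.org/uniprot/P00734 and the bare "uniprot:P00734" do not.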
See https://github.com/Wikidata for label service
SPARQL service: https://github.com/JervenBolleman/sparql-identifiers/blob/master/src/main/java/ch/isbsib/sparql/identifiers/SesameSparqlService.java
EBI SPARQL service code: https://github.com/EBISPOT/lodestar/blob/master/lode-core-api/src/main/java/uk/ac/ebi/fgpt/lode/service/SparqlService.java
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?go WHERE {
SERVICE <http://identifiers.org/services/sparql>{
<http://identifiers.org/go/GO:0005892> owl:sameAs ?go .
}
} LIMIT 10
It looks like he is generating the store on the fly out of an IdentifiersOrg store: https://github.com/JervenBolleman/sparql-identifiers/blob/master/src/main/java/ch/isbsib/sparql/identifiers/SesameSparqlService.java#L48
Will be implemented as a standalone tool in data2services-sparql-operations as an "expand" operation using the PrefixCommons registry. It should be separated from the split operation.
Will be implemented as a standalone tool in data2services-sparql-operations as an "expand" operation using the PrefixCommons registry. It is included in split at the moment. Like split, it will take a (list of?) class and property to resolve the value (e.g. bl:Drug bl:id).
First resolve "ensembl:ENSG00000181019" to its preferred URI: https://identifiers.org/ensembl:ENSG00000181019. The preferred databases are:
Then resolve similar URIs to the preferred URI, e.g. https://www.ensembl.org/id/ENSG00000181019 to https://identifiers.org/ensembl:ENSG00000181019
This operation will insert all the statements of the class using the preferred URI as subject. A parameter will allow deleting the previous statements.
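The expand steps can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the data2services-sparql-operations implementation. The KNOWN_BASES mapping, the function names and the delete_previous flag are all hypothetical. Statements are plain (subject, predicate, object) tuples whose subject is the identifier to resolve, which simplifies the class and property selection described above.

```python
PREF_BASE = "https://identifiers.org/"  # base of the preferred URI in the ensembl example

# Hypothetical mapping from known URI bases to their registry prefix.
KNOWN_BASES = {"https://www.ensembl.org/id/": "ensembl"}


def to_pref_uri(value):
    """Resolve a prefix:id or a known URI variant to its preferred URI, else None."""
    for base, prefix in KNOWN_BASES.items():
        if value.startswith(base):
            return f"{PREF_BASE}{prefix}:{value[len(base):]}"
    if value.startswith(PREF_BASE):
        return value  # already a preferred URI
    if "://" not in value and ":" in value:
        return PREF_BASE + value  # bare prefix:id
    return None


def expand_statements(triples, delete_previous=False):
    """Re-emit statements with the preferred URI as subject.

    With delete_previous=False the original statements are kept alongside
    the new ones; with delete_previous=True resolved originals are dropped.
    """
    result = [] if delete_previous else list(triples)
    for s, p, o in triples:
        pref = to_pref_uri(s)
        if pref is None:
            if delete_previous:
                result.append((s, p, o))  # keep statements we cannot resolve
            continue
        if pref != s or delete_previous:
            result.append((pref, p, o))
    return result
```

Both "ensembl:ENSG00000181019" and https://www.ensembl.org/id/ENSG00000181019 resolve to https://identifiers.org/ensembl:ENSG00000181019 here. Whether unresolvable statements should be kept or reported is a design choice this issue leaves open; the sketch keeps them.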
Bonus: add a wrapper on top of BridgeDB to integrate BridgeDB identifier resolution in data2services-sparql-operations the same way as PrefixCommons.
Notes: