ncatlett opened this issue 10 years ago
rdflib supports SPARQL query and update against its own graph model. To send SPARQL queries to a remote triplestore, we can use RDFLib/sparqlwrapper.
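With SPARQLWrapper, a remote call is roughly `SPARQLWrapper(endpoint)`, then `setQuery(...)`, `setReturnFormat(JSON)` and `query().convert()`. To show what goes over the wire without extra dependencies, here is a minimal standard-library sketch that uses plain `urllib` in place of SPARQLWrapper. It builds a SPARQL 1.1 Protocol query-via-GET request that asks for JSON results, and it reads an integer `?count` binding from the SPARQL JSON results format. The endpoint URL and the function names are hypothetical placeholders and are not part of this project.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; replace with the triplestore's real SPARQL query URL.
ENDPOINT = "http://localhost:8080/sparql"


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_count(results_json: str, var: str = "count") -> int:
    """Read one integer binding (e.g. ?count) from a SPARQL JSON results document."""
    data = json.loads(results_json)
    bindings = data["results"]["bindings"]
    if not bindings:
        # Defensive default; a COUNT aggregate normally returns exactly one row.
        return 0
    return int(bindings[0][var]["value"])


def run_count_query(endpoint: str, query: str) -> int:
    """Send the query over HTTP and return its ?count value (needs a live endpoint)."""
    with urllib.request.urlopen(build_query_request(endpoint, query)) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

With a live endpoint, `run_count_query(ENDPOINT, query_text)` would return the `?count` of any of the counting queries in this thread. The request builder and the parser can both be checked offline.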
Most of the orphan equivalences (~18,000) come from EntrezGene. They appear to be mainly mappings to MGI features of type "DNA segment" and "complex/cluster/region" (currently only the Gene and Pseudogene types are included in the MGI namespace).
About 700 come from Affymetrix mappings; these reference withdrawn Entrez Gene IDs (EGIDs).
Other orphan equivalences come from:
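Also: test for identifiers (dc:identifier) and preferred labels (skos:prefLabel) that are not unique within a concept scheme (i.e., a namespace). The queries below count the concepts involved in such duplicates.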
prefix belv: <http://www.openbel.org/vocabulary/>
prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix namespace: <http://www.openbel.org/bel/namespace/>
prefix dc: <http://purl.org/dc/terms/>

# Count concepts that share a dc:identifier with another concept in the same scheme
select (count(distinct ?uri2) as ?count)
where {
  ?uri1 dc:identifier ?id1 .
  ?uri2 dc:identifier ?id1 .
  ?uri1 skos:inScheme ?scheme .
  ?uri2 skos:inScheme ?scheme .
  FILTER (?uri1 != ?uri2)
}
# Count concepts that share a skos:prefLabel with another concept in the same scheme
select (count(distinct ?uri2) as ?count)
where {
  ?uri1 skos:prefLabel ?label .
  ?uri2 skos:prefLabel ?label .
  ?uri1 skos:inScheme ?scheme .
  ?uri2 skos:inScheme ?scheme .
  FILTER (?uri1 != ?uri2)
}
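The two duplicate queries flag any concept whose identifier or preferred label collides with that of another concept in the same scheme. The sketch below is a minimal standard-library version of the same check. It assumes the graph is already loaded as plain `(subject, predicate, object)` string tuples, which is a hypothetical representation chosen for illustration and is not rdflib's. It also assumes literal values are compared as plain strings, so language tags and datatypes are ignored.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

SKOS = "http://www.w3.org/2004/02/skos/core#"
DC = "http://purl.org/dc/terms/"
IN_SCHEME = SKOS + "inScheme"


def non_unique_concepts(triples: Iterable[Triple], key_predicate: str) -> Set[str]:
    """Return concepts that share a key_predicate value with another concept in the same scheme.

    Mirrors the SPARQL pattern: ?uri1 and ?uri2 have the same value and the same
    skos:inScheme, with ?uri1 != ?uri2. The size of the result corresponds to
    count(distinct ?uri2).
    """
    schemes = defaultdict(set)  # concept URI -> scheme URIs
    values = defaultdict(set)   # concept URI -> key values (identifiers or labels)
    for s, p, o in triples:
        if p == IN_SCHEME:
            schemes[s].add(o)
        elif p == key_predicate:
            values[s].add(o)

    # Group concepts by (scheme, value); a group with 2+ members is a collision.
    groups = defaultdict(set)
    for concept, vals in values.items():
        for scheme in schemes.get(concept, ()):  # concepts with no scheme are skipped
            for v in vals:
                groups[(scheme, v)].add(concept)

    dupes: Set[str] = set()
    for members in groups.values():
        if len(members) > 1:
            dupes |= members
    return dupes
```

In this form, `len(non_unique_concepts(triples, DC + "identifier"))` plays the role of the first query's `?count`, and passing `SKOS + "prefLabel"` plays the role of the second.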
Equivalence and orthology relationships may point to 'orphan' URIs, i.e., URIs that exist only as the target of an equivalence/orthology triple and are not otherwise defined in the graph (they have no skos:inScheme). We need to add tests and/or fixes for these.
Return the number of 'orphan' URIs created by equivalence relationships:
select (count(distinct ?uri2) as ?count)
where {
  ?uri1 skos:exactMatch ?uri2 .
  minus { ?uri2 skos:inScheme ?scheme . }
}
Return the number of 'orphan' URIs created by orthology relationships:
select (count(distinct ?uri2) as ?count)
where {
  ?uri1 belv:orthologousMatch ?uri2 .
  minus { ?uri2 skos:inScheme ?scheme . }
}
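Both orphan queries have the same shape: collect the objects of a mapping predicate, then remove every URI that has a skos:inScheme. The sketch below is a standard-library version of that shape. It makes the same hypothetical assumption that the graph is available as `(subject, predicate, object)` string tuples. It also tallies orphans by the scheme of the source concept, which is one possible way to get a per-namespace breakdown like the EntrezGene and Affymetrix counts reported in this thread. The function names are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

SKOS = "http://www.w3.org/2004/02/skos/core#"
BELV = "http://www.openbel.org/vocabulary/"
IN_SCHEME = SKOS + "inScheme"
EXACT_MATCH = SKOS + "exactMatch"
ORTHOLOGOUS_MATCH = BELV + "orthologousMatch"


def orphan_targets(triples: Iterable[Triple], mapping_predicate: str) -> Set[str]:
    """Objects of mapping_predicate that have no skos:inScheme triple (the MINUS pattern)."""
    triples = list(triples)
    in_some_scheme = {s for s, p, _ in triples if p == IN_SCHEME}
    targets = {o for _, p, o in triples if p == mapping_predicate}
    return targets - in_some_scheme


def orphans_by_source_scheme(triples: Iterable[Triple], mapping_predicate: str) -> Counter:
    """Count distinct orphan targets per scheme of the concept that points at them."""
    triples = list(triples)
    orphans = orphan_targets(triples, mapping_predicate)
    scheme_of: Dict[str, Set[str]] = {}
    for s, p, o in triples:
        if p == IN_SCHEME:
            scheme_of.setdefault(s, set()).add(o)
    pairs = set()  # (source scheme, orphan URI), so each orphan counts once per scheme
    for s, p, o in triples:
        if p == mapping_predicate and o in orphans:
            # Sources that are themselves outside any scheme are grouped together.
            for scheme in scheme_of.get(s, {"(no scheme)"}):
                pairs.add((scheme, o))
    return Counter(scheme for scheme, _ in pairs)
```

Here `len(orphan_targets(triples, EXACT_MATCH))` corresponds to the `?count` of the equivalence query, and passing `ORTHOLOGOUS_MATCH` corresponds to the orthology query.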