baskaufs opened this issue 2 years ago
Spun up a localhost Fuseki SPARQL interface to run a test (Fuseki supports federated queries, unlike Blazegraph).
Note: my first query was
SELECT DISTINCT ?s ?p ?o
WHERE
{
  SERVICE <https://5j6diw4i0h.execute-api.us-east-1.amazonaws.com/sparql> {
    ?s ?p ?o
  }
}
LIMIT 5
This turned out to be a bad idea: Fuseki apparently tried to pull all of the millions of triples from Neptune before applying the limit of 5, presumably because the LIMIT sits outside the SERVICE block and so is not sent to Neptune. The result was a 503 error (or something similar): service unavailable, which sounds like a pretty bad outcome.
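A common workaround, sketched here under the assumption that the local engine forwards a subquery inside SERVICE to the remote endpoint unchanged, is to put the LIMIT inside the SERVICE block as a subquery so that Neptune itself truncates the results. The helper name `federated_sample_query` is hypothetical; it only builds the query string, which can then be pasted into the Fuseki interface.

```python
# Hypothetical helper: builds a federated query whose LIMIT sits inside the
# SERVICE block, so the remote endpoint (Neptune) can apply it before returning results.
ENDPOINT = "https://5j6diw4i0h.execute-api.us-east-1.amazonaws.com/sparql"


def federated_sample_query(endpoint: str, limit: int = 5) -> str:
    """Return a SPARQL query that samples `limit` triples from a remote endpoint."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    # Doubled braces are literal { } characters in an f-string.
    return f"""SELECT ?s ?p ?o
WHERE {{
  SERVICE <{endpoint}> {{
    SELECT ?s ?p ?o WHERE {{ ?s ?p ?o }} LIMIT {limit}
  }}
}}"""


print(federated_sample_query(ENDPOINT))
```

The only structural change from the original query is that the LIMIT moves from the outer query into a subquery within SERVICE, which the SPARQL 1.1 grammar allows.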
Tried a second query:
PREFIX xsd:<http://www.w3.org/2001/XMLSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?s ?label ?parent
WHERE
{
  SERVICE <https://5j6diw4i0h.execute-api.us-east-1.amazonaws.com/sparql> {
    ?s skos:prefLabel ?label.
    FILTER(lang(?label)='en')
    ?top skos:prefLabel 'Visual Arts'@en.
    ?s skos:broader+ ?top.
    ?s skos:broader ?parent.
  }
}
This worked and confirmed that federated queries from Fuseki to Neptune work fine.
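The `skos:broader+` property path in the second query matches chains of one or more `skos:broader` links, so the query returns every concept below 'Visual Arts' together with its immediate parent. The plain-Python sketch below mimics that logic on hypothetical toy data (short concept IDs instead of IRIs, and no label or language filtering); it only illustrates the semantics and is not how Fuseki or Neptune actually evaluate the path.

```python
from collections import deque

# Hypothetical toy data: each concept maps to its immediate skos:broader parents.
BROADER = {
    "painting": ["visual_arts"],
    "watercolor": ["painting"],
    "sculpture": ["visual_arts"],
    "visual_arts": ["arts"],
    "music": ["arts"],
}


def ancestors(concept, broader):
    """All concepts reachable via one or more broader links (like skos:broader+)."""
    seen = set()
    queue = deque(broader.get(concept, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:  # the seen set also guards against cycles
            seen.add(parent)
            queue.extend(broader.get(parent, []))
    return seen


def descendants_with_parent(top, broader):
    """Mimic the query: each ?s with ?s skos:broader+ ?top, paired with each immediate ?parent."""
    return sorted(
        (s, parent)
        for s, parents in broader.items()
        if top in ancestors(s, broader)
        for parent in parents
    )


print(descendants_with_parent("visual_arts", BROADER))
# → [('painting', 'visual_arts'), ('sculpture', 'visual_arts'), ('watercolor', 'painting')]
```

Note that `watercolor` is included because it reaches `visual_arts` in two hops, while its reported parent is the one-hop `painting`, matching the separate `?s skos:broader ?parent` pattern.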
Tried using rdflib to perform a federated query in Python. See https://rdflib.readthedocs.io/en/stable/intro_to_sparql.html#querying-a-remote-service for an example.
# adapted from the example given in the rdflib documentation
import rdflib

g = rdflib.Graph()
qres = g.query(
    """
    prefix wd: <http://www.wikidata.org/entity/>
    SELECT distinct ?p ?o
    WHERE {
      SERVICE <https://query.wikidata.org/sparql> {
        wd:Q42 ?p ?o .
      }
    }
    LIMIT 10
    """
)
for row in qres:
    print(row)
This query worked fine.
I then tried running some simple federated queries against Neptune, such as:
import rdflib

g = rdflib.Graph()
qres = g.query(
    """
    SELECT DISTINCT ?class
    WHERE {
      SERVICE <https://5j6diw4i0h.execute-api.us-east-1.amazonaws.com/sparql> {
        <http://rs.tdwg.org/dwc/terms/continent> a ?class.
      }
    }
    """
)
for row in qres:
    print(row)
But it failed with a 404 (Not Found) error, so I fell back to running the query directly in Fuseki:
SELECT DISTINCT ?class
WHERE {
  SERVICE <https://5j6diw4i0h.execute-api.us-east-1.amazonaws.com/sparql> {
    <http://rs.tdwg.org/dwc/terms/continent> a ?class.
  }
}
and got an endlessly spinning circle. However, after I restarted Fuseki, it worked. I also tried restarting the kernel in the Jupyter notebook, but that didn't help: the rdflib query still returned a 404, which doesn't make sense.
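One way to narrow down the 404 might be to bypass rdflib and send the inner pattern straight to the endpoint, once as a GET with a `query` URL parameter and once as a form-encoded POST, the two request styles defined by the SPARQL 1.1 Protocol. If only one style succeeds, that would point to how the request is being made rather than to the query itself. This is a sketch that assumes the API Gateway URL accepts standard SPARQL Protocol requests; `build_requests` and `probe` are hypothetical helper names, and only the Python standard library is used.

```python
import urllib.error
import urllib.parse
import urllib.request

ENDPOINT = "https://5j6diw4i0h.execute-api.us-east-1.amazonaws.com/sparql"
# The pattern that was inside SERVICE, sent directly as a standalone query.
QUERY = """SELECT DISTINCT ?class
WHERE { <http://rs.tdwg.org/dwc/terms/continent> a ?class . }"""
ACCEPT = "application/sparql-results+json"


def build_requests(endpoint: str, query: str) -> dict:
    """Build a GET and a form-encoded POST request for the same query."""
    encoded = urllib.parse.urlencode({"query": query})
    get_req = urllib.request.Request(
        f"{endpoint}?{encoded}", headers={"Accept": ACCEPT}, method="GET"
    )
    post_req = urllib.request.Request(
        endpoint,
        data=encoded.encode("utf-8"),
        headers={"Accept": ACCEPT, "Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
    return {"GET": get_req, "POST": post_req}


def probe(endpoint: str, query: str, timeout: float = 30.0) -> dict:
    """Send each request and record the HTTP status code it received."""
    statuses = {}
    for method, req in build_requests(endpoint, query).items():
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                statuses[method] = resp.status
        except urllib.error.HTTPError as err:
            statuses[method] = err.code  # e.g. the 404 seen via rdflib
    return statuses
```

Calling `probe(ENDPOINT, QUERY)` from the notebook returns a dict mapping each HTTP method to the status code the endpoint sent back, which can be compared against what rdflib reported.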
There are two potentially important use cases for doing federated queries against the Neptune triplestore:
Note: in the second case, the federated query has to be run from a third SPARQL endpoint, since Neptune can't make federated queries to endpoints outside its VPC for security reasons.
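For the second case, the query runs on a third endpoint (for example the localhost Fuseki used in the tests) and reaches both Neptune and the external endpoint through separate SERVICE clauses. The sketch below is hypothetical: it assumes Neptune concepts are linked to Wikidata items with `skos:exactMatch` using `http://www.wikidata.org/entity/` IRIs, which may not match the actual data, and `third_endpoint_query` is an illustrative name. Whether the engine sends the Wikidata SERVICE once per `?item` binding or requests all labels at once depends on how it evaluates the join, so the same caution as with the unbounded `?s ?p ?o` query applies.

```python
NEPTUNE = "https://5j6diw4i0h.execute-api.us-east-1.amazonaws.com/sparql"
WIKIDATA = "https://query.wikidata.org/sparql"


def third_endpoint_query(neptune: str, wikidata: str, limit: int = 10) -> str:
    """Build a query to run on a third endpoint that joins Neptune and Wikidata.

    Hypothetical assumption: Neptune concepts point to Wikidata items via skos:exactMatch.
    """
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?concept ?item ?wdLabel
WHERE {{
  # Inner LIMIT keeps the Neptune side small; the third endpoint performs the join.
  SERVICE <{neptune}> {{
    SELECT ?concept ?item WHERE {{ ?concept skos:exactMatch ?item . }} LIMIT {limit}
  }}
  SERVICE <{wikidata}> {{
    ?item rdfs:label ?wdLabel .
    FILTER(lang(?wdLabel) = 'en')
  }}
}}"""


print(third_endpoint_query(NEPTUNE, WIKIDATA))
```

Neither Neptune nor Wikidata has to reach the other: each only answers the request made to it by the third endpoint, which is what gets around the VPC restriction.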