frink-okn / FRINKIssues

paginated queries #18

Open tomlue opened 1 month ago

tomlue commented 1 month ago

Is there a way to get a cursor, or some other form of efficient pagination, for SPARQL queries run against https://frink.apps.renci.org/federation/sparql?

The request below times out. If I paginate with LIMIT and OFFSET, each response is slower than the last, and I suspect later pages will eventually time out as well.

prefix wdt: <http://www.wikidata.org/prop/direct/>
prefix wikibase: <http://wikiba.se/ontology#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix bd: <http://www.bigdata.com/rdf#>

SELECT DISTINCT ?manufacturer ?manufacturerLabel WHERE {
  ?item wdt:P176 ?manufacturer.
  ?manufacturer rdfs:label ?manufacturerLabel.
  FILTER(LANG(?manufacturerLabel) = "en").
}

zmughal commented 1 month ago

If I try to get a count, it also times out (both federated and restricted to the Wikidata endpoint):

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bd: <http://www.bigdata.com/rdf#>

SELECT (COUNT(DISTINCT ?manufacturer) AS ?count)
WHERE {
  ?item wdt:P176 ?manufacturer.
  ?manufacturer rdfs:label ?manufacturerLabel.
  FILTER(LANG(?manufacturerLabel) = "en").
}

By contrast, running the same query against Wikidata's query service returns a result in 27 seconds.

I also noticed that the triples have no associated named graphs, which could otherwise be used to restrict the matching triples as an optimization. Related to #16.
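
For reference, one common way to avoid the LIMIT/OFFSET slowdown described at the top of this thread is keyset ("seek") pagination: order the results by a stable key and have each page filter past the last row already seen, instead of skipping rows with OFFSET. The sketch below is only an illustration of that idea, not something the FRINK endpoint is known to support efficiently. The helper names (`escape_literal`, `build_page_query`, `fetch_page`, `iter_manufacturers`), the page size, and the timeout are all hypothetical, and it assumes that every ?manufacturer is an IRI and that the endpoint's ORDER BY agrees with its string comparison in FILTER. The ORDER BY may still be expensive on the server, so this may not help with the timeouts reported here.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://frink.apps.renci.org/federation/sparql"  # URL from the issue

PREFIXES = """\
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
"""


def escape_literal(text):
    """Escape a string for use inside a double-quoted SPARQL literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def build_page_query(last=None, page_size=1000):
    """Build one page; `last` is the (iri, label) pair that ended the previous page."""
    seek = ""
    if last is not None:
        iri, label = (escape_literal(part) for part in last)
        # Seek past the last row on the composite key (manufacturer, label).
        seek = (
            f'  FILTER(STR(?manufacturer) > "{iri}" || '
            f'(STR(?manufacturer) = "{iri}" && STR(?manufacturerLabel) > "{label}"))\n'
        )
    return (
        PREFIXES
        + "SELECT DISTINCT ?manufacturer ?manufacturerLabel WHERE {\n"
        + "  ?item wdt:P176 ?manufacturer .\n"
        + "  ?manufacturer rdfs:label ?manufacturerLabel .\n"
        + '  FILTER(LANG(?manufacturerLabel) = "en")\n'
        + seek
        + "}\n"
        + "ORDER BY STR(?manufacturer) STR(?manufacturerLabel)\n"
        + f"LIMIT {int(page_size)}\n"
    )


def fetch_page(query, timeout=120):
    """POST a query (SPARQL 1.1 Protocol, form-encoded) and return the JSON bindings."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


def iter_manufacturers(page_size=1000):
    """Yield (iri, label) pairs page by page, without using OFFSET."""
    last = None
    while True:
        rows = fetch_page(build_page_query(last, page_size))
        for row in rows:
            last = (row["manufacturer"]["value"], row["manufacturerLabel"]["value"])
            yield last
        if len(rows) < page_size:  # a short (or empty) page means we are done
            break
```

Calling `build_page_query()` with no arguments gives the first page. Each later page passes the last (IRI, label) pair returned, so the server never has to skip over rows it has already sent.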