LeMyst / WikibaseIntegrator

A Python module to manipulate data on a Wikibase instance (like Wikidata) through the MediaWiki Wikibase API and the Wikibase SPARQL endpoint.
MIT License

Support LDF endpoint? #195

Open dpriskorn opened 3 years ago

dpriskorn commented 3 years ago

With this we could encourage people writing queries to offload work to LDF when possible and do more of the computing locally, now that the expensive WDQS SPARQL endpoint receives >100,000 queries a second.

See https://github.com/pchampin/hydra-py for a Triple Pattern Fragments client; example code:

"""
Use TPFStore and performs a SPARQL query on top of it.
Note that this is very (very!) slow as soon as the query becomes slightly complex... :-/
"""
import logging
from rdflib import Graph
import sys
logging.basicConfig(level=logging.INFO)

import hydra.tpf # required to register TPFStore plugin

URL = 'http://data.linkeddatafragments.org/dbpedia2014'
if len(sys.argv) > 1:
    URL = sys.argv[1]

g = Graph("TPFStore")
g.open(URL)

QUERY = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX dbo: <http://dbpedia.org/ontology/>

    SELECT * {
        ?p a dbo:Person; dbo:birthPlace ?bp .
    }
    LIMIT 10
"""

print len(g)

results = g.query(QUERY)
for i in results:
    print i
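For context: under the hood, a TPF client asks the server for one triple pattern at a time by filling in a URL template. A rough sketch of building such a request URL — the `subject`/`predicate`/`object` parameter names are the common convention only; a real endpoint declares its own template in the `hydra:search` metadata of each fragment:

```python
from urllib.parse import urlencode

def fragment_url(base, subject=None, predicate=None, obj=None):
    """Build a Triple Pattern Fragment request URL.

    Unset positions are simply omitted, which the server treats
    as wildcards (variables) in the triple pattern.
    """
    params = {name: value for name, value in
              (("subject", subject), ("predicate", predicate), ("object", obj))
              if value is not None}
    return base + ("?" + urlencode(params) if params else "")
```

For example, the `dbo:birthPlace` pattern from the query above would become a single GET against `fragment_url(URL, predicate="http://dbpedia.org/ontology/birthPlace")`, and the client joins the results of such per-pattern requests locally.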
LeMyst commented 3 years ago

I've never used LDF or TPF. That seems interesting and, if I understand correctly, it could solve the timeouts when querying the SPARQL server.

I won't put this at the top of the list for the moment, but it seems promising.

dpriskorn commented 3 years ago

Exactly: no timeouts, but often a truckload of requests, because we get only 100 results at a time, so complex queries take more time.
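The 100-results-at-a-time behaviour comes from TPF's paging: each fragment page links to the next via `hydra:nextPage`, and the client just follows that chain. A minimal sketch of the loop, where `fetch_page` is a hypothetical stand-in for a real HTTP fetcher:

```python
def iter_fragment_triples(fetch_page, first_url):
    """Yield every triple of a Triple Pattern Fragment, page by page.

    fetch_page(url) is a caller-supplied function returning a pair
    (triples, next_url), where next_url is the hydra:nextPage link,
    or None on the last page.
    """
    url = first_url
    while url is not None:
        triples, url = fetch_page(url)
        yield from triples
```

Every page is one HTTP round-trip, which is why a pattern with many matches turns into a long series of small requests — cheap for the server, but slow for the client on complex queries.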