WDscholia / scholia

Wikidata-based scholarly profiles
https://scholia.toolforge.org
Other

Test Scholia queries on other SPARQL endpoints #2063

Open Daniel-Mietchen opened 2 years ago

Daniel-Mietchen commented 2 years ago

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

I'd like us to explore running Scholia on other SPARQL endpoints, Blazegraph or otherwise. We have done some of this in the past, but not in a way that would scale across all Scholia queries.

Describe alternatives you've considered

A relatively straightforward approach might be to build a workflow based on running Scholia via the SPARQL endpoint (default: Blazegraph again) of a dedicated Wikibase instance that holds a copy of a recent Wikidata dump. There could even be several such Wikibases, each serving a specific subset (e.g. per Scholia aspect).
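With several such Wikibases, Scholia would need to pick an endpoint per aspect. A minimal sketch of that routing, assuming a simple lookup table: the aspect keys, the mirror URLs, and the helper `endpoint_for` are all hypothetical; only the default is the public Wikidata Query Service endpoint.

```python
from typing import Dict

# Public Wikidata Query Service endpoint, used when no mirror is configured.
DEFAULT_ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical mapping from Scholia aspect to the SPARQL endpoint of a
# dedicated Wikibase mirror; these URLs are placeholders, not real services.
ASPECT_ENDPOINTS: Dict[str, str] = {
    "author": "https://author-mirror.example.org/sparql",
    "work": "https://work-mirror.example.org/sparql",
    "country": "https://country-mirror.example.org/sparql",
}


def endpoint_for(aspect: str) -> str:
    """Return the mirror serving `aspect`, falling back to the default endpoint."""
    return ASPECT_ENDPOINTS.get(aspect, DEFAULT_ENDPOINT)
```

Falling back to the default keeps aspects without a dedicated mirror working unchanged, so mirrors could be added one aspect at a time.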

Additional context

Other options would be to start exploring non-Blazegraph endpoints, e.g. https://wikidata.demo.openlinksw.com/sparql (running on Virtuoso) or https://qlever.cs.uni-freiburg.de/wikidata/ (running on QLever).
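A scalable test would run each query against every candidate endpoint and record the outcome. A minimal sketch using only the Python standard library, assuming every endpoint accepts a form-encoded SPARQL Protocol POST and returns SPARQL JSON results. The QLever API URL is an assumption (the link above is its web UI), the User-Agent string is made up, and `compare_endpoints` and `http_run` are hypothetical helpers, not part of Scholia.

```python
import json
import socket
import time
import urllib.parse
import urllib.request
from typing import Callable, Dict, Tuple

# Endpoints mentioned in this issue; the QLever API URL is assumed and all
# URLs should be checked before relying on them.
ENDPOINTS: Dict[str, str] = {
    "wdqs": "https://query.wikidata.org/sparql",
    "qlever": "https://qlever.cs.uni-freiburg.de/api/wikidata",
    "virtuoso": "https://wikidata.demo.openlinksw.com/sparql",
}

# A runner takes (endpoint URL, query, timeout in seconds) and returns parsed
# SPARQL JSON results; it is injectable so the logic can be tested offline.
Runner = Callable[[str, str, float], dict]


def http_run(endpoint: str, query: str, timeout: float) -> dict:
    """POST `query` as a form-encoded SPARQL Protocol request; parse the JSON reply."""
    request = urllib.request.Request(
        endpoint,
        data=urllib.parse.urlencode({"query": query}).encode(),
        headers={
            "Accept": "application/sparql-results+json",
            # Hypothetical tool name; public endpoints ask for a descriptive agent.
            "User-Agent": "scholia-endpoint-comparison-sketch/0.1",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def compare_endpoints(query: str, endpoints: Dict[str, str],
                      timeout: float = 60.0,
                      run: Runner = http_run) -> Dict[str, Tuple[str, float, int]]:
    """Run `query` on every endpoint; return {name: (status, seconds, rows)}."""
    results: Dict[str, Tuple[str, float, int]] = {}
    for name, url in endpoints.items():
        start = time.perf_counter()
        try:
            bindings = run(url, query, timeout)["results"]["bindings"]
            status, rows = "ok", len(bindings)
        except (socket.timeout, TimeoutError):
            # Client-side timeout while waiting for the reply; a server that
            # gives up first usually answers with an HTTP error instead.
            status, rows = "timeout", 0
        except (OSError, ValueError, KeyError) as exc:
            # HTTP and connection errors (urllib raises OSError subclasses),
            # non-JSON bodies, or replies without results/bindings.
            status, rows = f"error: {exc}", 0
        results[name] = (status, time.perf_counter() - start, rows)
    return results
```

Assuming a query has been saved locally as `country_authors.sparql`, `compare_endpoints(open("country_authors.sparql").read(), ENDPOINTS)` would try it on all three endpoints and return one status, duration, and row count per endpoint, which is easy to tabulate across all Scholia queries.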

Daniel-Mietchen commented 2 years ago

I just created a simplified version of one of our queries, country_authors.sparql:

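```sparql
SELECT
  ?author
  (COUNT(DISTINCT ?citing_work) AS ?number_of_citing_works)
  (SAMPLE(?organization_) AS ?organization)
  (SAMPLE(?work) AS ?example_work)
WHERE {
  # Authors connected to wd:Q35 by citizenship (P27), or via the country (P17)
  # of an affiliation (P1416) or employer (P108)
  ?author wdt:P27 | wdt:P1416/wdt:P17 | wdt:P108/wdt:P17 wd:Q35 .
  # Works they authored (P50)
  ?work wdt:P50 ?author .
  # Works citing those works (P2860), if any
  OPTIONAL { ?citing_work wdt:P2860 ?work . }
  # An affiliation or employer of the author located in wd:Q35, if any
  OPTIONAL {
    ?author wdt:P1416 | wdt:P108 ?organization_ .
    ?organization_ wdt:P17 wd:Q35 .
  }
}
GROUP BY ?author
```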

It times out on Wikidata, fails on QLever, and executes on that Virtuoso instance.

[Screenshots from 2022-07-22: Wikidata Query Service; QLever SPARQL engine; untitled third screenshot]
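Queries written for the Wikidata Query Service often use prefixes such as `wd:` and `wdt:` without declaring them, because WDQS predefines them; other endpoints may not. A minimal sketch of a hypothetical helper, `add_missing_prefixes`, that prepends declarations for a small, assumed subset of the WDQS prefixes. It detects prefix use with a regular expression, so it is a heuristic rather than a SPARQL parser.

```python
import re

# A small, assumed subset of the prefixes the Wikidata Query Service predefines.
WIKIDATA_PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
    "p": "http://www.wikidata.org/prop/",
    "ps": "http://www.wikidata.org/prop/statement/",
    "pq": "http://www.wikidata.org/prop/qualifier/",
    "wikibase": "http://wikiba.se/ontology#",
    "schema": "http://schema.org/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

# Prefix names already declared with a PREFIX clause.
DECLARED = re.compile(r"(?i)\bPREFIX\s+([A-Za-z][\w-]*):")
# Anything shaped like "name:" that is not part of a longer word or a variable.
# Rough heuristic: it also matches "name:" inside string literals.
USED = re.compile(r"(?<![\w?$])([A-Za-z][\w-]*):")


def add_missing_prefixes(query: str) -> str:
    """Prepend PREFIX lines for known Wikidata prefixes used but not declared."""
    declared = set(DECLARED.findall(query))
    used = set(USED.findall(query))
    missing = sorted((used - declared) & set(WIKIDATA_PREFIXES))
    header = "".join(f"PREFIX {name}: <{WIKIDATA_PREFIXES[name]}>\n" for name in missing)
    return header + query
```

Unknown names such as `http` (from IRIs) are simply ignored, and prefixes the query already declares are left alone.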

WolfgangFahl commented 2 years ago

The query runs successfully on some of our endpoints:

```bash
# Print the time before and after the run to see how long the query takes
date; sparqlquery -qn authorsCitingWork -en blazegraph -f github; date
```

WolfgangFahl commented 1 year ago

See https://github.com/ad-freiburg/qlever/issues/859

egonw commented 1 year ago

Virtuoso-on-AWS: https://wikidata.demo.openlinksw.com/sparql

(It does not support the Wikidata Blazegraph functions.)
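Because endpoints like this lack the Blazegraph extensions that Wikidata queries rely on, a cheap first step could be to flag which queries use them before testing elsewhere. A minimal sketch; the marker list is an assumption and not exhaustive, and `blazegraph_features` is a hypothetical helper.

```python
import re
from typing import Dict, List

# Assumed, non-exhaustive patterns for extensions specific to Blazegraph or to
# its Wikidata Query Service deployment. Only the SERVICE keyword is matched
# case-insensitively; prefix names are case-sensitive in SPARQL.
BLAZEGRAPH_MARKERS: Dict[str, str] = {
    "label service": r"(?i:SERVICE)\s+wikibase:label\b",
    "MediaWiki API service": r"(?i:SERVICE)\s+wikibase:mwapi\b",
    "geospatial service": r"(?i:SERVICE)\s+wikibase:(?:around|box)\b",
    "query hint": r"\bhint:\w+",
    "bigdata namespace": r"\bbd:\w+",
    "GAS graph service": r"\bgas:\w+",
}


def blazegraph_features(query: str) -> List[str]:
    """Return the names of the markers found in `query`, in table order."""
    return [name for name, pattern in BLAZEGRAPH_MARKERS.items()
            if re.search(pattern, query)]
```

Run over Scholia's query templates, such a check would give a rough list of queries needing rewrites (for instance, replacing the label service with explicit `rdfs:label` patterns) before they can be compared across endpoints. The simplified country_authors query in this thread contains none of these markers.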

WolfgangFahl commented 1 year ago

https://ceur-ws.org/Vol-3262/paper9.pdf and https://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData list candidates. I also intend to talk to the Wikidata team at the next meeting, and I would love to have a proper Blazegraph mirror running on our RWTH Aachen i5 machine http://wikidata.dbis.rwth-aachen.de/, which should be suitable for the task with 256 GB of RAM and 10 TB of SSD storage. In the six years I have been trying to get my own copy of Wikidata running, I have never managed to get a proper Blazegraph mirror endpoint working with all the necessary special services.

egonw commented 1 year ago

Oh, you're in Aachen?