Open Daniel-Mietchen opened 2 years ago
Over in
we had a brief discussion about QLever, including an example query https://qlever.cs.uni-freiburg.de/wikidata/J8PSek for which I am pasting a screenshot below:
The corresponding query on the Wikidata Query Service takes 2.7 seconds to run as of 2022-01-28.
With a PDF filter it is slightly slower.
The more elaborate query, which should also show the list of authors, times out on Wikidata:
#
# Example Query for
# https://github.com/WDscholia/scholia/issues/1774
#
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Scholarly articles with full text
SELECT ?paper ?paperLabel ?publishedIn ?publishedInLabel ?event ?eventLabel ?fullText ?authors
WHERE
{
  ?paper wdt:P31 wd:Q13442814.
  ?paper rdfs:label ?paperLabel.
  filter(lang(?paperLabel) = "en").
  ?paper wdt:P953 ?fullText.
  ?paper wdt:P953 ?fullText filter (strends(str(?fullText), ".pdf" )).
  #filter(regex(?fullText, "\\.pdf\\>$" )).
  ?paper wdt:P1433 ?publishedIn.
  ?publishedIn rdfs:label ?publishedInLabel.
  filter(lang(?publishedInLabel) = "en" ).
  ?publishedIn wdt:P4745 ?event.
  ?event rdfs:label ?eventLabel.
  filter(lang(?eventLabel) = "en").
  OPTIONAL
  {
    SELECT (GROUP_CONCAT(?authorLabel) as ?authors) WHERE {
      ?paper wdt:P50 ?author.
      ?author rdfs:label ?authorLabel filter(lang(?authorLabel) = 'en').
    } GROUP BY ?paper
  }
}
The authors query:
#
# test Query for https://github.com/WDscholia/scholia/issues/1774
# WF 2022-01-28
#
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Authors
SELECT ?work ?workLabel ?author ?authorLabel
WHERE
{
  ?work wdt:P50 ?author.
  ?work rdfs:label ?workLabel .
  ?author rdfs:label ?authorLabel.
}
LIMIT 10
runs on QLever in about 30 seconds, with more than 300 million results (limited to 10), while the Wikidata Query Service takes only 0.2 s.
Looks like we need a proper set of queries to do a fair comparison and check for compatibility.
If we find that QLever is a good alternative for Scholia, I would like to help set it up in the WMC Toolforge Kubernetes cluster :)
Just for the record, there is a Phabricator ticket, "Evaluate QLever as a time lagging SPARQL backend to offload the BlazeGraph cluster", with overlapping threads and participants.
> Looks like we need a proper set of queries to do a fair comparison and check for compatibility.
Perhaps some Scholia queries could be part of that benchmarking set.
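A benchmarking set like this could be driven by a small harness. The following is only a minimal sketch under stated assumptions: the endpoint URLs in `ENDPOINTS` (in particular the QLever API address) and the names `run_sparql` and `benchmark` are illustrative and not part of Scholia. Each backend is passed in as a plain callable, so a timeout or an unsupported SPARQL feature is recorded as a failure rather than aborting the run.

```python
import json
import statistics
import time
import urllib.parse
import urllib.request

# Assumed endpoint URLs; adjust to the instances actually under test.
ENDPOINTS = {
    "wdqs": "https://query.wikidata.org/sparql",
    "qlever": "https://qlever.cs.uni-freiburg.de/api/wikidata",
}


def run_sparql(endpoint, query, timeout=60.0):
    """Send one query via HTTP GET and return the number of result rows."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "scholia-benchmark-sketch/0.1",  # hypothetical agent name
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = json.load(response)
    return len(data["results"]["bindings"])


def benchmark(queries, runners, repeats=3):
    """Return {query_name: {backend: median seconds, or None on failure}}."""
    results = {}
    for query_name, query in queries.items():
        results[query_name] = {}
        for backend, runner in runners.items():
            timings = []
            try:
                for _ in range(repeats):
                    start = time.perf_counter()
                    runner(query)
                    timings.append(time.perf_counter() - start)
                results[query_name][backend] = statistics.median(timings)
            except Exception:
                # Timeouts and unsupported syntax both count as failures.
                results[query_name][backend] = None
    return results
```

To use it against live endpoints, one would build the runners with something like `{name: functools.partial(run_sparql, url) for name, url in ENDPOINTS.items()}` and pass a dictionary of named Scholia queries. Reporting the median of a few repeats, rather than a single run, reduces the effect of caching and network noise on the comparison.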
> If we find that QLever is a good alternative for Scholia, I would like to help set it up in the WMC Toolforge Kubernetes cluster :)
Perhaps we won't find out whether it can be that alternative if we do not have such test instances to play around with and to test the workflows (e.g. including exports/ dumps and database refreshes).
Testing is easier now, and we should be able to just change this line to test with QLever:
https://github.com/egonw/scholia/blob/8ff64dee13940ad28fd5d7dd97ad4bdc2d2628b4/scholia/query.py#L64
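One way to try a different backend without editing the source each time is to read the endpoint from configuration. The following is a hedged sketch, not the content of the linked line: the environment variable name `SCHOLIA_SPARQL_ENDPOINT`, the function `sparql_endpoint`, and the QLever API URL are assumptions made for illustration.

```python
import os

# Default: the Wikidata Query Service endpoint.
WDQS_URL = "https://query.wikidata.org/sparql"
# Assumed URL of the public QLever Wikidata API; verify before relying on it.
QLEVER_URL = "https://qlever.cs.uni-freiburg.de/api/wikidata"


def sparql_endpoint(environ=None):
    """Return the SPARQL endpoint URL to query.

    SCHOLIA_SPARQL_ENDPOINT is a hypothetical variable name; when it is
    unset, the function falls back to the Wikidata Query Service.
    """
    environ = os.environ if environ is None else environ
    return environ.get("SCHOLIA_SPARQL_ENDPOINT", WDQS_URL)
```

With this in place, a test instance could be started with `SCHOLIA_SPARQL_ENDPOINT` set to the QLever URL, while production keeps the default.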
I found a small bug in the example query above (a missing str() in the regex filter); see the fixed version here: https://qlever.cs.uni-freiburg.de/wikidata/KBDY3n
My guess: the author part times out because of a missing index in QLever; we are doing something the designers have not tested.
@dpriskorn Note how #2412 is intended to provide a viable migration path. And yes, we'd like to run a bunch of Wikidata endpoints with different technologies in the Wikimedia Foundation's data center. Who would be our contact for this?
Do we have a place to set up a Synia webapp? It could point to that endpoint, and we could test queries more easily.
Is your feature request related to a problem? Please describe.
Scholia currently queries the Wikidata Query Service, which relies on Blazegraph, which is suspected to fail within the next few years, as per #1721.
Describe the solution you'd like
One of the options to address this is to use another query engine, e.g. QLever.
Describe alternatives you've considered
#993
#1773