ad-freiburg / qlever

Very fast SPARQL Engine, which can handle very large knowledge graphs like the complete Wikidata, offers context-sensitive autocompletion for SPARQL queries, and allows combination with text search. It's faster than engines like Blazegraph or Virtuoso, especially for queries involving large result sets.
Apache License 2.0

Is there some shorthand for listing multiple language fallback tags? #1227

Status: Closed (waldenn closed this issue 7 months ago)

waldenn commented 7 months ago

Hi, is there syntax similar to Wikidata's "wikibase:language" parameter, which accepts a comma-separated list of languages, so that a label can fall back through several language tags?

SERVICE wikibase:label { bd:serviceParam wikibase:language "de,en,es,nl". }

My query currently looks like the one below, but I would like to check ~100 language-tag fallbacks (so the output will always show some title for niche/local topics):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?anything ?name ?dist ?location WHERE {
  ?anything wdt:P625 ?location .
  ?anything rdfs:label ?name .
  FILTER (LANG(?name) = "de") .
  BIND (geof:distance(?location, "POINT(10.85 48.00)") AS ?dist)
  FILTER (?dist <= 101)
}
ORDER BY ASC(?dist)
LIMIT 10

Thanks for the wonderful QLever project!

hannahbast commented 7 months ago

@waldenn The standard way to do this in SPARQL is as follows (note that QLever does not need the LIMIT 10; computing the full result is just as efficient):

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?anything ?name ?dist ?location WHERE {
  ?anything wdt:P625 ?location .
  BIND (geof:distance(?location, "POINT(10.85 48.00)") AS ?dist)
  FILTER (?dist <= 101)
  OPTIONAL { ?anything rdfs:label ?name FILTER (LANG(?name) = "de") }
  OPTIONAL { ?anything rdfs:label ?name FILTER (LANG(?name) = "en") }
  OPTIONAL { ?anything rdfs:label ?name FILTER (LANG(?name) = "es") }
  OPTIONAL { ?anything rdfs:label ?name FILTER (LANG(?name) = "nl") }
}
ORDER BY ASC(?dist)

QLever query: https://qlever.cs.uni-freiburg.de/wikidata/2uz4dy

The main reason Blazegraph has a special SERVICE for labels is that a query like the one above would take forever with Blazegraph. Of course, the OPTIONAL solution is not practical when you have a very large number of fallbacks. However, I wonder how often that is really needed in practice.

waldenn commented 7 months ago

Thanks for this! I will create a list of OPTIONALs like that and experiment more with that SPARQL endpoint.

I often need this fallback-title functionality in my project (https://conze.pt), so that topics are always shown with a label. Many of the more obscure entities have Wikidata labels in only a few languages. When you list those items and see only a Wikidata Qid, the user experience suffers. E.g., when browsing in English mode, I would rather see a German title in the list of topics than only a Qid.
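A list of OPTIONALs like that can be generated from a list of language tags instead of being written by hand. The following is a minimal Python sketch, not part of QLever: the helper names `optional_label_chain` and `build_query` are hypothetical, and the variable names and query body are copied from the query earlier in this thread. The order of the tags is the fallback priority.

```python
import re

# Hypothetical helpers for illustration; not part of QLever or any library.
# Accept only plausible language tags (e.g. "de", "zh-hans") so that a tag
# cannot break out of the quoted string in the generated SPARQL.
_LANG_TAG = re.compile(r"^[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*$")

QUERY_TEMPLATE = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?anything ?name ?dist ?location WHERE {{
  ?anything wdt:P625 ?location .
  BIND (geof:distance(?location, "POINT(10.85 48.00)") AS ?dist)
  FILTER (?dist <= 101)
{optionals}
}}
ORDER BY ASC(?dist)"""


def optional_label_chain(languages, subject="?anything", target="?name"):
    """Return one OPTIONAL line per language tag, in priority order."""
    lines = []
    seen = set()
    for lang in languages:
        if not _LANG_TAG.match(lang):
            raise ValueError(f"not a plausible language tag: {lang!r}")
        if lang in seen:
            continue  # a repeated tag can never add a new binding
        seen.add(lang)
        lines.append(
            f"  OPTIONAL {{ {subject} rdfs:label {target} "
            f'FILTER (LANG({target}) = "{lang}") }}'
        )
    return "\n".join(lines)


def build_query(languages):
    """Insert the generated OPTIONAL chain into the query from this thread."""
    return QUERY_TEMPLATE.format(optionals=optional_label_chain(languages))


if __name__ == "__main__":
    print(build_query(["de", "en", "es", "nl"]))
```

With `["de", "en", "es", "nl"]` this reproduces the four OPTIONAL lines shown above; a list of ~100 tags works the same way, only with a longer chain.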

joka921 commented 7 months ago

One thing I want to add: a long chain of such OPTIONALs takes some time to evaluate for each query. If your application always uses exactly the same order of desired languages, one could precompute the chain of OPTIONALs (that is, the query for the "canonical label" of each entity according to your specified order) and pin the result to the cache. This would of course require either setting up your own QLever instance or asking us nicely to do this for you, at least for experimentation purposes. Unfortunately, we don't yet have precomputed results that are stored on disk, so this would require some RAM.
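To make the notion of a "canonical label" concrete: for each entity, it is the label in the first language of the fixed order for which a label exists. The sketch below shows that selection rule in plain Python on the client side. It is not how QLever computes or caches the result; the function name `canonical_label` and the sample labels are made up for illustration.

```python
def canonical_label(labels_by_lang, language_order):
    """Return (label, lang) for the first language in language_order
    that has a non-empty label, or None if no language matches."""
    for lang in language_order:
        label = labels_by_lang.get(lang)
        if label:
            return label, lang
    return None


if __name__ == "__main__":
    order = ["de", "en", "es", "nl"]
    # Illustrative data only: an entity with labels in two languages.
    labels = {"nl": "voorbeeld", "en": "example"}
    print(canonical_label(labels, order))
    # An entity with no label in any listed language yields None,
    # in which case the caller can fall back to showing the Qid.
    print(canonical_label({"fr": "exemple"}, order))
```

Because the result depends only on the entity and the language order, it is the same for every query that uses that order, which is what makes it a candidate for precomputing once.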

A query that computes such a result could look like this. Unfortunately, we currently cannot compute it, because the initial rdfs:label + DISTINCT takes too much RAM, but we will have someone working on this in the next few months.