ad-freiburg / qlever

Very fast SPARQL Engine, which can handle very large knowledge graphs like the complete Wikidata, offers context-sensitive autocompletion for SPARQL queries, and allows combination with text search. It's faster than engines like Blazegraph or Virtuoso, especially for queries involving large result sets.
Apache License 2.0

QLever missing features or unexpected behavior #615

Open hannahbast opened 2 years ago

hannahbast commented 2 years ago

Whenever you encounter a query that you think should work (according to the SPARQL 1.1 standard) but throws an error message or gives an unexpected result, please post it here. Please check whether a similar query has already been posted before.

I will now move various queries from other issues here, so that we don't have one issue per query (which doesn't really help).

hannahbast commented 2 years ago

The SERVICE keyword is not yet supported [reported by @WolfgangFahl on 26.02.2022]:

https://qlever.cs.uni-freiburg.de/wikidata/QhTtmx

hannahbast commented 2 years ago

Square brackets are not yet supported [reported by @dpriskorn on 25.02.2022]:

https://qlever.cs.uni-freiburg.de/wikidata/dtZ84t

hannahbast commented 2 years ago

Minor syntax features not yet supported [reported by @WolfgangFahl on 19.02.2022]:

https://qlever.cs.uni-freiburg.de/wikidata/BGoE0M

Variant of the query that works: https://qlever.cs.uni-freiburg.de/wikidata/VgHqGB

hannahbast commented 2 years ago

Currently, only GET queries are supported, not POST [reported by @WolfgangFahl on 19.02.2022]:

This is easy to fix and on our list.

hannahbast commented 2 years ago

String functions are not yet supported in QLever [reported by @dpriskorn on 28.01.2022]:

https://qlever.cs.uni-freiburg.de/wikidata/EnB513

NOTE: The code for supporting functions in SPARQL expressions is all there. It's just a matter of adding more functions.

hannahbast commented 2 years ago

String functions not yet implemented [reported by @WolfgangFahl on 28.01.2022]:

https://qlever.cs.uni-freiburg.de/wikidata/0fidpY

The following equivalent query works: https://qlever.cs.uni-freiburg.de/wikidata/m49coC

hannahbast commented 2 years ago

Basic ?s ?p ?o query does not work yet in QLever [reported by @balhoff on 10.12.2021]:

https://qlever.cs.uni-freiburg.de/wikidata/VSpbRD

NOTE: This is easy to implement and is on our list.

jeremiahpslewis commented 2 years ago

Quick question here, it's hard as a user to see which features are not currently supported if they are all in a single issue. What about using a query label to group all of these issues, but having separate issues for each failed query 'feature'?

hannahbast commented 2 years ago

String functions URI and CONCAT not yet supported [reported by @dpriskorn on 15.10.2022]:

https://qlever.cs.uni-freiburg.de/wikidata/5sDHUw

This variant of the query works: https://qlever.cs.uni-freiburg.de/wikidata/aIbGvo

WolfgangFahl commented 2 years ago

@hannahbast

#607 shows how the sample queries can be tested with different endpoints.

hannahbast commented 2 years ago

FILTER with arbitrary expressions not yet supported [reported by @graue70 on 03.09.2021]:

https://qlever.cs.uni-freiburg.de/wikidata/Zd77YE

NOTE: The code for supporting arbitrary expressions is now in place; we "only" need to add this feature.

hannahbast commented 2 years ago

> Quick question here, it's hard as a user to see which features are not currently supported if they are all in a single issue. What about using a query label to group all of these issues, but having separate issues for each failed query 'feature'?

@jeremiahpslewis You are completely right and this issue is not for having a good overview, it's just to give people a chance to add a new query if they encounter one without opening a new issue every time. Having dozens of issues for things which are already on our list is not really helpful.

The solution I propose is that I add a page on the Wiki with a list of the features currently not supported (together with an estimate of when they will be supported and how hard it is).

dpriskorn commented 2 years ago

Do you accept pull requests? What would be a good first issue? 😀

WolfgangFahl commented 2 years ago

I have added more details to #607 and suggest using named queries for continuous integration. IMHO it would be good to add the name of the query as a comment, e.g.

sparqlquery -qp wikidata.yaml --showQuery -qn HumansWithLibrisEntryAndImageAndMap 

named query HumansWithLibrisEntryAndImageAndMap

# humans with images and maps in swedish national library
# 53142 results in 20.6 seconds on Wikidata Query Service
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT ?item ?librisuri ?coord (sample(?img) AS ?image) (sample(?map) AS ?map) WHERE {
  # humans with a Libris-URI (swedish national library) and a known birthplace
  ?item wdt:P5587 ?librisid;
        wdt:P31 wd:Q5;
        wdt:P19 ?birthplace.
  # birthplace coordinates
  ?birthplace wdt:P625 ?coord.
  # image of the subject
  OPTIONAL {?item wdt:P18 ?img}.
  # map of the subject
  OPTIONAL {?item wdt:P242 ?map}.
  BIND(URI(CONCAT("https://libris.kb.se/katalogisering/",?librisid)) AS ?librisuri)
}
group by ?item ?librisuri ?coord

WolfgangFahl commented 2 years ago

@hannahbast

> Having dozens of issues for things which are already on our list is not really helpful.

There is a tradeoff here. Personally, I'd prefer an issue per case because referencing is much easier. Adding labels to the issues would probably help, and then there is of course the "duplicate of ..." marker, which makes sure there will be a core issue for each case.

joka921 commented 2 years ago

> Do you accept pull requests? What would be a good first issue? 😀

Of course, see my email from yesterday.

hannahbast commented 2 years ago

@WolfgangFahl I don't think it makes sense to have an issue for every minor feature that is missing at this point, especially since the missing features come in large groups and a single PR might then address many issues at the same time. I don't think it's the best investment of our time to clean up the issues :-)

When we are approaching full SPARQL 1.1 support, this certainly does make sense. Then we can also react quickly: Issue -> PR -> solved. Right now, we can just repeat ourselves and say: yes, we know, will be implemented soon, please give us some more time.

WolfgangFahl commented 2 years ago

https://qlever.cs.uni-freiburg.de/wikidata/XGc3nb

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Event / Proceedings pairs
SELECT ?event ?eventLabel ?proceedings ?proceedingsLabel ?ppn
WHERE {
  ?event wdt:P31 wd:Q2020153;
  # is proceedings from
  ^wdt:P4745 ?proceeding .
  #OPTIONAL {
  # ?proceedings wdt:P6721 ?ppn
  #}
  ?proceedings rdfs:label ?proceedingsLabel.
  filter(lang(?proceedingsLabel) = "en").
}

Exception: BAD QUERY (Could not find a suitable execution tree. Likely cause: Queries that require joins of the full index with itself are not supported at the moment.; in ../src/engine/QueryPlanner.cpp, line 2325, function std::vector > QueryPlanner::fillDpTab(const QueryPlanner::TripleGraph&, const std::vector&, const std::vector >&))

The error message is IMHO not helpful; the cause is just a typing error.

https://qlever.cs.uni-freiburg.de/wikidata/3ZaksA

hannahbast commented 2 years ago

@WolfgangFahl I agree that the error message poorly describes what it should describe, namely that the query has two parts which are not connected. That, however, is indeed the problem with your example query. Here is a simpler query which has that problem due to a typo. I don't think it's the task of a SPARQL engine to guess that the two independent query parts are due to a typo. It's an unfortunate property of SPARQL that if you mistype a variable name, you frequently get into this situation. Autocompletion helps to avoid that.

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?subject WHERE {
  ?subject wdt:P31 wd:Q5 .
  ?sujbect wdt:P31 wd:Q159979 
}
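
The "two parts which are not connected" situation can be checked for before a query is sent. The following is a minimal Python sketch, not QLever's query planner: it assumes the triple patterns are already available as plain strings, extracts variables with a simple regular expression, and groups patterns that share a variable. The function name `connected_components` is hypothetical.

```python
import re
from itertools import combinations

# Naive variable matcher; an assumption that is good enough for simple patterns.
VAR = re.compile(r"\?\w+")


def connected_components(triples):
    """Group triple patterns that are linked by shared variables (union-find)."""
    parent = list(range(len(triples)))

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    vars_per_triple = [set(VAR.findall(t)) for t in triples]
    for i, j in combinations(range(len(triples)), 2):
        if vars_per_triple[i] & vars_per_triple[j]:
            parent[find(i)] = find(j)

    groups = {}
    for i, t in enumerate(triples):
        groups.setdefault(find(i), []).append(t)
    return list(groups.values())


# The two patterns from the query above, including the mistyped ?sujbect.
triples = [
    "?subject wdt:P31 wd:Q5",
    "?sujbect wdt:P31 wd:Q159979",
]
components = connected_components(triples)
if len(components) > 1:
    print("Query has", len(components), "unconnected parts; check variable names")
```

More than one component means the query would require a cross product of independent parts, which is frequently the result of a misspelled variable.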

WolfgangFahl commented 2 years ago

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?count (COUNT(?count) AS ?frequency) WHERE {
# Count all human(Q5) https://www.wikidata.org/wiki/Q5 items
# with the given date of birth(P569) https://www.wikidata.org/wiki/Property:P569 
SELECT ?item ?itemLabel (COUNT (?value) AS ?count)
WHERE
{
  # instance of human
  ?item wdt:P31 wd:Q5.
  ?item wdt:P106 wd:Q82594.
  ?item rdfs:label ?itemLabel.
  filter (lang(?itemLabel) = "en").
  # date of birth
  ?item wdt:P569 ?value.
} GROUP by ?item ?itemLabel
}
GROUP BY ?count
ORDER BY DESC (?frequency)

https://qlever.cs.uni-freiburg.de/wikidata/IlDxae

Exception: ParseException, cause: Expected a token of type IRI but got a token of type KEYWORD (select) in the input at pos 153 : SELECT ?item ?itemLabel (COUNT (?value) AS ?count) WHERE { # instance of human ?item wdt:P31 wd:Q5. ?item wdt:P106 wd:Q82594. ?item rdfs:label ?itemLabel. filter (lang(?itemLabel) = "en"). # date of birth ?item wdt:P569 ?value. } GROUP by ?it

hannahbast commented 2 years ago

@WolfgangFahl QLever currently requires { ... } around subqueries. That is, the following equivalent query works: https://qlever.cs.uni-freiburg.de/wikidata/9DIJOj

Background information: QLever already uses a proper parser generator (ANTLR) for a part of its parsing (in particular, the SPARQL expressions). It's not yet used for parsing the whole SPARQL query. That explains many of these small inconsistencies. It will be solved soon.

WolfgangFahl commented 2 years ago

@hannahbast thx for the hint. I am currently working on a tool that analyses "how tabular" a query based on an item and a list of properties is. This analysis might need some SPARQL features such as GROUP_CONCAT or workarounds for these.

The double curly braces issue is solvable with a workaround. The next issue is:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# get the label for the given item
SELECT ?itemLabel
WHERE
{
  VALUES (?item) {
    (wd:Q2020153)
  }
  ?item rdfs:label ?itemLabel.
  #filter (lang(?itemLabel) = "en").
}

which does not work with or without filter. see https://qlever.cs.uni-freiburg.de/wikidata/RK1eyG

whereas the "try it!" link works

hannahbast commented 2 years ago

@WolfgangFahl Just omit the two (...) in the VALUES clause and it works: https://qlever.cs.uni-freiburg.de/wikidata/7UCU23

tuukka commented 2 years ago

Is there a way to emulate EXISTS and BOUND which both seem to be unimplemented?

EDIT: I found !ASK but cannot use it in a FILTER: Exception: ParseException, cause: ! is not a valid left hand side for a filter. EDIT2: boolean literals don't seem to be implemented as false = ASK ... results in Unexpected input: false = ASK ... EDIT3: COALESCE also unimplemented.

hannahbast commented 2 years ago

@tuukka Do you have an example query? We are currently still working on full expression support. The basic machinery is in place (that was the hard part of the work), but not all functions and operators are implemented yet.

@joka921 Can you briefly comment on EXISTS and BOUND. Are they particularly easy or do you see any particular difficulty?

tuukka commented 2 years ago

> @tuukka Do you have an example query? We are currently still working on full expression support. The basic machinery is in place (that was the hard part of the work), but not all functions and operators are implemented yet.

Here's an example query with EXISTS: Wikidata items that are humans but lack an article in English Wikipedia: https://qlever.cs.uni-freiburg.de/wikidata/58H7Sx

Same query using BOUND: https://qlever.cs.uni-freiburg.de/wikidata/9d0XXz

hannahbast commented 2 years ago

Ok, those are easy:

  1. For the first query, you can replace FILTER NOT EXISTS by MINUS: https://qlever.cs.uni-freiburg.de/wikidata/ZuEUiK
  2. For the second query, you can replace FILTER(!BOUND(?article)) by FILTER(?article > "<Z"): https://qlever.cs.uni-freiburg.de/wikidata/fPUSFt

Explanation @1: The two are often equivalent, except for some obscure queries involving OPTIONAL (not the case here).

Explanation @2: The "null" value is larger than all "real" values in QLever.

Note that both queries also work without LIMIT, whereas on https://query.wikidata.org they time out (like almost all queries with non-trivial constraints and without LIMIT).
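
To make the MINUS replacement more concrete, here is a small Python sketch of how MINUS is defined in the SPARQL 1.1 specification, applied to lists of solution mappings represented as dictionaries. It illustrates the standard semantics only and says nothing about how QLever implements the operator; the item names are made-up placeholders. One documented difference from FILTER NOT EXISTS is visible in the last line: when the two sides share no variable, MINUS removes nothing.

```python
def compatible(mu, nu):
    """Two solutions are compatible if they agree on every shared variable."""
    return all(mu[var] == nu[var] for var in mu.keys() & nu.keys())


def minus(left, right):
    """Keep each left solution unless some right solution is compatible with it
    and shares at least one variable with it (SPARQL 1.1 MINUS)."""
    return [
        mu
        for mu in left
        if not any(compatible(mu, nu) and bool(mu.keys() & nu.keys()) for nu in right)
    ]


# Placeholder data: two humans, only one of which has an English article.
humans = [{"item": "item_a"}, {"item": "item_b"}]
with_article = [{"item": "item_a", "article": "article_a"}]

print(minus(humans, with_article))     # humans without an English article
print(minus(humans, [{"other": "x"}])) # no shared variable: nothing is removed
```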

tuukka commented 2 years ago

@hannahbast Thank you, I had completely missed MINUS! Now the query I originally wanted to write is working (although somewhat slow and threatens to run out of memory): Wikidata items that have many articles but not an English one: https://qlever.cs.uni-freiburg.de/wikidata/R5z27F

EDIT: If a guide for porting queries from WDQS does not exist yet, that might be a nice way to document the current differences?

hannahbast commented 2 years ago

> EDIT: If a guide for porting queries from WDQS does not exist yet, that might be a nice way to document the current differences?

@tuukka @WolfgangFahl @joka921 I have started a Wiki page now: https://github.com/ad-freiburg/qlever/wiki/Current-deviations-from-the-SPARQL-1.1-standard

WolfgangFahl commented 2 years ago

in-clause for Filter is missing see https://qlever.cs.uni-freiburg.de/wikidata/ruLOCV

hannahbast commented 2 years ago

@WolfgangFahl Thanks, I have added it to https://github.com/ad-freiburg/qlever/wiki/Current-deviations-from-the-SPARQL-1.1-standard , at the bottom. There you also find a variant of the query that works: https://qlever.cs.uni-freiburg.de/wikidata/FZ6sHB .

WolfgangFahl commented 2 years ago

@hannahbast thank you https://qlever.cs.uni-freiburg.de/wikidata/6JhcOt is the query with correct ordinals - i have to fix my query generator :-)

anlam commented 2 years ago

Hey @hannahbast, I have two questions:

  1. For queries that return a lot of results, QLever seems to have a limit of 100,000 results. Is it possible to configure QLever to return all results?
  2. Is it possible to export the dataset to a file?

hannahbast commented 2 years ago

@anlam If you specify an explicit limit, you can download results of any size. You can download the data in the typical formats: CSV, TSV, JSON, and Turtle. For JSON and Turtle you also have to specify the URL parameter send=some_larger_limit. For CSV and TSV you don't need that. You specify the format via the "Accept" header, which is standard for SPARQL. QLever supports chunked transfer, so the download will also work for very large results.

The QLever UI also has a "Share" button. For example, to get the IRIs of the almost 10 million instances of type human (wd:Q5) in Wikidata, you can do

curl -Gs -H "Accept: text/csv" https://qlever.cs.uni-freiburg.de/api/wikidata --data-urlencode "query=PREFIX wd: <http://www.wikidata.org/entity/> PREFIX wdt: <http://www.wikidata.org/prop/direct/> SELECT ?person WHERE { ?person wdt:P31 wd:Q5 } LIMIT 10000000" --data-urlencode "send=10000000"

Reason for the limit: Eventually, QLever will have an "admin token" which allows the owner of the backend to control all kinds of settings, including download limit, query timeouts, maximum RAM for query processing, etc. For now, some of these limits are hard-coded. But as explained above, the result size limit can be worked around easily.
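
The same request can be issued from a script. The following Python sketch mirrors the curl command above using only the standard library: the endpoint URL, the `query` and `send` parameters, and the `text/csv` Accept value are taken from that command, while the function names `build_request` and `download` and the chunk size are illustrative assumptions.

```python
import urllib.parse
import urllib.request

# Endpoint as used in the curl command above.
ENDPOINT = "https://qlever.cs.uni-freiburg.de/api/wikidata"


def build_request(query, accept="text/csv", send=None):
    """Build a GET request with the query (and optional send limit) URL-encoded."""
    params = {"query": query}
    if send is not None:
        params["send"] = str(send)
    url = ENDPOINT + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"Accept": accept})


def download(request, path, chunk_size=1 << 20):
    """Stream the response to a file in chunks, so large results fit in memory."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while chunk := response.read(chunk_size):
            out.write(chunk)


query = (
    "PREFIX wd: <http://www.wikidata.org/entity/> "
    "PREFIX wdt: <http://www.wikidata.org/prop/direct/> "
    "SELECT ?person WHERE { ?person wdt:P31 wd:Q5 } LIMIT 10000000"
)
request = build_request(query, send=10000000)
# download(request, "humans.csv")  # uncomment to actually fetch the result
```

The download call is left commented out so that running the sketch does not put load on the public endpoint by accident.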

AndreaWesterinen commented 2 years ago

For those that are interested, I have automated the W3C RDF tests. Documentation is at https://wikitech.wikimedia.org/wiki/User:AndreaWest/WDQS_Testing/Running_TFT.

anlam commented 2 years ago

Hey @hannahbast,

Thank you so much for the answer. I was wondering if there is any explicit way to export the whole dataset instead of executing the query via SPARQL endpoint?

hannahbast commented 2 years ago

@anlam If you want the whole dataset, isn't the right thing to download it from Wikidata: https://dumps.wikimedia.org/wikidatawiki/entities/ ?

In principle, you can of course download the whole dataset with CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }, and it works, but I don't think that's a proper use of a SPARQL endpoint.

tuukka commented 2 years ago

SELECT (?queryVariable AS ?resultVariable) does not give an error but produces an empty variable.

hannahbast commented 2 years ago

@tuukka Can you provide a complete query?

tuukka commented 2 years ago

> SELECT (?queryVariable AS ?resultVariable) does not give an error but produces an empty variable.

Example (should produce 192 nation states, produces 192 empty rows): https://qlever.cs.uni-freiburg.de/wikidata/qphky4

Of course, it's easy to work around by search-and-replace of ?queryVariable with ?resultVariable, but it's a reason why some queries from WDQS don't work directly in QLever.

hannahbast commented 2 years ago

Thanks, that looks like one of the many small deviations from the SPARQL 1.1 standard, which we are currently fixing.

@Qup42 @joka921 What do you think?

As a quick workaround, BIND is your friend and this equivalent query gives you the desired result: https://qlever.cs.uni-freiburg.de/wikidata/HOuTUH

WolfgangFahl commented 2 years ago

Here is an example of a "truly tabular" generated query with GROUP_CONCAT that does not seem to work:

# truly tabular aggregate query for 
# Q2020153:academic conference
# generated by trulytabular.py version 0.2.14 on 2022-07-28T08:27:46.370180
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?academic_conference ?academic_conferenceLabel
  (SAMPLE (?country) AS ?country_sample)
  (COUNT (?short_name) AS ?short_name_count)
  (GROUP_CONCAT (?short_name;SEPARATOR="⇹") AS ?short_name_list)
  (COUNT (?title) AS ?title_count)
  (GROUP_CONCAT (?title;SEPARATOR="⇹") AS ?title_list)
WHERE {
  # instanceof Q2020153:academic conference
  ?academic_conference wdt:P31 wd:Q2020153.
  # label
  ?academic_conference rdfs:label ?academic_conferenceLabel.  
  FILTER (LANG(?academic_conferenceLabel) = "en").
  # country (P17)
  OPTIONAL { 
    ?academic_conference wdt:P17 ?country. 
  }
  # short name (P1813)
  OPTIONAL { 
    ?academic_conference wdt:P1813 ?short_name. 
  }
  # title (P1476)
  OPTIONAL { 
    ?academic_conference wdt:P1476 ?title. 
  }
}
GROUP BY ?academic_conference ?academic_conferenceLabel
HAVING (COUNT(?country)=1)

WolfgangFahl commented 2 years ago

https://qlever.cs.uni-freiburg.de/wikidata/l1x8od works

joka921 commented 2 years ago

@WolfgangFahl
Here is a variant of your query that works.

There are two issues here. First, QLever's parser currently does not allow ⇹ as a separator. I'll have to check whether it has to be escaped somehow or whether this is a bug in our parser implementation. I have therefore replaced the separator by `. Second, HAVING(COUNT(... is currently not supported (QLever's FILTER implementation is rather incomplete and only allows a small set of expressions, but I am actively working on that). The workaround here is to explicitly bind the COUNT to a variable in the SELECT clause and then filter on this variable for equality with 1. This currently has the disadvantage that this "internal" variable is then also selected in the result, but we are aware of this issue.

WolfgangFahl commented 1 year ago

> Is there a way to emulate EXISTS and BOUND which both seem to be unimplemented?
>
> EDIT: I found !ASK but cannot use it in a FILTER: Exception: ParseException, cause: ! is not a valid left hand side for a filter. EDIT2: boolean literals don't seem to be implemented as false = ASK ... results in Unexpected input: false = ASK ... EDIT3: COALESCE also unimplemented.

see https://qlever.cs.uni-freiburg.de/wikidata/zitfWe for coalesce not working

aindlq commented 8 months ago

Not supported: ASK queries are currently not supported by QLever.

One can use a LIMIT 1 query as a workaround: SELECT * WHERE {...} LIMIT 1
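
A client can wrap this workaround so that it behaves like ASK. The Python sketch below is an illustration under assumptions: it builds the LIMIT 1 query from a graph pattern and treats a non-empty result as "true", assuming the result is returned in the standard SPARQL 1.1 JSON results format. The helper names `ask_as_select` and `has_solution` are hypothetical.

```python
import json


def ask_as_select(pattern, prefixes=""):
    """Turn the body of an ASK query into an equivalent SELECT ... LIMIT 1."""
    body = "SELECT * WHERE { " + pattern.strip() + " } LIMIT 1"
    return prefixes.strip() + "\n" + body if prefixes.strip() else body


def has_solution(sparql_json):
    """ASK is true exactly when the LIMIT 1 query returns at least one row."""
    bindings = json.loads(sparql_json)["results"]["bindings"]
    return len(bindings) > 0


print(ask_as_select("?s ?p ?o"))

# Example response in SPARQL JSON results format (made-up data).
response = '{"head": {"vars": ["s"]}, "results": {"bindings": [{"s": {"type": "uri", "value": "http://example.org/a"}}]}}'
print(has_solution(response))
```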

LorenzBuehmann commented 7 months ago

Is there an overview of the current coverage of SPARQL 1.1 features and functions, or maybe just a list of the missing features/functions? I think that would be more convenient than checking the QLever demo or one's own setups by executing queries.

From what I can tell, since I tried it this morning, hash functions are missing. But I'm looking for a list organized according to the SPARQL 1.1 spec, e.g.

17.4.1 Functional Forms

17.4.2 Functions on RDF Terms

17.4.3 Functions on Strings

17.4.4 Functions on Numerics

17.4.5 Functions on Dates and Times

17.4.6 Hash Functions

WolfgangFahl commented 7 months ago

See #1247, #859, and https://github.com/WDscholia/scholia/issues/2412 for what a systematic testing approach could look like. Is there a list of example queries somewhere, organized according to the SPARQL spec, that we could pick up?

LorenzBuehmann commented 7 months ago

There is the SPARQL 1.1 test suite: https://www.w3.org/2009/sparql/docs/tests/summary.html The test structure is described here: https://www.w3.org/2009/sparql/docs/tests/README.html

It provides data, query and expected results w.r.t. the standard.

LorenzBuehmann commented 7 months ago

I saw a query in the QLever "OHM Planet" demo:

SELECT ?p ?count ?percent WHERE {
  { SELECT ?p (COUNT(?p) AS ?count) WHERE { ?s ?p ?o } GROUP BY ?p }
  BIND(100 * ?count / SUM(?count) AS ?percent)
}
ORDER BY DESC(?count)

I'm pretty sure this is non-standard SPARQL as aggregate functions are not allowed in a BIND expression. Is this intended and supposed to be a feature of QLever then?

With standard SPARQL it would be a more verbose query like

SELECT ?p ?count_p ?percent WHERE {
  { SELECT (COUNT(?s) AS ?count_total) WHERE {?s ?p ?o}}
  { SELECT ?p (COUNT(?p) AS ?count_p) WHERE { ?s ?p ?o } GROUP BY ?p }
  BIND(100 * ?count_p / ?count_total AS ?percent)
}
ORDER BY DESC(?count_p)
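
Both variants are meant to compute the same thing: each predicate's count as a percentage of the total number of triples. As a sanity check of that arithmetic, here is a tiny Python sketch over made-up counts; the function name and the predicate names are placeholders.

```python
def predicate_shares(counts):
    """Return (predicate, count, percent) rows, ordered by count descending."""
    total = sum(counts.values())  # corresponds to ?count_total
    rows = [(p, c, 100 * c / total) for p, c in counts.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up per-predicate counts, as the inner GROUP BY subquery would return them.
rows = predicate_shares({"p_rare": 1, "p_common": 3})
for predicate, count, percent in rows:
    print(predicate, count, percent)
```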