ad-freiburg / qlever

Very fast SPARQL Engine, which can handle very large knowledge graphs like the complete Wikidata, offers context-sensitive autocompletion for SPARQL queries, and allows combination with text search. It's faster than engines like Blazegraph or Virtuoso, especially for queries involving large result sets.
Apache License 2.0

Strange ASSERT FAILED (valueIdBegin < valueIdEnd; ... message #768

Closed WolfgangFahl closed 1 year ago

WolfgangFahl commented 2 years ago

https://qlever.cs.uni-freiburg.de/wikidata/iemHST

# Conference Series
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT ?confSeries ?short_name ?confSeriesLabel ?DBLP_pid ?WikiCFP_pid ?GND_pid
WHERE 
{
  # scientific conference series (Q47258130) 
  ?confSeries wdt:P31 wd:Q47258130.
  OPTIONAL { 
     ?confSeries wdt:P1813 ?short_name . 
  }
  # any item with a DBLP venue ID 
  ?confSeries wdt:P8926 ?DBLP_pid.
  # WikiCFP pid 
  optional {
     ?confSeries wdt:P5127 ?WikiCFP_pid.
  }
  # GND pid
  optional {
    ?confSeries wdt:P227 ?GND_pid.
  }
  # label 
  ?confSeries rdfs:label ?confSeriesLabel.
  filter (lang(?confSeriesLabel) = "en").
  # filter exotic pair entry
  FILTER(?confSeries != wd:Q7395156).
}
ORDER BY (?short_name)



Error processing query

ASSERT FAILED (valueIdBegin < valueIdEnd; in ../src/engine/../global/ValueIdComparators.h, line 329, function std::vector<std::pair<_FIter, _FIter> > valueIdComparators::getRangesForEqualIds(RandomIt, RandomIt, ValueId, ValueId, valueIdComparators::Comparison) [with RandomIt = ad_utility::IteratorForAccessOperator<detail::IdTableTemplated<1, detail::IdTableViewWrapper<ad_utility::AllocatorWithLimit<ValueId> >, ad_utility::AllocatorWithLimit<ValueId> >, ad_utility::detail::AssignableLambdaImpl<Filter::computeFilterRange<1>(IdTableStatic<1, ad_utility::AllocatorWithLimit<ValueId> >*, size_t, Id, Id, IdTableView<1, ad_utility::AllocatorWithLimit<ValueId> >&, std::shared_ptr<const ResultTable>) const::<lambda(const auto:134&, auto:135)> >, true>])
Your query was: (the query shown above)

hannahbast commented 2 years ago

Here is a more compact (and more readable) form of the query: https://qlever.cs.uni-freiburg.de/wikidata/DbhFlA

Over the weekend, I experimentally built a version of Wikidata in which the names of all IRIs that start with <http reside on disk. The QLever instance then uses 20 GB less RAM. I have just reverted to the previous version of Wikidata, where only a more restricted set of IRIs resides on disk (see https://github.com/ad-freiburg/qlever-control/blob/main/Qleverfiles/Qleverfile.wikidata). With that version, the error does not occur.

My guess is that the ASSERT fails for this particular query because QLever's sort order is currently wrong for items whose names reside on disk. We are aware of that problem and we also know how to fix it; we just haven't gotten to it yet.
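
To make the guess above concrete, here is a toy Python sketch. It is not QLever's code: the function names, the example IRIs, and the ID-assignment scheme (in-RAM words get the low IDs, on-disk words the high IDs) are assumptions chosen purely for illustration. It shows how a range lookup that requires `id_begin < id_end` can trip its precondition once ID order and lexical order disagree.

```python
from bisect import bisect_left


def assign_ids(words, is_on_disk):
    """Toy ID assignment (an assumption of this sketch, not QLever's scheme).

    In-RAM words are sorted and numbered first, on-disk words are sorted and
    numbered afterwards, so ID order matches lexical order only within a tier.
    """
    in_ram = sorted(w for w in words if not is_on_disk(w))
    on_disk = sorted(w for w in words if is_on_disk(w))
    return {word: idx for idx, word in enumerate(in_ram + on_disk)}


def ranges_for_equal_ids(sorted_ids, id_begin, id_end):
    """Return the [start, stop) slice of sorted_ids with values in [id_begin, id_end).

    The check mirrors the condition quoted in the error message.
    """
    if not id_begin < id_end:
        raise AssertionError("ASSERT FAILED (valueIdBegin < valueIdEnd)")
    return bisect_left(sorted_ids, id_begin), bisect_left(sorted_ids, id_end)


# Hypothetical vocabulary: one IRI starting with <http, two that do not.
words = ["<http://example.org/a>", "<urn:example:m>", "<urn:example:z>"]
lower, upper = "<http://example.org/a>", "<urn:example:m>"  # lexically lower < upper

# Everything in RAM: ID order equals lexical order, the lookup works.
ids_ok = assign_ids(words, lambda w: False)
print(ranges_for_equal_ids([0, 0, 1, 2], ids_ok[lower], ids_ok[upper]))

# IRIs starting with <http on disk: the same bounds are now inverted as IDs.
ids_bad = assign_ids(words, lambda w: w.startswith("<http"))
try:
    ranges_for_equal_ids([0, 1, 2, 2], ids_bad[lower], ids_bad[upper])
except AssertionError as err:
    print(err)
```

In this model the query itself is unchanged; only the placement of names on disk changes which ID each bound receives, which would be consistent with the error disappearing after reverting to the previous index.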

@joka921 Do you agree?

joka921 commented 2 years ago

Yes. The current implementation of the external vocabulary leads to all kinds of errors and oddities, especially when on-disk and in-RAM entries are compared or filtered. Thanks for pointing this out; it is another reason to tackle this issue soon.

hannahbast commented 1 year ago

The server no longer crashes for this query. The sort order is still wrong when the respective literals reside on disk, but that is a separate issue, and for Wikidata all en and de literals now reside in memory.
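
The remaining sort-order symptom can be pictured with a small Python sketch. It is a simplified model, not QLever's implementation: it assumes that results are ordered by vocabulary ID and that on-disk literals are numbered after all in-RAM ones, and the function name and sample labels are made up for the example.

```python
def order_by_id(literals, is_on_disk):
    """Order literals as a hypothetical engine would if it sorted by vocabulary ID.

    Assumption of this sketch: in-RAM literals receive the low IDs and
    on-disk literals the high IDs, each group sorted on its own.
    """
    in_ram = sorted(lit for lit in literals if not is_on_disk(lit))
    on_disk = sorted(lit for lit in literals if is_on_disk(lit))
    return in_ram + on_disk


def not_en_or_de(literal):
    """Treat every literal without an @en or @de language tag as residing on disk."""
    return not literal.endswith(("@en", "@de"))


labels = ['"VLDB"@en', '"AAAI"@fr', '"ICML"@en']

by_id = order_by_id(labels, not_en_or_de)
by_text = sorted(labels)

print(by_id)    # the on-disk @fr literal ends up last
print(by_text)  # plain lexical order puts it first
```

With only en and de literals involved, both orders coincide in this model, which matches the observation that the problem no longer shows up for those languages on Wikidata.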