ad-freiburg / qlever

Very fast SPARQL Engine, which can handle very large knowledge graphs like the complete Wikidata, offers context-sensitive autocompletion for SPARQL queries, and allows combination with text search. It's faster than engines like Blazegraph or Virtuoso, especially for queries involving large result sets.
Apache License 2.0

Prefix Compression File does not escape multiline Literals #939

Open Qup42 opened 1 year ago

Qup42 commented 1 year ago

Prefixes containing newlines may be generated by PrefixHeuristic.cpp. These prefixes are written to the prefixes file by IndexImpl.cpp without any escaping. PrefixCompressor.h interprets each line as a prefix. This leads to a mismatch between the prefixes that are generated and the prefixes that are read back. The mismatch may trigger the assertion at PrefixCompressor.h:80, which fires when more than 126 prefixes are read back.
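
The mismatch can be reproduced in isolation. The following is a minimal Python sketch, not QLever's actual code: `write_prefixes` and `read_prefixes` are hypothetical stand-ins for the unescaped writer and the line-based reader described above, and the literal text is shortened.

```python
def write_prefixes(prefixes):
    # Hypothetical writer: one prefix per line, no escaping at all.
    return "".join(p + "\n" for p in prefixes)


def read_prefixes(text):
    # Hypothetical reader: every line of the file counts as one prefix.
    # The final element after the last "\n" is empty and is dropped.
    return text.split("\n")[:-1]


# Five prefixes, the first of which spans three lines (text shortened).
generated = [
    '"Computational Economics\n\nACHTUNG: Computational Economics',
    "<http://vvz.tf.uni-freiburg.de/",
    "<http://",
    '"03LE47',
    "<",
]

read_back = read_prefixes(write_prefixes(generated))

print(len(generated), "prefixes written,", len(read_back), "prefixes read back")
```

With enough multiline literals in the input, the number of lines read back can exceed the limit checked by the assertion even though the number of generated prefixes stays within it.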

Small example:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix vvz: <http://vvz.tf.uni-freiburg.de/vvz-schema#> .

<http://vvz.tf.uni-freiburg.de/e7a31f78-3e38-4720-a17e-5d0496893668> rdfs:label """Computational Economics

ACHTUNG: Computational Economics & Übung werden als Blockveranstaltung vom 02.05.23 - 09.05.2023 angeboten. Deswegen wurden nur Räume in der Vorlesung gebucht und nicht in dieser Veranstaltung.""" ;
    vvz:element_number "03LE47Ü-ID126923" .

<http://vvz.tf.uni-freiburg.de/fd7206c3-37b7-4a6b-b0a2-44bdffb821ba> rdfs:label """Computational Economics

ACHTUNG: Computational Economics & Übung werden als Blockveranstaltung vom 02.05.23 - 09.05.2023 angeboten. Deswegen wurden nur Räume in dieser Veranstaltung gebucht.""" ;
    vvz:element_number "03LE47V-ID126919" .

Generated prefixes file:

"Computational Economics

ACHTUNG: Computational Economics & Übung werden als Blockveranstaltung vom 02.05.23 - 09.05.2023 angeboten. Deswegen wurden nur Räume in d
<http://vvz.tf.uni-freiburg.de/
<http://
"03LE47
<
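
One possible way to avoid this is to escape newlines (and the escape character itself) when writing the file and to undo the escaping when reading it. The following is a minimal Python sketch under that assumption. The names `escape`, `unescape`, `dump` and `load` are hypothetical and do not correspond to functions in QLever, and this is not necessarily the fix the project would choose.

```python
def escape(prefix):
    # Escape backslashes first so that escaped newlines stay unambiguous.
    return prefix.replace("\\", "\\\\").replace("\n", "\\n")


def unescape(line):
    # Reverse escape(): "\\n" becomes a newline, "\\\\" becomes a backslash.
    out = []
    chars = iter(line)
    for c in chars:
        if c == "\\":
            nxt = next(chars, "")
            out.append("\n" if nxt == "n" else nxt)
        else:
            out.append(c)
    return "".join(out)


def dump(prefixes):
    # One escaped prefix per physical line.
    return "".join(escape(p) + "\n" for p in prefixes)


def load(text):
    # Split on physical lines, then undo the escaping per line.
    return [unescape(line) for line in text.split("\n")[:-1]]


prefixes = [
    '"Computational Economics\n\nACHTUNG: Computational Economics',
    "<http://",
    "literal backslash-n: \\n",
]
text = dump(prefixes)
restored = load(text)

print(text.count("\n"), "lines written; round trip ok:", restored == prefixes)
```

With this scheme each prefix occupies exactly one line in the file, so the number of lines read back equals the number of prefixes written.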

hannahbast commented 1 year ago

Oh wow, great catch! I wonder why this hasn't shown up so far in our hundreds of index builds.