ad-freiburg / qlever

Very fast SPARQL Engine, which can handle very large knowledge graphs like the complete Wikidata, offers context-sensitive autocompletion for SPARQL queries, and allows combination with text search. It's faster than engines like Blazegraph or Virtuoso, especially for queries involving large result sets.
Apache License 2.0

Errors parsing STRING value in NT format #1594

Open Dyfeomorfizm opened 3 weeks ago

Dyfeomorfizm commented 3 weeks ago

Hello,

I'm trying to index my NT data and get the following error:

ERROR: Parse error at byte position 22324: Parse error at byte position 22324: Value STRING could not be parsed as a floating point value

Is QLever unable to parse STRINGs? Or is there a parameter I should add to the Qleverfile so that it parses them?

The log file also contains this error:

ERROR: Could not parse 10,000 Within 1,048,576MB of Turtle input
2024-10-24 12:34:54.523 - ERROR: If you really have Turtle input with such a long structure please recompile with adjusted constants in ConstantsIndexCreation.h or decompress your file and use --file-format mmap

This happens even though it's a small NT file, 1.4 GB in size.

I run it on GKE using qlever-control and the qlever index command.

Command: index

echo '{ "ascii-prefixes-only": false, "num-triples-per-batch": 1000 , "parallel-parsing" : false}' > GDS.settings.json
podman run --rm -u root -v /etc/localtime:/etc/localtime:ro -v $(pwd):/index -w /index --init --entrypoint bash --name qlever.index.GDS docker.io/adfreiburg/qlever:latest -c 'cat one.nt | IndexBuilderMain -F nt -f - -i GDS -s GDS.settings.json --stxxl-memory 1000G | tee GDS.index-log.txt'

joka921 commented 3 weeks ago

Hi, was this the actual error message, or is STRING something you substituted so as not to disclose the contents of your RDF data? The message indicates that you probably have the literal "STRING"^^xsd:float in your dataset (the datatype might also be xsd:double or xsd:decimal for this message to occur). QLever knows that this datatype represents a number and tries to parse it as such. If this fails, a hard error is thrown. The easiest fix is to correct your dataset. Another possibility (we use this for other datatypes) is a fallback mechanism that, for example, ignores the datatype for such invalid literals.
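To locate such literals before indexing, a small script along the following lines might help. This is only a rough sketch, not part of QLever: the function names are made up for illustration, the check is a line-based regular expression rather than a full N-Triples parser, and the accepted lexical forms follow the XML Schema definitions, which may be stricter or more lenient than what QLever's own parser accepts.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Matches a typed literal such as "1.5"^^<http://www.w3.org/2001/XMLSchema#double>.
# Group 1 is the lexical form, group 2 the local name of the datatype.
TYPED_LITERAL = re.compile(
    r'"((?:[^"\\]|\\.)*)"\^\^<' + re.escape(XSD) + r'(float|double|decimal)>'
)

# Lexical forms per XML Schema: xsd:decimal has no exponent,
# xsd:float and xsd:double additionally allow an exponent, INF and NaN.
DECIMAL_FORM = re.compile(r'[+-]?(\d+(\.\d*)?|\.\d+)')
DOUBLE_FORM = re.compile(r'[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?|[+-]?INF|NaN')


def is_valid_numeric(lexical, datatype):
    """Return True if `lexical` is a valid lexical form for the given datatype."""
    pattern = DECIMAL_FORM if datatype == "decimal" else DOUBLE_FORM
    return pattern.fullmatch(lexical) is not None


def find_invalid_literals(lines):
    """Yield (line_number, lexical, datatype) for each invalid numeric literal."""
    for number, line in enumerate(lines, start=1):
        for match in TYPED_LITERAL.finditer(line):
            lexical, datatype = match.group(1), match.group(2)
            if not is_valid_numeric(lexical, datatype):
                yield number, lexical, datatype


def strip_invalid_datatypes(line):
    """Rewrite invalid numeric literals in `line` as plain string literals."""
    def replace(match):
        if is_valid_numeric(match.group(1), match.group(2)):
            return match.group(0)  # valid literal, keep unchanged
        return '"' + match.group(1) + '"'  # drop the datatype
    return TYPED_LITERAL.sub(replace, line)
```

Passing an open file (for example one.nt) to find_invalid_literals would list the offending lines. strip_invalid_datatypes mimics the "ignore the datatype" fallback on the data side, which changes the meaning of those triples, so whether that is acceptable depends on the dataset.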

Dyfeomorfizm commented 3 weeks ago

It was this one:

"STRING"^^<http://www.w3.org/2001/XMLSchema#double>