ad-freiburg / qlever-control

Indexing not working #73

Open daelba opened 1 month ago

daelba commented 1 month ago

I have a similar issue to #41: meta-data.json is not created.

Ubuntu 24.04.1 LTS

pip show qlever:

Name: qlever
Version: 0.5.6
Summary: Script for using the QLever SPARQL engine.
Home-page: 
Author: 
Author-email: Hannah Bast <bast@cs.uni-freiburg.de>
License: Apache-2.0
Location: /usr/local/lib/python3.12/dist-packages
Requires: argcomplete, psutil, termcolor
Required-by: 

docker --version:

Docker version 24.0.7, build 24.0.7-0ubuntu4.1

Qleverfile:

[data]
NAME              = scrap
BASE_URL          = .
GET_DATA_CMD      = echo "Using local file"
DESCRIPTION       = Scrap Dataset
TEXT_DESCRIPTION  = All literals, search with FILTER CONTAINS(?var, "...")

[index]
INPUT_FILES     = data/Q31519.ttl
CAT_INPUT_FILES = cat ${INPUT_FILES}
SETTINGS_JSON   = { "ascii-prefixes-only": false, "num-triples-per-batch": 10000 }
STXXL_MEMORX    = 10G

[server]
PORT               = 7019
ACCESS_TOKEN       = ${data:NAME}_xxxx_secret_xxxx
MEMORY_FOR_QUERIES = 5G
CACHE_MAX_SIZE     = 2G
TIMEOUT            = 30s

[runtime]
SYSTEM = docker
IMAGE  = docker.io/adfreiburg/qlever:latest

[ui]
UI_CONFIG = scrap

Q31519.ttl was downloaded from https://www.wikidata.org/wiki/Special:EntityData/Q31519.ttl. It is a small file with only a few triples, so memory cannot be the issue.
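To see how the `${...}` references in the Qleverfile above resolve, and which keys a parser actually picks up, here is a minimal sketch. It assumes the file can be read with Python's `configparser` and `ExtendedInterpolation`; that is an assumption for illustration, not something confirmed in this thread. The helper name `resolve_qleverfile` and the shortened sample are hypothetical.

```python
from configparser import ConfigParser, ExtendedInterpolation

# Shortened copy of the Qleverfile from this report (illustration only).
SAMPLE_QLEVERFILE = """
[data]
NAME = scrap

[index]
INPUT_FILES     = data/Q31519.ttl
CAT_INPUT_FILES = cat ${INPUT_FILES}
STXXL_MEMORX    = 10G

[server]
ACCESS_TOKEN = ${data:NAME}_xxxx_secret_xxxx
"""


def resolve_qleverfile(text):
    """Return {section: {key: resolved value}} for an INI-style Qleverfile."""
    parser = ConfigParser(interpolation=ExtendedInterpolation())
    # Keep keys exactly as written instead of lower-casing them.
    parser.optionxform = str
    parser.read_string(text)
    return {name: dict(parser[name].items()) for name in parser.sections()}


if __name__ == "__main__":
    for section, values in resolve_qleverfile(SAMPLE_QLEVERFILE).items():
        for key, value in values.items():
            print(f"[{section}] {key} = {value}")
```

Listing the keys this way makes it easier to compare them with the command that `qlever index` prints. For instance, the file sets 10G under `STXXL_MEMORX`, while the printed command uses `--stxxl-memory 5G`. Whether that is related to the missing meta-data.json is not established here.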

qlever index:

Command: index

echo '{ "ascii-prefixes-only": false, "num-triples-per-batch": 10000 }' > scrap.settings.json
docker run --rm -u $(id -u):$(id -g) -v /etc/localtime:/etc/localtime:ro -v $(pwd):/index -w /index --init --entrypoint bash --name qlever.index.scrap docker.io/adfreiburg/qlever:latest -c 'cat data/Q31519.ttl | IndexBuilderMain -F ttl - -i scrap -s scrap.settings.json --stxxl-memory 5G | tee scrap.index-log.txt'

2024-10-07 11:19:16.178 - INFO: QLever IndexBuilder, compiled on Fri Oct  4 20:23:39 UTC 2024 using git hash 77ea2c
2024-10-07 11:19:16.178 - INFO: You specified the input format: TTL
2024-10-07 11:19:16.179 - INFO: Processing input triples from /dev/stdin ...
2024-10-07 11:19:16.179 - INFO: Locale was not specified in settings file, default is en_US
2024-10-07 11:19:16.181 - INFO: You specified "locale = en_US" and "ignore-punctuation = 0"
2024-10-07 11:19:16.181 - INFO: You specified "parallel-parsing = true", which enables faster parsing for TTL files with a well-behaved use of newlines
2024-10-07 11:19:16.181 - INFO: You specified "num-triples-per-batch = 10,000", choose a lower value if the index builder runs out of memory
2024-10-07 11:19:16.181 - INFO: By default, integers that cannot be represented by QLever will throw an exception
2024-10-07 11:19:16.187 - INFO: Parsing input triples and creating partial vocabularies, one per batch ...
2024-10-07 11:19:16.266 - INFO: Triples parsed: 11,217 [average speed 0.1 M/s] 
2024-10-07 11:19:16.273 - INFO: Number of triples created (including QLever-internal ones): 19,793 [may contain duplicates]
2024-10-07 11:19:16.273 - INFO: Merging partial vocabularies ...
2024-10-07 11:19:16.278 - INFO: Words merged: 5,398 [average speed 1.5 M/s]
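The log above stops after the vocabulary merge. A small sketch like the following can pull the last stage out of a saved log such as scrap.index-log.txt. The line format in the regular expression is inferred from the lines shown above, and `last_log_entry` is a hypothetical helper, not part of qlever.

```python
import re

# Matches lines like "2024-10-07 11:19:16.278 - INFO: Words merged: ..."
# (format inferred from the log excerpt in this report).
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+-\s+(\w+):\s+(.*)$"
)


def last_log_entry(log_text):
    """Return the last parseable log line as a dict, or None if there is none."""
    last = None
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            last = {
                "time": match.group(1),
                "level": match.group(2),
                "message": match.group(3),
            }
    return last


if __name__ == "__main__":
    sample = (
        "2024-10-07 11:19:16.273 - INFO: Merging partial vocabularies ...\n"
        "2024-10-07 11:19:16.278 - INFO: Words merged: 5,398 [average speed 1.5 M/s]\n"
    )
    print(last_log_entry(sample))
```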

dir:

data #folder
Qleverfile
scrap.index-log.txt
scrap.settings.json
scrap.tmp.partial-ids-mmap.0
scrap.tmp.partial-ids-mmap.1
scrap.tmp.partial-vocabulary.0
scrap.tmp.partial-vocabulary.1
scrap.unsorted-triples.dat
scrap.vocabulary.words.external
scrap.vocabulary.words.external.offsets
scrap.vocabulary.words.internal
scrap.vocabulary.words.internal.ids
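To check a directory like the one listed above without reading it by eye, here is a minimal sketch. It assumes the meta-data file would be named `<name>.meta-data.json` (the report only says "meta-data.json") and treats `<name>.tmp.*` files as leftovers, based on the listing. `inspect_index_dir` is a hypothetical helper.

```python
from pathlib import Path


def inspect_index_dir(directory, name):
    """Report whether a meta-data file exists and which temporary files remain."""
    root = Path(directory)
    # Assumption: the meta-data file is called "<name>.meta-data.json".
    meta_file = root / f"{name}.meta-data.json"
    # Temporary files such as "scrap.tmp.partial-vocabulary.0" in the listing.
    leftovers = sorted(path.name for path in root.glob(f"{name}.tmp.*"))
    return {
        "meta_data_present": meta_file.is_file(),
        "leftover_tmp_files": leftovers,
    }


if __name__ == "__main__":
    print(inspect_index_dir(".", "scrap"))
```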

Is there a problem in the Qleverfile, or is this a qlever bug? Thanks.

hannahbast commented 1 month ago

@daelba Sorry for the late reply. That was a bug introduced by a recent commit. It is now fixed by 466df71d110135925439afc26da1b27d931c823a. Please reinstall using pip install --upgrade qlever, or run git pull if you have cloned https://github.com/ad-freiburg/qlever-control

daelba commented 1 month ago

Thanks for your reply. Now I have:

$ pip show qlever

Name: qlever
Version: 0.5.8
Summary: Script for using the QLever SPARQL engine.
Home-page: 
Author: 
Author-email: Hannah Bast <bast@cs.uni-freiburg.de>
License: Apache-2.0
Location: /usr/local/lib/python3.12/dist-packages
Requires: argcomplete, psutil, termcolor
Required-by: 

But the result is still the same.