ad-freiburg / qlever-control

Apache License 2.0

When I use the "qlever index" command, the following error occurs. How can I resolve this issue? #64

Open biegekekeke opened 3 months ago

biegekekeke commented 3 months ago

2024-08-28 14:47:35.804 - INFO: QLever IndexBuilder, compiled on Tue Aug 27 16:54:45 UTC 2024 using git hash ac257c
2024-08-28 14:47:35.804 - INFO: You specified the input format: TTL
2024-08-28 14:47:35.804 - INFO: Processing input triples from /dev/stdin ...
2024-08-28 14:47:35.804 - INFO: You specified "locale = en_US" and "ignore-punctuation = 1"
2024-08-28 14:47:35.805 - INFO: You specified "parallel-parsing = true", which enables faster parsing for TTL files with a well-behaved use of newlines
2024-08-28 14:47:35.805 - INFO: You specified "num-triples-per-batch = 10,000,000", choose a lower value if the index builder runs out of memory
2024-08-28 14:47:35.805 - INFO: By default, integers that cannot be represented by QLever will throw an exception
2024-08-28 14:47:35.808 - ERROR: Operation not permitted

joka921 commented 3 months ago

Hi @biegekekeke and sorry for the spam post above, these keep appearing since yesterday or so... Can you please send us some context:

biegekekeke commented 3 months ago

> Hi @biegekekeke and sorry for the spam post above, these keep appearing since yesterday or so... Can you please send us some context:
>
>   • What type of system is this (x86 or ARM, which operating system in which version)?
>   • How do you run QLever (natively compiled, via Docker, via the qlever control script, from the command line)? [EDIT: It seems you are running the control script via qlever index. Do you have any custom settings in your Qleverfile, and what is the dataset?]
>   • What are the permissions on the folder you are

Thank you for your response. I am using Ubuntu 20.04. I installed QLever by running pip install qlever and then ran the qlever index command. The Qleverfile is as follows:

[data]
NAME = wikidata
GET_DATA_URL = https://dumps.wikimedia.org/wikidatawiki/entities
GET_DATA_CMD = curl -LO -C - ${GET_DATA_URL}/latest-truthy.nt.bz2 ${GET_DATA_URL}/latest-lexemes.nt.bz2
INDEX_DESCRIPTION = "Full Wikidata dump from ${GET_DATA_URL} (latest-truthy.nt.bz2 and latest-lexemes.nt.bz2)"

[index]
INPUT_FILES = wikidata-20231222-lexemes.nt.bz2 wikidata-20231222-truthy.nt.bz2
CAT_INPUT_FILES = bzcat ${INPUT_FILES}
SETTINGS_JSON = { "languages-internal": ["en"], "prefixes-external": [ "<http://www.wikidata.org/entity/statement", "<http://www.wikidata.org/value", "<http://www.wikidata.org/reference" ], "locale": { "language": "en", "country": "US", "ignore-punctuation": true }, "ascii-prefixes-only": false, "num-triples-per-batch": 10000000 }
WITH_TEXT_INDEX = false
STXXL_MEMORY = 10g

[server]
PORT = 7001
ACCESS_TOKEN = ${data:NAME}_832649627
MEMORY_FOR_QUERIES = 100G
CACHE_MAX_SIZE = 100G

[runtime]
SYSTEM = docker
IMAGE = docker.io/adfreiburg/qlever:latest

[ui]
PORT = 7000
CONFIG = wikidata

The dataset is based on Wikidata, and my folder permissions are already set to rwx.

joka921 commented 3 months ago
  1. Do smaller datasets, where you just use the provided Qleverfile without changes (best try olympics in a separate folder), work for you, or do they show the same issue?
  2. Can you run ls -al in the directory where you run the qlever index command and post the output?
joka921 commented 3 months ago

Another idea: What is your file system (a "normal" ext4 disk, some fancy network mount, or something else entirely)? How did you install Docker, and do you have any special Docker configuration (security hardening etc.) that might lead to permission problems inside the container?

And can you also post the output of qlever index --show (this logs what qlever index does under the hood)?
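The details requested so far (CPU architecture, operating system, file system type, Docker installation, directory permissions) can be gathered in one go. The following is a minimal sketch, assuming a Linux host with GNU coreutils; the function name `collect_context` is hypothetical and not part of qlever-control.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of qlever-control): prints the context
# requested in this thread for the current working directory.
collect_context() {
  local fstype

  echo "arch: $(uname -m)"

  # /etc/os-release exists on most modern Linux distributions.
  # It is sourced in a subshell so its variables do not leak.
  if [ -r /etc/os-release ]; then
    ( . /etc/os-release; echo "os: ${PRETTY_NAME:-unknown}" )
  else
    echo "os: unknown"
  fi

  # File system type of the current directory (GNU df option).
  fstype="$(df --output=fstype . 2>/dev/null | tail -n 1)"
  echo "filesystem: ${fstype:-unknown}"

  # Docker may be missing or not on PATH; report that instead of failing.
  if command -v docker >/dev/null 2>&1; then
    echo "docker: $(docker --version 2>/dev/null)"
  else
    echo "docker: not found"
  fi

  echo "directory listing:"
  ls -al .
}

collect_context
```

Run it from the directory that contains the Qleverfile and paste the output together with the output of qlever index --show.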

biegekekeke commented 3 months ago

When I use olympics, the same issue also occurs. These are the permissions of the directory where I run the command:

drwxrwxrwx  2 xxx xxx        4096 Aug  28 12:54 .
drwxrwxrwx 11 xxx xxx        4096 Aug  28 15:34 ..
-rwxrwxrwx  1 xxx xxx        1422 Aug  28 15:44 Qleverfile
-rwxrwxrwx  1 xxx xxx          38 Aug  28 15:14 .stxxl
-rwxrwxrwx  1 xxx xxx   867124662 Aug  28 10:26 wikidata-20231222-lexemes.nt.bz2
-rwxrwxrwx  1 xxx xxx 41030599679 Aug  28 10:29 wikidata-20231222-truthy.nt.bz2
-rwxrwxrwx  1 xxx xxx         830 Aug  28 15:14 wikidata.index-log.txt
-rwxrwxrwx  1 xxx xxx         317 Aug  28 15:14 wikidata.settings.json

The file system I am using is a "normal" ext4 disk. When I run qlever index --show, the output is as follows:


qlever index --show

Command: index

echo '{ "languages-internal": ["en"], "prefixes-external": [ "<http://www.wikidata.org/entity/statement", "<http://www.wikidata.org/value", "<http://www.wikidata.org/reference" ], "locale": { "language": "en", "country": "US", "ignore-punctuation": true }, "ascii-prefixes-only": false, "num-triples-per-batch": 10000000 }' > wikidata.settings.json
docker run --rm -u $(id -u):$(id -g) -v /etc/localtime:/etc/localtime:ro -v $(pwd):/index -w /index --init --entrypoint bash --name qlever.index.wikidata docker.io/adfreiburg/qlever:latest -c 'ulimit -Sn 1048576; bzcat wikidata-20231222-lexemes.nt.bz2 wikidata-20231222-truthy.nt.bz2 | IndexBuilderMain -F ttl -f - -i wikidata -s wikidata.settings.json --stxxl-memory 10g | tee wikidata.index-log.txt'

You called "qlever ... --show", therefore the command is only shown, but not executed (omit the "--show" to execute it)
hannahbast commented 3 months ago

@biegekekeke Are you using Docker inside of WSL (Windows Subsystem Linux)?

biegekekeke commented 3 months ago

> @biegekekeke Are you using Docker inside of WSL (Windows Subsystem Linux)?

No

joka921 commented 3 months ago

Okay, this sounds like some debugging inside the Docker container is required to track down the concrete issue. I won't have time for this in the coming week, but after that we can tackle it. Are you proficient with GDB, Docker, etc.? If so, you could try debugging the call to IndexBuilderMain inside the container and send me a backtrace of the location where the error occurs, but this requires some computer science background.
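Before attaching a debugger, a cheaper check may be worth a try. One cause that is commonly reported for "Operation not permitted" inside containers, and that is not confirmed for this issue, is an older Docker Engine whose default seccomp profile rejects system calls used by newer base images; as far as I know, Docker Engine 20.10.10 updated that profile. The sketch below compares the local engine version against that threshold. The function names `version_ge` and `check_docker_version` are hypothetical, and the threshold is an assumption to verify against the Docker release notes.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds if dotted version A >= B, comparing the first
# three numeric components. Suffixes such as "-0ubuntu1" are ignored.
version_ge() {
  local left="$1" right="$2" a b _i
  for _i in 1 2 3; do
    a="${left%%.*}"; b="${right%%.*}"
    a="${a%%[!0-9]*}"; b="${b%%[!0-9]*}"
    a="${a:-0}"; b="${b:-0}"
    if [ "$a" -gt "$b" ]; then return 0; fi
    if [ "$a" -lt "$b" ]; then return 1; fi
    # Move on to the next component, or to "0" if none is left.
    case "$left" in *.*) left="${left#*.}" ;; *) left=0 ;; esac
    case "$right" in *.*) right="${right#*.}" ;; *) right=0 ;; esac
  done
  return 0
}

# Prints a hint based on the Docker Engine (server) version.
# The threshold 20.10.10 is an assumption, see the text above.
check_docker_version() {
  local server
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found on PATH"
    return 0
  fi
  server="$(docker version --format '{{.Server.Version}}' 2>/dev/null)"
  if [ -z "$server" ]; then
    echo "could not determine the Docker Engine version"
  elif version_ge "$server" 20.10.10; then
    echo "Docker Engine $server: seccomp profile is probably recent enough"
  else
    echo "Docker Engine $server: consider upgrading and retrying qlever index"
  fi
  return 0
}

check_docker_version
```

As a further experiment, the docker run command printed by qlever index --show can be run by hand with `--security-opt seccomp=unconfined` added. If the error disappears, the seccomp profile is the likely culprit. That option disables a security layer, so it is suited to diagnosis only.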