It seems that Qdrant did not start properly. Can you paste the contents of the following two files?
gaianet/log/init-qdrant.log
and
gaianet/log/start-qdrant.log
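If the node was initialized in the default location, the following should print both files (a sketch; adjust the base directory if gaianet was installed somewhere else):
❯ cat ~/gaianet/log/init-qdrant.log
❯ cat ~/gaianet/log/start-qdrant.log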
I ran it again; this time the error is different.
❯ gaianet init
[+] Checking the config.json file ...
[+] Downloading Phi-3-mini-4k-instruct-Q5_K_M.gguf ...
################################################################################################################# 100.0%
* Phi-3-mini-4k-instruct-Q5_K_M.gguf is downloaded in /home/rogge/gaianet
[+] Downloading all-MiniLM-L6-v2-ggml-model-f16.gguf ...
################################################################################################################# 100.0%
* all-MiniLM-L6-v2-ggml-model-f16.gguf is downloaded in /home/rogge/gaianet
[+] Creating 'default' collection in the Qdrant instance ...
* Start a Qdrant instance ...
* Remove the existed 'default' Qdrant collection ...
* Download Qdrant collection snapshot ...
################################################################################################################# 100.0%
The snapshot is downloaded in /home/rogge/gaianet
* Import the Qdrant collection snapshot ...
The process may take a few minutes. Please wait ...
* [Error] Failed to recover from the collection snapshot. {"status":{"error":"Service internal error: Tokio task join error: task 1242 panicked"},"time":0.697784244}
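For context, the import step is roughly a call to Qdrant's snapshot-recover endpoint; the snapshot filename below is an assumption based on the download path above, and 6333 is Qdrant's default REST port:
❯ curl -X PUT 'http://localhost:6333/collections/default/snapshots/recover' \
    -H 'Content-Type: application/json' \
    -d '{"location": "file:///home/rogge/gaianet/default.snapshot"}'
# On success Qdrant answers with {"result": true, "status": "ok", ...};
# the error pasted above is the JSON it returns when the recovery task panics inside the server.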
And here is the init-qdrant.log; there is no start-qdrant.log.
init-qdrant.log
As this line indicates, Qdrant ran out of memory during the import. How much RAM do you have on the WSL system? Thanks.
2024-05-20T07:24:52.900895Z ERROR qdrant::startup: Panic occurred in file /home/runner/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cgroups-rs-0.3.4/src/memory.rs at line 587: called `Result::unwrap()` on an `Err` value: Error { kind: ReadFailed("/sys/fs/cgroup/memory.high"), cause: Some(Os { code: 2, kind: NotFound, message: "No such file or directory" }) }
I'm running WSL on a 16G physical memory machine, with WSL memory as below
❯ free -mh
               total        used        free      shared  buff/cache   available
Mem:           7.6Gi       679Mi       3.5Gi        73Mi       3.4Gi       6.6Gi
Swap:           10Gi       4.0Mi         9Gi
I tried adding 10G of swap, but I still get the same error.
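For reference, the memory and swap caps that WSL2 assigns come from .wslconfig on the Windows side; a minimal sketch, assuming the file is %UserProfile%\.wslconfig and that wsl --shutdown is run afterwards to apply it (the values and the <windows-user> path segment are placeholders):
❯ cat /mnt/c/Users/<windows-user>/.wslconfig
[wsl2]
# WSL2 defaults to roughly half of the physical RAM, hence the 7.6Gi above
memory=12GB
swap=10GB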
I also checked /sys/fs/cgroup; there is no memory.high file (listing below, followed by a quick check of the cgroup mode).
/sys/fs/cgroup🔒
❯ ll
total 0
-r--r--r-- 1 root root 0 May 20 15:50 cgroup.controllers
-rw-r--r-- 1 root root 0 May 20 15:50 cgroup.max.depth
-rw-r--r-- 1 root root 0 May 20 15:50 cgroup.max.descendants
-rw-r--r-- 1 root root 0 May 20 15:50 cgroup.procs
-r--r--r-- 1 root root 0 May 20 15:50 cgroup.stat
-rw-r--r-- 1 root root 0 May 20 15:50 cgroup.subtree_control
-rw-r--r-- 1 root root 0 May 20 15:50 cgroup.threads
-r--r--r-- 1 root root 0 May 20 15:50 cpuset.cpus.effective
-r--r--r-- 1 root root 0 May 20 15:50 cpuset.mems.effective
-r--r--r-- 1 root root 0 May 20 15:50 cpu.stat
drwxr-xr-x 2 root root 0 May 20 15:50 init.scope
-r--r--r-- 1 root root 0 May 20 15:50 io.stat
--w------- 1 root root 0 May 20 15:50 memory.reclaim
-r--r--r-- 1 root root 0 May 20 15:50 memory.stat
-r--r--r-- 1 root root 0 May 20 15:50 misc.capacity
drwxr-xr-x 43 root root 0 May 20 15:56 system.slice
drwxr-xr-x 3 root root 0 May 20 15:51 user.slice
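A quick way to confirm that this is a cgroup-v2-only (unified) mount, rather than the hybrid v1+v2 layout, is to check the filesystem type; cgroup2fs means only v2 is mounted:
❯ stat -fc %T /sys/fs/cgroup
cgroup2fs
❯ grep cgroup /proc/mounts
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime 0 0
# a hybrid system would also list several v1 "cgroup" mounts here, e.g. /sys/fs/cgroup/memory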
I believe this issue from cgroups-rs might hint at the reason for the panic: https://github.com/kata-containers/cgroups-rs/issues/115
The library author says:
It is because the api only supports cgroup v1, while the systems are in v2.
I found the full reason and solution for this problem.
The reason:
The cgroups-rs get_max_value API only supports cgroup v1, so in a cgroup-v2-only environment this API panics.
By default, WSL2 uses both cgroup v1 and v2, but I had enabled the experimental autoMemoryReclaim feature, which automatically disables cgroup v1 and leaves only v2. That is what causes the problem.
Solution:
Consider supporting the cgroup v2 API, and remind WSL users to disable the autoMemoryReclaim feature.
Cool! Since Qdrant is upstream from us, I guess we have to do the second option. Can you send a screenshot that shows where this option is turned off? Thanks!
Of course. This feature is off by default; these steps are for anyone who accidentally turns it on and doesn't know how to switch it off.
Steps to turn this feature on/off: set autoMemoryReclaim in the [experimental] section of .wslconfig, as sketched below.
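A minimal sketch of the relevant section, assuming the config lives at %UserProfile%\.wslconfig on the Windows host (the <windows-user> path segment is a placeholder); run wsl --shutdown from Windows afterwards so the change takes effect:
❯ cat /mnt/c/Users/<windows-user>/.wslconfig
[experimental]
# accepted values: disabled (the default), gradual, dropcache
autoMemoryReclaim=disabled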
This is great. Thank you! I updated the docs and linked to your profile for acknowledgment.
Please keep us updated about your progress!
OS: Ubuntu 20.04 in WSL
log:
It seems it may be a network issue?