Cinnamon / kotaemon

An open-source RAG-based tool for chatting with your documents.
https://cinnamon.github.io/kotaemon/
Apache License 2.0

[BUG] - llama-cpp will not load local models #203

Open · ajweber opened this issue 4 weeks ago

ajweber commented 4 weeks ago

Description

I have tried a number of Hugging Face models and consistently get the error message: llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291

This appears to be an old bug that was fixed months ago in llama-cpp. Is it possible your run_linux.sh script is installing an older version of llama-cpp (and/or its Python bindings, llama-cpp-python)?
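
One way to check which version the script actually installed (a sketch; the environment location is an assumption and will depend on where run_linux.sh placed it):

```sh
# Activate the environment that run_linux.sh created
# (path is a guess; adjust to wherever the script put it).
source install_dir/env/bin/activate

# Either of these reports the installed llama-cpp-python version.
pip show llama-cpp-python
python -c "import llama_cpp; print(llama_cpp.__version__)"
```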

Reproduction steps

1. Set the LOCAL_MODEL env var to a llama-3.1-8B...gguf model downloaded from Hugging Face.
2. Execute run_linux.sh (roughly as sketched below).
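
Roughly the shape of the repro; the model path below is a placeholder for whichever GGUF you downloaded:

```sh
# Point kotaemon at a local GGUF model, then start the app.
export LOCAL_MODEL=/path/to/your-model.gguf   # placeholder path
bash run_linux.sh
```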

Screenshots

No response

Logs

No response

Browsers

Other

OS

Linux

Additional information

No response

ajweber commented 3 weeks ago

I still cannot test fully due to the other bug I logged (the startup issue). HOWEVER, if I change the script to download and install llama_cpp_python v0.2.90, it loads the local model and the app starts correctly.

(End of output is:

INFO:     Started server process [0000]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:31415 (Press CTRL+C to quit))
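
For anyone else hitting this, the change boils down to pinning the package; a minimal sketch, assuming you run it inside the environment that run_linux.sh set up:

```sh
# Force-reinstall the pinned release that loaded the model for me.
pip install --force-reinstall "llama-cpp-python==0.2.90"
```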