abgulati / LARS

An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses.
https://www.youtube.com/watch?v=Mam1i86n8sU&ab_channel=AbheekGulati
GNU Affero General Public License v3.0

"Failed to start llama.cpp local-server" persists after saving settings #19

Closed · yvileapsis closed 3 weeks ago

yvileapsis commented 1 month ago

I seem to be stuck at the last stage of installation procedures.

After receiving the "Failed to start llama.cpp local-server ; No exception info." error, I was able to OK it, select an LLM in C:/web_app_storage/models/, set its settings, and save them. No automatic refresh followed. After a manual refresh, I received "There was an error when loading the LLM in the method /llama_cpp_server_starter, more details can be viewed in the browser's console." in the browser window, followed by another "Failed to start llama.cpp" in the console. The settings do appear to be saved.

I've attempted to resolve the problem by choosing a different LLM (the ones I tried were dolphin-2.1-mistral-7b.Q8_0 and tinyllama-2-1b-miniguanaco.Q2_K), as well as by toggling the GPU setting on and off, and finally by attempting to switch to the pre-built llama.cpp binaries. None of that appears to have helped.

Executing llama-server doesn't appear to produce any output, though I'm not sure whether that verifies it is working.

I haven't found any logs other than this one, but I'm willing to provide more if necessary: server_log.log

abgulati commented 1 month ago

Hi @yvileapsis,

Did you add the llama-server directory to your PATH? In a console window, from the directory containing your model GGUF, try running `llama-server -m dolphin-2.1-mistral-7b.Q8_0.gguf` and share the output.
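
For reference, a quick way to check this on Windows; a minimal sketch, and the build path below is illustrative, not your actual install location:

```
:: Check whether llama-server resolves via PATH
where llama-server

:: If it doesn't, add your llama.cpp binaries directory to PATH for this session
:: (the path below is an example; use wherever llama-server.exe actually lives)
set PATH=%PATH%;C:\llama.cpp\build\bin\Release

:: Then, from the directory containing the GGUF, try loading the model
llama-server -m dolphin-2.1-mistral-7b.Q8_0.gguf
```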

Also, considering the many user reports of trouble setting up llama.cpp, and the pain around updating llama.cpp to support newer LLMs, I am in the process of migrating to, and setting as default, another local-LLM server that I've written myself. It works with no setup, requires no manual downloading of models, and works with models right off the hub, so you can run new models as they're released. Check out `HF-Waitress`: https://github.com/abgulati/hf-waitress
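
As a rough idea of the intended workflow (a sketch only; the script name and model-ID flag below are assumptions for illustration, so defer to the HF-Waitress README for the actual invocation):

```
:: Assumed setup and launch steps; verify against the hf-waitress README
git clone https://github.com/abgulati/hf-waitress
cd hf-waitress
pip install -r requirements.txt

:: The script name and --model_id flag are assumptions, not confirmed API
python hf_waitress.py --model_id=mistralai/Mistral-7B-Instruct-v0.3
```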

So keep an eye out for a major LARS update in the coming days.

yvileapsis commented 3 weeks ago

Attempting to run llama-server -m dolphin-2.1-mistral-7b.Q8_0.gguf produced no output, which I then investigated. It turns out I was missing not the llama-server directory in PATH, but the nvcc directory in my system variables. Fixing that allowed LARS to run as expected. Sorry for the trouble.
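
For anyone hitting the same thing, a minimal sketch of the check and fix on Windows; the CUDA version in the path is illustrative, so match it to your installed toolkit:

```
:: Check whether nvcc resolves via PATH
where nvcc
nvcc --version

:: If not found, add the CUDA toolkit's bin directory to PATH
:: (v12.4 is an example; substitute your installed CUDA version)
set PATH=%PATH%;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\bin
```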

I'll be sure to check out your new solution for us non-devops-minded folk when it comes out. :)

Thank you for your work.

abgulati commented 2 weeks ago

Thanks for updating this thread with the resolution! Glad to hear it was resolved 🍻

abgulati commented 2 weeks ago

Hi @yvileapsis

Heads-up: v2.0-beta1 is out!

https://github.com/abgulati/LARS/releases/tag/v2.0-beta1

Make sure to re-run `pip install -r requirements.txt` when you re-clone, and check out the updated Dependencies, Installation, and Usage instructions in the README.
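
Roughly, the update boils down to the following (a sketch; see the README linked in the release for the full steps):

```
:: Re-clone the repo and reinstall Python dependencies
git clone https://github.com/abgulati/LARS
cd LARS
pip install -r requirements.txt
```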

Note: the containers are not yet updated; that will most likely be done in the following week.