edgenai / edgen

⚡ Edgen: Local, private GenAI server alternative to OpenAI. No GPU required. Run AI models locally: LLMs (Llama2, Mistral, Mixtral...), Speech-to-text (whisper) and many others.
https://docs.edgen.co/
Apache License 2.0

panic on model loading in edgen_rt_llama_cpp #57

Open toschoo opened 7 months ago

toschoo commented 7 months ago

Description

This line, very sporadically, causes a panic.

Solution

I suspect the problem is the lazy implementation of UnloadingModel, but I haven't been able to prove it yet; the bug is simply too rare. If this is indeed the cause, however, a retry should resolve the issue (see the sketch below).
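For illustration only, here is a minimal sketch of what such a retry wrapper could look like. The names `load_model`, `Model`, and `LoadError` are hypothetical stand-ins for the real types and the call that sporadically panics in edgen_rt_llama_cpp; the actual fix would have to hook into the existing UnloadingModel logic.

```rust
use std::time::Duration;

/// Hypothetical error type standing in for whatever the runtime returns
/// when loading a llama.cpp model fails.
#[derive(Debug)]
struct LoadError(String);

/// Hypothetical stand-in for the real model handle.
struct Model;

/// Hypothetical loader; in the real code this would be the call that
/// sporadically panics.
fn load_model(path: &str) -> Result<Model, LoadError> {
    let _ = path; // ... real loading logic would go here ...
    Ok(Model)
}

/// Retry the load a few times with a small backoff, converting a panic
/// inside the loader into an error so one bad attempt can be retried
/// instead of taking the server down.
fn load_model_with_retry(path: &str, attempts: u32) -> Result<Model, LoadError> {
    let mut last_err = LoadError("no attempts made".into());
    for i in 0..attempts {
        match std::panic::catch_unwind(|| load_model(path)) {
            Ok(Ok(model)) => return Ok(model),
            Ok(Err(e)) => last_err = e,
            Err(_) => last_err = LoadError("loader panicked".into()),
        }
        std::thread::sleep(Duration::from_millis(100 * (i as u64 + 1)));
    }
    Err(last_err)
}

fn main() {
    match load_model_with_retry("/models/example.gguf", 3) {
        Ok(_) => println!("model loaded"),
        Err(e) => eprintln!("giving up: {e:?}"),
    }
}
```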

Remark

The code in the Whisper runtime is similar; whatever solution is found for the LLM runtime should also be applied there.

toschoo commented 6 months ago

I tried for hours but could not reproduce the bug.