There is not enough information here to assist. What kind of model did you load? A LoRA? A quantized GGUF (Q4, Q5, etc.)? Second, that version of llama-cpp most likely will not bind to your GPU, so inference will be slow unless you are on a unified-memory system (Apple M1/M2/M3).
Lastly, we are going to remove the built-in native runner because of issues like this and its lack of GPU support; otherwise we would just be reinventing technology that is better handled by other local LLM runners like Ollama, LM Studio, or LocalAI. Whatever your model is, it will likely be easier to run and get inference from it via Ollama, and then use that connection in AnythingLLM.
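As a rough sanity check of that Ollama route, something like the sketch below could help. It assumes you have already exported your fine-tune as a GGUF file and imported it into Ollama (for example, a Modelfile whose FROM line points at the .gguf, then `ollama create my-finetune -f Modelfile`), and that Ollama is listening on its default port 11434; `my-finetune` is a placeholder name, not anything from this thread. If this request returns text, pointing AnythingLLM's Ollama connection at the same base URL should work too.

```python
import requests

# Placeholder name for the model you imported into Ollama.
MODEL_NAME = "my-finetune"
# Ollama's default local endpoint; AnythingLLM would talk to the same base URL.
OLLAMA_URL = "http://localhost:11434/api/generate"

def quick_prompt(prompt: str) -> str:
    """Send a single non-streaming prompt to Ollama and return the generated text."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL_NAME, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(quick_prompt("Say hello in one sentence."))
```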
I have my own fine-tuned model; I placed it in the download folder and it is detected by the Native LLM selection. But as soon as I prompt something, it crashes with [Failed to load model]. Need help with this.