art-from-the-machine / Mantella

Mantella is a Skyrim and Fallout 4 mod which allows you to naturally speak to NPCs using Whisper (speech-to-text), LLMs (text generation), and xVASynth / XTTS (text-to-speech).
https://art-from-the-machine.github.io/Mantella/
GNU General Public License v3.0
153 stars 38 forks source link

llamafile as LLM server. #277

Open amonpaike opened 2 months ago

amonpaike commented 2 months ago

Unfortunately koboldcpp with cuda crashes on my pc because my processor doesn't support avx2, while the other "blas" are too slow. So as an alternative i use llamafile, is working nice and smart, is very light and very performing on my 3060 with 12gb. The only problem is that every time I have to start a conversation, in order for the llm to generate the response, I have to briefly "alt+tab" to "exit and re-enter the game" so that llamafile generates the response and it triggers the loop with speech, it also works for multiple comments, but then after it asks a new question, I have to "alt+tab" again to trigger the llm. I was wondering what it could be and if there is a way to overcome this problem.