Mantella is a Skyrim and Fallout 4 mod which allows you to naturally speak to NPCs using Whisper (speech-to-text), LLMs (text generation), and xVASynth / XTTS (text-to-speech).
Unfortunately koboldcpp with CUDA crashes on my PC because my processor doesn't support AVX2, and the other BLAS backends are too slow. As an alternative I use llamafile, which works nicely, is very light, and performs very well on my 3060 with 12 GB. The only problem is that every time I start a conversation, in order for the LLM to generate a response, I have to briefly alt+tab out of the game and back in so that llamafile generates the response and triggers the speech loop. The conversation then works for multiple follow-up comments, but after I ask a new question I have to alt+tab again to trigger the LLM. I was wondering what could cause this and whether there is a way to overcome the problem.