KoljaB / LocalAIVoiceChat

Local AI talk with a custom voice based on Zephyr 7B model. Uses RealtimeSTT with faster_whisper for transcription and RealtimeTTS with Coqui XTTS for synthesis.

Switch model to Llama 3 #17

Open mirix opened 2 weeks ago

mirix commented 2 weeks ago

Thanks for this great project. It works quite well.

However, I would like to change the model to:

https://huggingface.co/mlabonne/Meta-Llama-3.1-8B-Instruct-abliterated-GGUF

I have been trying to adapt the prompt template. It kind of works, even without changing the template, but eventually the model goes into an endless response loop. I guess the issue is the stop tokens or something like that.

Has anyone tried this?

KoljaB commented 2 weeks ago

Quite sure you are right: the reason is most probably the different prompt format ("chat template") that each model uses. This project is a bit outdated and therefore still uses the raw template format, written for Zephyr.
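
Llama 3.1 Instruct uses a different raw format than Zephyr, and the endless loop is the classic symptom of generation never hitting a stop token. A minimal sketch of what adapting the template would look like (the function name is illustrative, not this repo's actual code):

```python
# Llama 3.1 Instruct prompt format. Zephyr uses <|system|>/<|user|>/<|assistant|>
# markers with </s> as the stop sequence; Llama 3.1 uses header tokens and <|eot_id|>.
def build_llama3_prompt(system_message: str, user_message: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_message}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

# Whatever runs the generation must treat <|eot_id|> as a stop sequence,
# e.g. with llama-cpp-python: llm(prompt, stop=["<|eot_id|>"], ...).
# If the stop list still only contains Zephyr's </s>, the model keeps
# generating past its own turn, which produces the endless response loop.
```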

Meanwhile most of this can be abstracted away. Today most local LLM runtimes offer servers with chat endpoints that apply the model's chat template for you.
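
A chat endpoint takes role-tagged messages and formats the prompt server-side, so the client never touches the raw template. A rough sketch against a local OpenAI-compatible server (URL and model name are placeholders):

```python
import requests

# Placeholder URL/model: any local server exposing the OpenAI-compatible
# /v1/chat/completions route works the same way.
resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "llama-3.1-8b-instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful voice assistant."},
            {"role": "user", "content": "Hello!"},
        ],
        "max_tokens": 128,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```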

Or you can directly use Hugging Face's transformers.pipeline, which abstracts the model's chat template away.
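
Roughly like this (the model id is the upstream instruct repo; the GGUF file you linked is a llama.cpp quantization and would not load here):

```python
from transformers import pipeline

# Assumed model id: the full-precision upstream of the abliterated GGUF;
# any Llama 3.1 Instruct checkpoint works the same way.
pipe = pipeline("text-generation", model="meta-llama/Meta-Llama-3.1-8B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful voice assistant."},
    {"role": "user", "content": "Hello!"},
]

# Given a list of chat messages, recent transformers versions apply the
# model's own chat template (including its stop tokens) automatically.
result = pipe(messages, max_new_tokens=128)
print(result[0]["generated_text"][-1]["content"])
```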

If you want to use the chat endpoints, the LocalEmotionalAIVoiceChat project has example code that does it this way.

Alternatively you can also use the Python OpenAI library to wrap the base endpoints (http://localhost:1234/v1). For example code, the Linguflex project does some LLM requests this way.
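
With the OpenAI v1 client that boils down to something like this (the api_key is a dummy, since local servers usually don't check it):

```python
from openai import OpenAI

# Point the official OpenAI client at the local server's base endpoint.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="local-model",  # placeholder; many local servers ignore this field
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```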