-
### Is this your first time submitting a feature request?
- [X] I have searched the existing issues, and I could not find an existing issue for this feature
- [X] I am requesting a straightforward…
-
### Bug Report
I just compiled the updated Python bindings v2.7.0.
When terminating my GUI, the whole model now needs to be loaded again, which may take a long time.
In previous versions only the firs…
-
[Groq](https://groq.com) provides an [OpenAI compatible API](https://console.groq.com/docs/openai) to several LLMs e.g. LLaMA3 8b, LLaMA3 70b, Mixtral 8x7b, Gemma 7b (documented on the [models page](h…
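For reference, a minimal sketch of what a request to that OpenAI-compatible endpoint looks like. The model id and temperature here are illustrative assumptions; check the models page for the current ids:

```python
import json

# Groq's OpenAI-compatible base URL (from the docs linked above).
GROQ_BASE_URL = "https://api.groq.com/openai/v1"

def build_chat_request(model: str, prompt: str, temperature: float = 0.2) -> dict:
    """Build an OpenAI-style chat-completion payload for Groq's API."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

# "mixtral-8x7b-32768" is one of the model ids listed in Groq's docs;
# substitute whichever model you actually want.
payload = build_chat_request("mixtral-8x7b-32768", "Hello!")
print(json.dumps(payload))
```

To actually send it, POST the payload to `{GROq_BASE_URL}/chat/completions`... rather, `f"{GROQ_BASE_URL}/chat/completions"` with an `Authorization: Bearer <GROQ_API_KEY>` header, or point the official `openai` client at `GROQ_BASE_URL` via its `base_url` argument.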
-
Looking forward to support for the Mixtral 8x7B MoE model.
-
**NOTE: ~~It~~ Mixtral can at times be... Fragile. Let's call it that. Keep the temperature *LOW*. You can indeed drive it nuts, at least with the system prompt I was using.**
I intend to make a fo…
-
**LocalAI version:**
2.5.1-cublas-cuda12
**Environment, CPU architecture, OS, and Version:**
Ubuntu 22.04 with 2 RTX A5000 24 GB GPUs
**Describe the bug**
My problem is that this model mixt…
-
Since the paper uses 4 expert adaptors trained with LoRA SFT, the next question is: why not try an MoE approach like Mixtral-8x7B?
-
**Problem**
Jan is great, but I'm limited in the number of models I can run on my 16GB GPU. I saw there is a project called [mixtral-offloading](https://github.com/dvmazur/mixtral-offloading) that cou…
-
### System Info
transformers version: 4.42.4
### Who can help?
@Gante
### Information
- [ ] The official example scripts
- [X] My own modified scripts
### Tasks
- [ ] An officially supported tas…
-
I am using the latest vllm Docker image, trying to run the Mixtral 8x7B model quantized in AWQ format. I got the error message below:
```
INFO 12-24 09:22:55 llm_engine.py:73] Initializing an LLM engine …