b4rtaz / distributed-llama

Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
MIT License

Mixtral-8x7B doesn't work #132

Open MichaelFomenko opened 1 week ago

MichaelFomenko commented 1 week ago

Mixtral-8x7B-Instruct-v0.1 doesn't work. When I load the model in chat mode, it starts loading the model but doesn't finish and breaks.

Maybe they changed the model on Hugging Face, or something else.

https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1/tree/main
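
For reference, a chat-mode launch with distributed-llama looks roughly like the sketch below, based on the invocation style in the project's README. The model and tokenizer filenames here are placeholders: the actual names depend on how the Hugging Face checkpoint was converted to the distributed-llama format.

```sh
# Sketch of a chat-mode launch; the .m/.t filenames are hypothetical placeholders.
# The Hugging Face checkpoint must first be converted into the distributed-llama
# model/tokenizer format before it can be loaded.
./dllama chat \
  --model dllama_model_mixtral-8x7b-instruct_q40.m \
  --tokenizer dllama_tokenizer_mixtral.t \
  --buffer-float-type q80 \
  --nthreads 4
```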

b4rtaz commented 1 week ago

Could you share the error?
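
For example, re-running the command and capturing everything it prints would help pinpoint where loading breaks (a generic shell sketch; substitute your actual arguments):

```sh
# Re-run the chat command, saving stdout and stderr so the point of failure
# during model loading is visible in the log.
./dllama chat --model <your-model> --tokenizer <your-tokenizer> 2>&1 | tee chat.log
```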