dvmazur / mixtral-offloading

Run Mixtral-8x7B models in Colab or consumer desktops
MIT License

Mixtral offloading

This project implements efficient inference of Mixtral-8x7B models.

How does it work?

In summary, we achieve efficient inference of Mixtral-8x7B models through a combination of techniques:

* **Mixed quantization with HQQ**. We apply separate quantization schemes for attention layers and experts to fit the model into the combined GPU and CPU memory.
* **MoE offloading strategy**. Each expert per layer is offloaded separately and only brought back to the GPU when needed. We store active experts in an LRU cache to reduce GPU-RAM communication when computing activations for adjacent tokens.

For more detailed information about our methods and results, please refer to our tech report.
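One of the core ideas, keeping recently used experts resident on the GPU and evicting the least recently used ones, can be sketched as a small LRU cache. This is a simplified illustration, not the repo's actual API: the class and function names below are hypothetical, and expert "weights" are simulated with plain dicts instead of quantized tensors.

```python
from collections import OrderedDict

class ExpertCache:
    """Keep the most recently used experts in fast (GPU) memory.

    Hypothetical sketch: the real project manages quantized expert
    weights and device transfers; here loading is simulated.
    """

    def __init__(self, capacity, load_fn):
        self.capacity = capacity
        self.load_fn = load_fn      # fetches an expert from offloaded (CPU) storage
        self.cache = OrderedDict()  # expert_id -> expert weights, in LRU order

    def get(self, expert_id):
        if expert_id in self.cache:
            self.cache.move_to_end(expert_id)  # cache hit: mark as most recently used
            return self.cache[expert_id]
        expert = self.load_fn(expert_id)       # slow path: bring the expert in
        self.cache[expert_id] = expert
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)     # evict the least recently used expert
        return expert

# Usage: 8 experts per layer, but only 2 fit in the cache at once.
loads = []
cache = ExpertCache(capacity=2, load_fn=lambda i: loads.append(i) or {"id": i})
cache.get(0)
cache.get(1)
cache.get(0)  # cache hit: no new load
cache.get(2)  # miss: loads expert 2 and evicts expert 1
print(loads)  # experts actually transferred: [0, 1, 2]
```

Because adjacent tokens tend to route to overlapping sets of experts, repeated `get` calls often hit the cache, which is what cuts down GPU-RAM traffic.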

Running

To try this demo, use the notebook at ./notebooks/demo.ipynb, or open it in Google Colab.

For now, there is no command-line script for running the model locally, but you can create one using the demo notebook as a reference. Contributions are welcome!
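If you want to start on such a script, an argument-parsing skeleton like the one below is a reasonable first step. All flags and defaults here are hypothetical, chosen for illustration; the model loading and generation logic would follow the steps in the demo notebook.

```python
import argparse

def build_parser():
    # Hypothetical flags; adapt them to whatever the demo notebook configures.
    p = argparse.ArgumentParser(
        description="Run Mixtral-8x7B inference with expert offloading"
    )
    p.add_argument("--model", default="mistralai/Mixtral-8x7B-Instruct-v0.1",
                   help="model repo to load (assumed default)")
    p.add_argument("--offload-experts", type=int, default=4,
                   help="experts per layer kept offloaded to CPU (assumed)")
    p.add_argument("--prompt", required=True,
                   help="text prompt to generate from")
    return p

# Example: parse a sample command line instead of sys.argv
args = build_parser().parse_args(["--prompt", "Hello", "--offload-experts", "6"])
print(args.model, args.offload_experts)
# The actual model setup and generation would go here,
# mirroring notebooks/demo.ipynb.
```

This keeps configuration in one place so the notebook logic can be ported over function by function.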

Work in progress

Some techniques described in our technical report are not yet available in this repo. However, we are actively working on adding support for them in the near future.

Some of the upcoming features are: