Fix Mixtral-related issues

bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

https://petals.dev

MIT License


Closed: artek0chumak closed this PR 2 months ago

artek0chumak commented 2 months ago

This PR fixes problems related to #569:

Beam search (BS) is disabled for Mixtral and Llama for now. These models use DynamicCache, which can only be reordered between beam-search steps through a dedicated method (see https://github.com/huggingface/transformers/blob/main/src/transformers/cache_utils.py#L161).
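Why beam search needs a special cache operation: after each step the surviving hypotheses can be reshuffled or duplicated, so every layer's cached keys and values must be gathered by the selected beam indices. The linked transformers code appears to handle this through `DynamicCache.reorder_cache`. The sketch below is only an illustration of that gather step. It uses plain Python lists instead of torch tensors, and `reorder_cache` here is a hypothetical helper, not the Petals or transformers implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy stand-in for a KV cache (not the transformers API):
# one (keys, values) pair per layer, each a list with one entry per beam.
LayerKV = Tuple[List[str], List[str]]


def reorder_cache(cache: List[LayerKV], beam_idx: Sequence[int]) -> List[LayerKV]:
    """Gather every layer's keys and values along the beam axis.

    Row j of the result comes from beam beam_idx[j] of the input, which is
    what index_select(0, beam_idx) does on a batch-first tensor.
    The input cache is left unchanged; a new list is returned.
    """
    return [
        ([keys[i] for i in beam_idx], [values[i] for i in beam_idx])
        for keys, values in cache
    ]


if __name__ == "__main__":
    # Two layers, three beams; strings make the origin of each row visible.
    cache = [
        (["k0_b0", "k0_b1", "k0_b2"], ["v0_b0", "v0_b1", "v0_b2"]),
        (["k1_b0", "k1_b1", "k1_b2"], ["v1_b0", "v1_b1", "v1_b2"]),
    ]
    # Suppose the search step kept beam 2 once and beam 0 twice.
    new_cache = reorder_cache(cache, [2, 0, 0])
    print(new_cache[0][0])  # ['k0_b2', 'k0_b0', 'k0_b0']
```

Beam 1 is dropped and beam 0 is duplicated in every layer at once. If a model's cache object does not expose such an operation in a form the serving code can call, beam search cannot keep the cache consistent with the surviving hypotheses.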