bigscience-workshop / petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://petals.dev
MIT License

Bump transformers to 4.43.1 #596

Closed · justheuristic closed this 2 months ago

justheuristic commented 2 months ago

This PR bumps transformers to the latest version (which adds Llama 3.1 support). It also adds the cache_position parameter to Mixtral models, since a recent transformers update introduced this parameter there.
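For illustration, here is a minimal sketch of the shape of that change; the class and parameter names are assumptions for this sketch, not the exact Petals diff:

```python
# Illustrative only: newer transformers versions pass `cache_position` into
# decoder forward(), so the distributed Mixtral wrapper must accept it to
# stay signature-compatible with the upstream API.
from typing import Optional
import torch

class DistributedMixtralModel:  # hypothetical stand-in for the Petals class
    def forward(
        self,
        input_ids: torch.LongTensor,
        attention_mask: Optional[torch.Tensor] = None,
        past_key_values=None,
        cache_position: Optional[torch.LongTensor] = None,  # new parameter
        **kwargs,
    ):
        # cache_position is validated and then *not* forwarded to the server
        # (see the check sketched below).
        ...
```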

The code checks that cache_position holds its default value and does not forward it to the server. This matches the behavior we already have for Llama models.
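A hedged sketch of that check, assuming the default cache_position is the usual `torch.arange` over the new token positions (helper and argument names here are hypothetical):

```python
# Sketch of the validation described above, not the exact Petals implementation.
import torch

def _validate_cache_position(
    cache_position: torch.LongTensor | None, past_length: int, seq_length: int
) -> None:
    """Ensure cache_position is the default arange; Petals servers don't accept it."""
    if cache_position is None:
        return  # nothing to forward: the default case
    expected = torch.arange(
        past_length, past_length + seq_length, device=cache_position.device
    )
    if not torch.equal(cache_position, expected):
        raise ValueError(
            "Non-default cache_position is not supported: it must equal "
            "torch.arange(past_length, past_length + seq_length)"
        )
    # If the check passes, cache_position is simply dropped before the remote call.
```

The design mirrors the Llama path: clients accept the argument for API compatibility with transformers, but only ever send inputs the server protocol understands.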