This PR bumps transformers to the latest version (which adds Llama 3.1 support).
It also adds the cache_position parameter to Mixtral models, since a recent transformers update introduced it there.
The code checks that cache_position holds its default value and does not forward it to the server. This matches the existing behavior for Llama models.
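For reference, here is a minimal sketch of the kind of check described above, assuming the transformers default of `cache_position = torch.arange(past_length, past_length + seq_length)`; the function and argument names are illustrative, not the actual implementation:

```python
from typing import Optional

import torch


def validate_cache_position(
    cache_position: Optional[torch.Tensor],
    past_length: int,
    seq_length: int,
) -> None:
    """Verify cache_position matches the default contiguous range.

    transformers fills cache_position with
    torch.arange(past_length, past_length + seq_length) when the caller
    omits it, so a value equal to that range carries no extra information
    and can be dropped instead of being forwarded to the server.
    """
    if cache_position is None:
        return  # nothing to check; the default will be used
    expected = torch.arange(
        past_length, past_length + seq_length, device=cache_position.device
    )
    if cache_position.shape != expected.shape or not torch.equal(cache_position, expected):
        raise ValueError("Non-default cache_position values are not supported")
```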