Open CCRss opened 1 week ago
Motivation

Is it possible to apply Mixed Preference Optimization (MPO) to the 76B InternVL model, similar to what is done for the 8B model?

Related resources

No response

Additional context

No response

Our current MPO codebase is built on HuggingFace's TRL library, which is not well suited to training very large models. We plan to enable training for the 76B model after migrating to a more efficient codebase.

We are also going to support Liger Kernel in the near future; perhaps once Liger Kernel reduces GPU memory usage, this codebase can be used to train MPO for the 76B model.
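For context, here is a minimal sketch of how Liger Kernel is commonly enabled on a HuggingFace Transformers causal LM to cut training memory. This is not the project's actual integration: the checkpoint path is a placeholder, and InternVL-specific patching may differ.

```python
# Minimal sketch of enabling Liger Kernel on a HuggingFace model to reduce
# GPU memory. The checkpoint path is a placeholder, not the InternVL/MPO setup.
import torch
from liger_kernel.transformers import AutoLigerKernelForCausalLM

# AutoLigerKernelForCausalLM patches supported architectures with fused Triton
# kernels (e.g. RMSNorm, SwiGLU, fused linear cross-entropy) before loading
# weights, which mainly lowers activation memory during training.
model = AutoLigerKernelForCausalLM.from_pretrained(
    "path/to/base-llm",          # placeholder checkpoint
    torch_dtype=torch.bfloat16,
)
```

Recent versions of transformers also expose a `use_liger_kernel` flag on `TrainingArguments`, which applies the same patching automatically for supported architectures.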