TUDB-Labs / MixLoRA

State-of-the-art Parameter-Efficient MoE Fine-tuning Method
Apache License 2.0

[help] Can the training process only be implemented through m-LoRA? #11

Open zxy1728 opened 1 month ago

zxy1728 commented 1 month ago

Excuse me, can the training process only be implemented through m-LoRA? It doesn't match my own torch and transformers versions. Is there a solution?

mikecovlee commented 1 month ago

For the moment, yes. This is because it is difficult to pass the router loss through without modifying the transformers library. If you don't need the router loss, the current code should also be able to support training with simple modifications.
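
To illustrate what "training without the router loss" could look like, here is a minimal, non-official sketch that fine-tunes a causal LM with only the standard language-modeling loss via the Hugging Face `Trainer`. The base model name, the JSON dataset with a `text` field, and the `inject_mixlora_adapters` helper are all assumptions/placeholders, not part of MixLoRA's actual API; the real adapter-injection step would come from this repository's code.

```python
# Sketch: fine-tune with only the causal-LM loss (no MoE router/auxiliary loss),
# so no patching of the transformers library is required.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "your-base-model"  # assumption: any causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical step: attach the MixLoRA expert/LoRA modules and freeze the
# base weights. The actual function name and arguments depend on this repo.
# model = inject_mixlora_adapters(model, ...)

# Assumption: a local JSON file where each record has a "text" field.
dataset = load_dataset("json", data_files="train.json", split="train")
def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=512)
dataset = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./mixlora-out",
        per_device_train_batch_size=4,
        num_train_epochs=1,
        learning_rate=2e-4,
    ),
    train_dataset=dataset,
    # mlm=False builds labels from input_ids, so only the LM loss is computed;
    # the router's load-balancing loss is simply not added here.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Note that dropping the router loss removes the load-balancing signal for the experts, so this is only a workaround when that term is not needed.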

mikecovlee commented 1 month ago

Also, the torch and transformers versions required by m-LoRA can be lowered a little without much impact.