Open zxy1728 opened 1 month ago
For the moment, yes. This is because it is difficult to pass the router loss without modifying the transformers library. If you don't need the router loss, the current code should be able to support training with simple modifications.
Also, the torch and transformers versions required by m-LoRA can probably be lowered a little without much impact.
Excuse me, can the training process only be run through mlora? That doesn't match my own torch and transformers versions. Is there a solution?