Hello, I'm Dengchun, the author of MixLoRA. Most MoE models, including MixLoRA, Mixtral, and Switch Transformers, work in much the same way: they compute the router balance loss at the end of the forward pass. The router logits output by each layer, each of shape (batch_size * sequence_length, num_experts), are concatenated along the first dimension before a single final loss is computed. You can check this at:
https://github.com/TUDB-Labs/MoE-PEFT/blob/50984e12202b899926bd469a1deeab155e534018/moe_peft/modules/mix_lora.py#L109-L120
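For readers following along, here is a minimal sketch of the concatenate-then-compute pattern described above, in the style of the Switch Transformers / Mixtral auxiliary load-balancing loss. The function name and the exact scaling are my own choices for illustration; the authoritative implementation is at the link above.

```python
import torch
import torch.nn.functional as F


def router_balance_loss(router_logits, num_experts, top_k=2):
    """Sketch of a Switch/Mixtral-style auxiliary load-balancing loss.

    router_logits: list of per-layer tensors, each of shape
        (batch_size * sequence_length, num_experts).
    The per-layer logits are concatenated along dim 0 first, so one loss
    is computed over all layers rather than summing per-layer losses.
    """
    # Concatenate all layers: (num_layers * tokens, num_experts)
    logits = torch.cat(router_logits, dim=0)
    probs = F.softmax(logits, dim=-1)

    # Experts actually selected per token (top-k routing)
    _, selected = torch.topk(probs, top_k, dim=-1)
    expert_mask = F.one_hot(selected, num_experts).float()  # (tokens, top_k, experts)

    # Fraction of routing slots assigned to each expert
    tokens_per_expert = expert_mask.mean(dim=(0, 1))
    # Mean router probability assigned to each expert
    router_prob_per_expert = probs.mean(dim=0)

    # Encourages a uniform expert load; scaled by num_experts so a
    # perfectly balanced router yields a loss around 1.
    return num_experts * torch.sum(tokens_per_expert * router_prob_per_expert)
```

In this formulation the loss is small when routing is balanced and grows when a few experts receive most of the traffic, which is what pushes the router toward uniform expert utilization.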
Hi,
I am currently implementing a custom MixLoRA model based on your code and had a question regarding the router loss.
From what I understand, the router loss is calculated for each layer that employs MixLoRA. Could you confirm whether the final router loss for the entire model is the sum of the router losses computed at each MixLoRA-enabled layer?
https://github.com/TUDB-Labs/MoE-PEFT/blob/50984e12202b899926bd469a1deeab155e534018/moe_peft/model.py#L505-L507
Thank you in advance for your clarification!