win10ogod opened this issue 7 months ago (status: Open)
Hello, could you add the DeepSeekMoE component (https://arxiv.org/abs/2401.06066)? I want to train Mixture-of-Experts versions of both RWKV and gptalpha.
I'm currently experimenting with MoE variations, and will add support once I'm further along on that.