Are there any plans to support fine-grained experts in the future?
Fine-grained experts are a technique used in projects like Qwen MoE and DeepSeek MoE, where they have shown promising results. The approach partitions a single FFN into several smaller segments, each of which becomes its own expert, so the number of experts grows without increasing the overall parameter count.
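To make the parameter arithmetic concrete, here is a rough sketch of what I mean. It is not taken from either project's code: `MoEConfig`, `split_experts`, and the example sizes are hypothetical names and numbers I made up for illustration, and it assumes a plain two-matrix FFN with biases and router weights left out of the count. Scaling `top_k` by the same factor as the expert count follows my understanding of the DeepSeek MoE description and should be treated as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEConfig:
    """Hypothetical MoE layer description, for illustration only."""
    d_model: int      # hidden size of the model
    d_ff: int         # intermediate size of each expert FFN
    num_experts: int  # experts in the layer
    top_k: int        # experts activated per token


def ffn_params(d_model: int, d_ff: int) -> int:
    """Weights in a two-matrix FFN (up and down projections), biases ignored."""
    return 2 * d_model * d_ff


def split_experts(cfg: MoEConfig, m: int) -> MoEConfig:
    """Split every expert into m smaller experts along the intermediate dimension.

    Assumption: top_k is scaled by m as well, so the per-token compute stays
    the same as before the split.
    """
    if m < 1 or cfg.d_ff % m != 0:
        raise ValueError("m must be a positive divisor of d_ff")
    return MoEConfig(
        d_model=cfg.d_model,
        d_ff=cfg.d_ff // m,
        num_experts=cfg.num_experts * m,
        top_k=cfg.top_k * m,
    )


def total_expert_params(cfg: MoEConfig) -> int:
    """Expert weights stored in the layer (router weights not counted)."""
    return cfg.num_experts * ffn_params(cfg.d_model, cfg.d_ff)


def active_expert_params(cfg: MoEConfig) -> int:
    """Expert weights actually used for one token."""
    return cfg.top_k * ffn_params(cfg.d_model, cfg.d_ff)


coarse = MoEConfig(d_model=1024, d_ff=4096, num_experts=8, top_k=2)
fine = split_experts(coarse, m=4)

print(fine)
print(total_expert_params(coarse) == total_expert_params(fine))
print(active_expert_params(coarse) == active_expert_params(fine))
```

In this sketch the split turns 8 experts with top-2 routing into 32 experts with top-8 routing, while both the stored and the per-token expert weights stay the same. The only thing that would grow in a real layer is the router, which needs one output per expert.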