Support for fine-grained experts in MoE models

Are there any plans to support fine-grained experts in the future?

Fine-grained experts are a technique adopted in projects like Qwen MoE and DeepSeek MoE, and have shown promising results. The approach partitions a single FFN along its intermediate dimension into several segments, each of which becomes its own expert, allowing for a larger number of experts without increasing the overall parameter count.
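To make the request concrete, here is a rough sketch of the partitioning idea in plain Python. It assumes a simple two-matrix FFN with a ReLU activation and no gating or router, which is a simplification and not necessarily what Qwen MoE or DeepSeek MoE do. `split_ffn` and the other names are hypothetical helpers for illustration, not part of mergekit.

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(x, w):
    """Multiply row vector x (length n) by matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def ffn(x, w_up, w_down):
    """Plain FFN: relu(x @ w_up) @ w_down (assumed form, no gating)."""
    return matvec(relu(matvec(x, w_up)), w_down)


def split_ffn(w_up, w_down, num_segments):
    """Hypothetical helper: slice one FFN into smaller experts.

    Each expert takes a contiguous block of the intermediate dimension:
    the matching columns of w_up and the matching rows of w_down.
    """
    d_ff = len(w_down)
    if d_ff % num_segments != 0:
        raise ValueError("intermediate size must be divisible by num_segments")
    seg = d_ff // num_segments
    experts = []
    for k in range(num_segments):
        lo, hi = k * seg, (k + 1) * seg
        up_k = [row[lo:hi] for row in w_up]  # columns lo..hi of w_up
        down_k = w_down[lo:hi]               # rows lo..hi of w_down
        experts.append((up_k, down_k))
    return experts


rng = random.Random(0)
d_model, d_ff = 4, 8
w_up = [[rng.uniform(-1, 1) for _ in range(d_ff)] for _ in range(d_model)]
w_down = [[rng.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_ff)]
x = [rng.uniform(-1, 1) for _ in range(d_model)]

experts = split_ffn(w_up, w_down, num_segments=4)

# Total parameter count across the fine-grained experts.
expert_params = sum(
    len(up) * len(up[0]) + len(down) * len(down[0]) for up, down in experts
)
dense_params = 2 * d_model * d_ff

# With an elementwise activation, summing all segments reproduces the dense FFN.
dense_out = ffn(x, w_up, w_down)
summed_out = [sum(vals) for vals in zip(*(ffn(x, up, down) for up, down in experts))]
```

In this toy setup the four experts together hold exactly the parameters of the original FFN, and activating all of them gives the same output as the dense layer. An actual MoE would add a router that activates only a subset of the segments per token, which this sketch leaves out.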

arcee-ai / mergekit

Support for fine-grained experts in MoE models #363