Open alphabewitch opened 5 months ago
I am using a vLLM server to deploy an MoE model. This model has a large number of experts, but only a very small number are activated per token, so it is well suited to an expert-offloading solution.
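To illustrate the idea, here is a minimal conceptual sketch of expert offloading: all expert weights stay in host memory, and only the router-activated experts are copied into a small on-device cache on demand. This is not vLLM's actual API; the class, method names, and LRU policy are all assumptions made for illustration.

```python
from collections import OrderedDict

class ExpertOffloader:
    """Conceptual sketch (hypothetical, not vLLM's API): keep every
    expert's weights in host memory and cache only the few activated
    experts in a small pool of device-side slots."""

    def __init__(self, num_experts: int, cache_slots: int):
        # All experts live "offloaded" (e.g. in CPU RAM); modeled here as a dict.
        self.host_experts = {i: f"weights_{i}" for i in range(num_experts)}
        self.cache_slots = cache_slots
        self.device_cache = OrderedDict()  # expert_id -> weights, LRU order
        self.transfers = 0  # number of host -> device copies performed

    def fetch(self, expert_id: int):
        if expert_id in self.device_cache:
            # Cache hit: mark as most recently used.
            self.device_cache.move_to_end(expert_id)
        else:
            # Cache miss: evict the least recently used expert if full.
            if len(self.device_cache) >= self.cache_slots:
                self.device_cache.popitem(last=False)
            self.device_cache[expert_id] = self.host_experts[expert_id]
            self.transfers += 1
        return self.device_cache[expert_id]

    def forward(self, activated_ids):
        # Only the router-selected (activated) experts are brought on-device,
        # so device memory scales with top-k, not the total expert count.
        return [self.fetch(i) for i in activated_ids]
```

With, say, 64 experts and top-2 routing, only a handful of cache slots are touched per step, which is exactly the regime the issue describes.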
Still a work in progress; currently it is not supported.