Open alphabewitch opened 5 months ago
I am using a vLLM server to deploy an MoE model. This model has a large number of experts, but only a very small number are activated per token, so it is well suited to an expert-offloading solution.
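To illustrate the idea, here is a minimal conceptual sketch of expert offloading: all expert weights stay in host memory, and only the router-activated experts are copied into a small on-device cache on demand. This is not vLLM's actual API; the class, method names, and LRU policy are all assumptions made for illustration.

```python
from collections import OrderedDict

class ExpertOffloader:
    """Conceptual sketch (hypothetical, not vLLM's API): keep every
    expert's weights in host memory and cache only the few activated
    experts in a small pool of device-side slots."""

    def __init__(self, num_experts: int, cache_slots: int):
        # All experts live "offloaded" (e.g. in CPU RAM); modeled here as a dict.
        self.host_experts = {i: f"weights_{i}" for i in range(num_experts)}
        self.cache_slots = cache_slots
        self.device_cache = OrderedDict()  # expert_id -> weights, LRU order
        self.transfers = 0  # number of host -> device copies performed

    def fetch(self, expert_id: int):
        if expert_id in self.device_cache:
            # Cache hit: mark as most recently used.
            self.device_cache.move_to_end(expert_id)
        else:
            # Cache miss: evict the least recently used expert if full.
            if len(self.device_cache) >= self.cache_slots:
                self.device_cache.popitem(last=False)
            self.device_cache[expert_id] = self.host_experts[expert_id]
            self.transfers += 1
        return self.device_cache[expert_id]

    def forward(self, activated_ids):
        # Only the router-selected (activated) experts are brought on-device,
        # so device memory scales with top-k, not the total expert count.
        return [self.fetch(i) for i in activated_ids]
```

With, say, 64 experts and top-2 routing, only a handful of cache slots are touched per step, which is exactly the regime the issue describes.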
Still a work in progress; currently it is not supported.