TorchMoE / MoE-Infinity

PyTorch library for cost-effective, fast and easy serving of MoE models.
Apache License 2.0
76 stars · 5 forks

Can the MoE-Infinity framework be used in conjunction with the vLLM framework? #23

Open · alphabewitch opened this issue 2 months ago

alphabewitch commented 2 months ago

I am using a vLLM server to deploy an MoE model. The model has a large number of experts, but only a few of them are activated for each token, so it seems well suited to an expert-offloading approach.
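The memory argument behind this request can be sketched with simple arithmetic. The helper `expert_offload_savings` and the numbers used below (64 experts, top-2 routing, 100 MiB per expert) are hypothetical. They do not come from MoE-Infinity, vLLM, or the model in question. The sketch also ignores attention weights, the KV cache, prefetch buffers, and the fact that different tokens in a batch may route to different experts.

```python
def expert_offload_savings(num_experts: int, top_k: int, bytes_per_expert: int) -> dict:
    """Compare expert memory for one MoE layer: all experts resident vs. only the activated ones.

    Hypothetical helper for illustration only. It counts expert weights for a
    single token in a single layer and ignores everything else a real
    serving system keeps on the GPU.
    """
    if not 0 < top_k <= num_experts:
        raise ValueError("top_k must satisfy 0 < top_k <= num_experts")
    all_resident = num_experts * bytes_per_expert  # every expert kept on the GPU
    active_only = top_k * bytes_per_expert         # only the routed experts are needed
    return {
        "all_resident_bytes": all_resident,
        "active_bytes": active_only,
        "active_fraction": top_k / num_experts,
    }


# Hypothetical layer: 64 experts, 2 activated per token, 100 MiB per expert.
stats = expert_offload_savings(num_experts=64, top_k=2, bytes_per_expert=100 * 2**20)
print(stats["active_fraction"])  # 0.03125
```

In this hypothetical case a single token touches about 3% of a layer's expert weights, which is why keeping most experts in host memory and fetching them on demand can pay off. In practice, the real savings depend on how many distinct experts a whole batch needs and on how well transfer latency can be hidden.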

drunkcoding commented 1 month ago

This is still a work in progress; currently, it is not supported.