TorchMoE / MoE-Infinity

PyTorch library for cost-effective, fast and easy serving of MoE models.
Apache License 2.0

Support Constrained Server Memory #5

Open drunkcoding opened 9 months ago

drunkcoding commented 9 months ago

A Colab T4 server has 12GB of DRAM and 16GB of GPU memory. A quantized Mixtral checkpoint is 26GB as a single file, so it cannot be loaded into memory while creating the custom format for offloading.
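One way around this (a sketch only, not MoE-Infinity's actual conversion code) is to never materialize the whole checkpoint: stream each tensor's byte range out of the monolithic file into its own shard with a bounded read buffer, so peak host memory stays at the buffer size regardless of checkpoint size. The `stream_split` helper and the offset/size layout format below are hypothetical illustrations.

```python
import os
import tempfile

def stream_split(src_path, layout, out_dir, chunk=1 << 20):
    """Copy each tensor's byte range from a monolithic checkpoint
    into its own shard file. At most `chunk` bytes are held in memory
    at a time, so a 26GB checkpoint can be converted on a 12GB host.

    `layout` maps tensor name -> (byte offset, byte size); a real
    converter would read this from the checkpoint's header instead.
    """
    paths = {}
    with open(src_path, "rb") as src:
        for name, (offset, size) in layout.items():
            src.seek(offset)
            out_path = os.path.join(out_dir, f"{name}.bin")
            with open(out_path, "wb") as dst:
                remaining = size
                while remaining > 0:
                    buf = src.read(min(chunk, remaining))
                    if not buf:  # truncated source file
                        raise IOError(f"unexpected EOF in {name}")
                    dst.write(buf)
                    remaining -= len(buf)
            paths[name] = out_path
    return paths

# Demo with a tiny fake "checkpoint": two tensors stored back to back.
tmp = tempfile.mkdtemp()
ckpt = os.path.join(tmp, "model.ckpt")
with open(ckpt, "wb") as f:
    f.write(b"A" * 100 + b"B" * 50)

layout = {"expert0": (0, 100), "expert1": (100, 50)}
shards = stream_split(ckpt, layout, tmp, chunk=16)
```

With per-expert shards on disk, each expert can later be memory-mapped or loaded on demand instead of pulling the full checkpoint into DRAM.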