SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

Any plans to support llamafied Qwen1.5? #181

Closed by Ce-daros 5 months ago

Ce-daros commented 5 months ago

https://huggingface.co/mightbe/Qwen1.5-32B-llamafied/tree/main

hodlen commented 5 months ago

Currently, we have no plans to expand our ReLU-activated model lineup beyond ReluLLaMA, ReluFalcon, ProSparse, and Bamboo, as tuning a model to use ReLU activations requires significant effort.
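For context, ReLU activation is what makes PowerInfer's sparse inference possible: with ReLU, most FFN hidden activations are exactly zero, so the weight rows for inactive neurons never need to be read or multiplied. Below is a minimal numpy sketch of the idea only; the sizes are toy values and `predicted_active` is a hypothetical index set standing in for the small activation predictor a real system would use (skipping is exact only for neurons the predictor correctly flags).

```python
import numpy as np

def relu_ffn_sparse(x, w_up, w_down, predicted_active):
    """FFN forward pass that only touches neurons predicted to fire.

    With ReLU, most hidden activations are exactly zero, so the
    columns of w_up and rows of w_down for inactive neurons can be
    skipped entirely. Any predicted-inactive neuron that actually
    fires is lost, so accuracy depends on predictor recall.
    """
    h = np.maximum(x @ w_up[:, predicted_active], 0.0)  # ReLU on the active slice
    return h @ w_down[predicted_active, :]

# Toy usage with made-up sizes: 8-dim model, 32 FFN neurons, 25% active.
rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
x = rng.standard_normal(d_model)
w_up = rng.standard_normal((d_model, d_ff))
w_down = rng.standard_normal((d_ff, d_model))
active = rng.choice(d_ff, size=d_ff // 4, replace=False)
print(relu_ffn_sparse(x, w_up, w_down, active))
```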

That said, while we will not support sparse activation for Qwen1.5, we are developing "hot expert offloading" for the Qwen MoE model based on Qwen1.5, which requires no further fine-tuning. We plan to roll out this feature so that PowerInfer supports MoE scenarios, and you may find the speedup from smarter GPU offloading on this model interesting.
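To sketch what "hot expert offloading" could look like: in an MoE model, a few experts tend to receive a disproportionate share of routed tokens, so pinning those on the GPU while serving cold experts from CPU memory saves VRAM with little speed loss. Here is a toy illustration in plain Python; the class, slot budget, and expert counts are hypothetical, not PowerInfer's actual implementation, which per the reply above was still in development.

```python
from collections import Counter

class HotExpertCache:
    """Toy placement policy: keep the most frequently routed ("hot")
    experts resident on the GPU within a fixed slot budget, and serve
    the rest from CPU memory."""

    def __init__(self, num_experts: int, gpu_slots: int):
        self.num_experts = num_experts
        self.gpu_slots = gpu_slots
        self.counts = Counter()
        # Start with an arbitrary warm set; profiling would refine it.
        self.on_gpu = set(range(min(gpu_slots, num_experts)))

    def record_routing(self, expert_ids):
        # Called once per token with the experts the router selected.
        self.counts.update(expert_ids)

    def rebalance(self):
        # Periodically promote the hottest experts to GPU slots.
        hottest = [e for e, _ in self.counts.most_common(self.gpu_slots)]
        self.on_gpu = set(hottest)

    def device_for(self, expert_id: int) -> str:
        return "gpu" if expert_id in self.on_gpu else "cpu"

# Hypothetical usage: 64 experts, room for 4 on the GPU.
cache = HotExpertCache(num_experts=64, gpu_slots=4)
for routed in ([1, 5, 7, 9], [1, 5, 22, 40], [1, 7, 9, 22]):
    cache.record_routing(routed)
cache.rebalance()
print(cache.device_for(1), cache.device_for(40))  # -> gpu cpu
```

Rebalancing periodically, rather than on every token, amortizes the cost of moving expert weights across PCIe instead of thrashing the cache.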


Ce-daros commented 5 months ago

Thanks!

