Closed — Ce-daros closed this issue 5 months ago
Currently, we have no plans to expand our ReLU-activated model lineup beyond ReluLLaMA, ReluFalcon, ProSparse, and Bamboo, as tuning a model for ReLU activation requires significant effort.
That said, while we will not support sparse activation for Qwen1.5, we are developing "hot expert offloading" for the Qwen MoE model (based on Qwen1.5), which requires no further fine-tuning. We plan to roll out this feature so PowerInfer can support MoE scenarios; you may find the speedup from smarter GPU offloading on this model interesting.
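The idea behind "hot expert offloading" can be illustrated with a toy sketch: track how often each MoE expert is activated at runtime, keep the most frequently routed ("hot") experts resident on the GPU, and serve the rest from CPU memory. The class and parameter names below are hypothetical illustrations, not PowerInfer's actual implementation.

```python
from collections import Counter

class HotExpertCache:
    """Toy sketch of hot expert offloading (hypothetical, not PowerInfer's API):
    promote the top-k most frequently activated experts to the GPU."""

    def __init__(self, num_experts, gpu_slots):
        self.num_experts = num_experts
        self.gpu_slots = gpu_slots       # how many experts fit in GPU memory
        self.counts = Counter()          # activation frequency per expert id

    def record(self, expert_id):
        # Called whenever the router selects an expert in a forward pass.
        self.counts[expert_id] += 1

    def hot_experts(self):
        # The experts currently promoted to the GPU: top-k by frequency.
        return {e for e, _ in self.counts.most_common(self.gpu_slots)}

    def placement(self, expert_id):
        # Decide where this expert's weights are served from.
        return "gpu" if expert_id in self.hot_experts() else "cpu"


cache = HotExpertCache(num_experts=8, gpu_slots=2)
# Simulate routing decisions from a few forward passes.
for e in [0, 3, 3, 3, 0, 0, 5]:
    cache.record(e)

print(cache.hot_experts())   # experts 0 and 3 dominate -> kept on GPU
print(cache.placement(5))    # rarely used expert served from CPU
```

Because expert activation in MoE models tends to be heavily skewed, keeping only the hot experts on the GPU can capture most activations without fitting the whole model in GPU memory, which is why no further fine-tuning is needed.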
Thanks!
[https://huggingface.co/mightbe/Qwen1.5-32B-llamafied/tree/main]