How to convert a fine-tuned MOSS model to a quantized model?

OpenMOSS / MOSS

An open-source tool-augmented conversational language model from Fudan University

https://txsun1997.github.io/blogs/moss.html

Apache License 2.0

11.92k stars, 1.14k forks

How to convert a fine-tuned MOSS model to a quantized model? #245

Status: Open (opened by qgpmztmf 1 year ago)

qgpmztmf commented 1 year ago

I couldn't find code in this repository that implements this process. Has anyone successfully converted a fine-tuned MOSS model to a quantized version? If so, could you please share the steps or code you used?

JIEKEXIAN commented 1 year ago

I tested their int4 model and found that inference with the quantized model is actually slower than with the unquantized one.

qgpmztmf commented 1 year ago

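> I tested their int4 model and found that inference with the quantized model is actually slower than with the unquantized one.

Quantization does not necessarily speed up inference. Its main purpose is to reduce the GPU memory the model occupies.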
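
To make the memory-versus-speed point concrete, here is a rough sketch rather than the method used for the released MOSS int4 checkpoints. It assumes a parameter count of about 16 billion (an assumption, not stated in this thread) and uses a toy symmetric round-to-nearest int4 scheme. The helper names `weight_bytes`, `quantize_int4`, and `dequantize` are hypothetical and do not come from the MOSS repository.

```python
def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Bytes needed for the weights alone (ignores activations and KV cache)."""
    return num_params * bits_per_weight // 8


def quantize_int4(weights: list[float]) -> tuple[list[int], float]:
    """Toy symmetric round-to-nearest quantization to the int4 range [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Recover approximate floats; this extra step is paid at inference time."""
    return [c * scale for c in codes]


# Assumed parameter count, for illustration only.
NUM_PARAMS = 16_000_000_000

fp16_gib = weight_bytes(NUM_PARAMS, 16) / 2**30
int4_gib = weight_bytes(NUM_PARAMS, 4) / 2**30
print(f"fp16 weights: {fp16_gib:.1f} GiB, int4 weights: {int4_gib:.1f} GiB")

# Round trip on a tiny weight vector: storage shrinks, values become approximate.
weights = [0.7, -0.33, 0.12, -0.7]
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)
print(codes, [round(r, 2) for r in restored])
```

The weights take a quarter of the fp16 storage, but every use of them goes through `dequantize` (or an equivalent kernel) first. Unless that step is implemented very efficiently, the quantized model can run slower than the unquantized one, which matches the observation reported above.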