InternLM / InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

After fine-tuning InternLM-XComposer2-VL-7B with LoRA, how can it be quantized to an int4 model for inference? #208

Closed iFe1er closed 6 months ago

iFe1er commented 6 months ago

I have read the auto-gptq documentation, and the quantization there requires data for quantization training. Which data should be used specifically — plain text, or a mix of images and text? Is there a concrete method and code implementation? Thanks!

iFe1er commented 6 months ago

@yhcao6 @panzhang0212 Could you please help? Thanks!

LightDXY commented 6 months ago

Hi, we used auto-gptq's default quantization method and did not introduce any quantization training: https://github.com/AutoGPTQ/AutoGPTQ?tab=readme-ov-file#quick-tour
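
For reference, the quick-tour linked above boils down to roughly the following post-training quantization flow. This is a minimal sketch; the model path, `trust_remote_code` flags, and calibration text below are placeholders rather than the exact setup used here:

```python
# Minimal post-training quantization sketch following the AutoGPTQ quick-tour.
# The model path and calibration text are placeholders, not the maintainers' exact setup.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_dir = "internlm/internlm-xcomposer2-vl-7b"   # placeholder checkpoint path
quantized_dir = "internlm-xcomposer2-vl-7b-4bit"

tokenizer = AutoTokenizer.from_pretrained(pretrained_dir, trust_remote_code=True)

# Calibration examples: tokenized text samples (the quick-tour uses a single sentence).
examples = [
    tokenizer("auto-gptq is an easy-to-use model quantization library with user-friendly apis.")
]

quantize_config = BaseQuantizeConfig(
    bits=4,          # quantize weights to int4
    group_size=128,  # quantization group size
    desc_act=False,  # disable activation-order reordering for faster inference
)

model = AutoGPTQForCausalLM.from_pretrained(pretrained_dir, quantize_config, trust_remote_code=True)
model.quantize(examples)             # one-shot post-training quantization; no quantization training
model.save_quantized(quantized_dir)
```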

iFe1er commented 6 months ago

But at https://github.com/AutoGPTQ/AutoGPTQ?tab=readme-ov-file#quick-tour, the AutoGPTQ documentation states: "warning: this is just a showcase of the usage of basic apis in AutoGPTQ, which uses only one sample to quantize a much small model, quality of quantized model using such little samples may not good."

Quantizing without data, or with only a handful of calibration samples, may degrade the quality of the quantized model. @LightDXY
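
As a rough idea of what a full pipeline for the question in the title might look like (a sketch only, not a verified recipe for this model): merge the LoRA adapter into the base weights first, then quantize with a larger, representative text calibration set. The paths, the sample count, and the assumption that InternLM-XComposer2-VL-7B's custom multimodal code loads cleanly through `AutoGPTQForCausalLM` are all placeholders/assumptions:

```python
# Sketch only: assumes the LoRA adapter was trained with peft and that the model loads
# through AutoGPTQForCausalLM; paths and the calibration set below are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from peft import PeftModel

base_dir = "internlm/internlm-xcomposer2-vl-7b"    # placeholder base checkpoint
lora_dir = "path/to/lora-adapter"                  # placeholder LoRA output directory
merged_dir = "internlm-xcomposer2-vl-7b-merged"
quantized_dir = "internlm-xcomposer2-vl-7b-4bit"

# 1) Merge the LoRA adapter into the base weights so the quantizer sees a plain checkpoint.
base = AutoModelForCausalLM.from_pretrained(base_dir, trust_remote_code=True)
merged = PeftModel.from_pretrained(base, lora_dir).merge_and_unload()
merged.save_pretrained(merged_dir)

tokenizer = AutoTokenizer.from_pretrained(base_dir, trust_remote_code=True)
tokenizer.save_pretrained(merged_dir)

# 2) Use a larger calibration set (e.g. a few hundred text samples drawn from the
#    fine-tuning data) instead of the single sentence shown in the quick-tour.
calibration_texts = ["<replace with representative text samples>"] * 128  # placeholder
examples = [tokenizer(text) for text in calibration_texts]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_pretrained(merged_dir, quantize_config, trust_remote_code=True)
model.quantize(examples)
model.save_quantized(quantized_dir)
```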