OpenBMB / VisCPM

[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series based on the CPM foundation model

low-resource inference reduces the inference speed a lot #31

Closed Miracle2333 closed 8 months ago

Miracle2333 commented 10 months ago

When low-resource inference is enabled for visual chat, inference speed drops by about 5x. Could you explain how to quantize the model to speed up inference?
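As an interim workaround while waiting for an official quantized release, PyTorch's built-in dynamic int8 quantization can reduce memory use and often speeds up CPU inference of linear layers. Below is a minimal generic sketch, not VisCPM-specific; the small `nn.Sequential` model is a hypothetical stand-in for the real checkpoint:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; in practice you would load the VisCPM checkpoint instead.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))
model.eval()

# Dynamic quantization converts the weights of the listed module types
# (here nn.Linear) to int8; activations are quantized on the fly at runtime.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# The quantized model keeps the same forward interface.
x = torch.randn(1, 64)
out = qmodel(x)
print(out.shape)
```

Whether this helps for a large multimodal model depends on how much of the runtime is spent in linear layers; attention-heavy GPU inference may need a dedicated quantized kernel instead.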

JamesHujy commented 8 months ago

We are still working on a quantized version of the model.