Duxiaoman-DI / XuanYuan

XuanYuan: Du Xiaoman's Chinese financial dialogue large language model

xuanyuan6B-chat inference is very slow on a 3090 #23

Closed: mosthandsomeman closed this issue 7 months ago

mosthandsomeman commented 7 months ago

I deployed Duxiaoman-DI/XuanYuan-6B-Chat on a server with 24 GB of GPU memory, and the model occupies 22 GB of it. This is the output when loading the model:

    use transformers.generate to infer...
    loading weight with transformers ...
    WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
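A weight-only estimate suggests why the offload warning appears. This is a rough sketch assuming about 6 billion parameters (read off the "6B" in the model name) and assuming the weights were loaded in fp32, which is what transformers used when no `torch_dtype` was passed. It ignores activations, the KV cache and CUDA overhead, so real usage is higher.

```python
# Rough weight-only memory estimate; ignores activations, KV cache and CUDA overhead.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_gib(num_params: float, dtype: str) -> float:
    """GiB needed just to hold the weights at the given precision."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


# Assumption: about 6e9 parameters, taken from the "6B" in the model name.
N_PARAMS = 6e9
for dtype in ("fp32", "fp16"):
    print(f"{dtype}: {weight_gib(N_PARAMS, dtype):.1f} GiB")
```

Under these assumptions fp32 weights come to about 22.4 GiB, essentially the whole card, so the loader has to place some layers on the CPU, and every forward pass then pays for host-to-GPU transfers. In fp16 the weights need about 11.2 GiB and fit comfortably.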

Inference is very slow. What could be the cause?

Update: solved. Loading the model in fp16 fixed it.
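A minimal sketch of loading in fp16 with the standard transformers API. The function names `load_fp16` and `offloaded_modules` are hypothetical, and this is not the repository's own loading script. It assumes `accelerate` is installed (needed for `device_map="auto"`); whether this checkpoint also requires `trust_remote_code=True` is not stated in the issue.

```python
def load_fp16(model_id: str = "Duxiaoman-DI/XuanYuan-6B-Chat"):
    """Load the tokenizer and model with fp16 weights (sketch, not the repo's official loader)."""
    # Imported inside the function so offloaded_modules() works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # 2 bytes per parameter instead of fp32's 4
        device_map="auto",          # needs accelerate; fills the GPU first, then the CPU
    )
    model.eval()
    return model, tokenizer


def offloaded_modules(device_map: dict) -> list:
    """Names of modules that were placed on the CPU or disk rather than a GPU."""
    return [name for name, device in device_map.items() if device in ("cpu", "disk")]
```

Usage: after `model, tokenizer = load_fp16()`, calling `offloaded_modules(model.hf_device_map)` is expected to return an empty list on a 24 GB card. A non-empty list means some layers are still offloaded and generation will remain slow.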