XuanYuan-6B-Chat inference is very slow on a 3090

I deployed Duxiaoman-DI/XuanYuan-6B-Chat on a server, and it uses 22 GB of the 24 GB of GPU memory. The output when loading the model is:

use transformers.generate to infer...
loading weight with transformers ...
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.

Inference is very slow. What could be the cause?

Resolved: changing the model loading to fp16 fixed it.
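For anyone hitting the same warning, here is a minimal sketch of what "loading in fp16" might look like. It rests on assumptions not stated in the issue: that the model has roughly 6e9 parameters, that the slow run loaded the weights in fp32, and that the checkpoint loads through the standard `AutoModelForCausalLM` / `AutoTokenizer` classes with `accelerate` installed for `device_map="auto"`. The helper names `estimate_weight_gib` and `load_fp16` are hypothetical. The estimate covers weights only (no activations or KV cache), but it suggests why fp32 could spill onto the CPU on a 24 GB card while fp16 fits.

```python
def estimate_weight_gib(n_params: float, bytes_per_param: int) -> float:
    """Rough size of the weights alone, in GiB (ignores activations and KV cache)."""
    return n_params * bytes_per_param / 2**30


def load_fp16(model_id: str = "Duxiaoman-DI/XuanYuan-6B-Chat"):
    """Load the tokenizer and model with half-precision weights.

    Assumption: the checkpoint works with the Auto* classes; the repository's
    own loading script may use different classes or arguments.
    """
    # Imported lazily so the size estimate above runs without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # 2 bytes per weight instead of 4
        device_map="auto",          # requires accelerate; places weights on the GPU if they fit
    )
    return tokenizer, model


if __name__ == "__main__":
    n_params = 6e9  # assumed from the "6B" in the model name
    print(f"fp32 weights: {estimate_weight_gib(n_params, 4):.1f} GiB")
    print(f"fp16 weights: {estimate_weight_gib(n_params, 2):.1f} GiB")
```

After loading, checking `model.hf_device_map` (set when `device_map` is used) should show whether any module still landed on `cpu` or `disk`; if none did, the offload warning should no longer appear.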

Duxiaoman-DI/XuanYuan, issue #23