Closed mosthandsomeman closed 7 months ago
I deployed Duxiaoman-DI/XuanYuan-6B-Chat on a server with 24 GB of VRAM, of which 22 GB was consumed. Loading the model printed the following: use transformers.generate to infer... loading weight with transformers ... WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
Inference was very slow. What could be the cause? Solved: loading the model in fp16 fixed it.
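For reference, a minimal sketch of the fp16 fix described above, assuming the standard Hugging Face `from_pretrained` loading path (the exact arguments the original poster used are not shown in the issue). Without an explicit dtype, transformers loads weights in fp32; ~6B parameters then need roughly 24 GB, so accelerate offloads some layers to CPU, which triggers the "meta device" warning and makes inference slow. fp16 halves the footprint to roughly 12 GB, fitting comfortably in 24 GB of VRAM.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def load_fp16(model_name: str = "Duxiaoman-DI/XuanYuan-6B-Chat"):
    """Load the model in fp16 so it fits entirely on a 24 GB GPU.

    In fp32 (the transformers default) the weights spill over,
    get offloaded to CPU, and inference slows to a crawl.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,  # the fix: fp16 instead of the fp32 default
        device_map="auto",
        trust_remote_code=True,
    )
    return tokenizer, model


# Rough memory math for a 6B-parameter model:
fp16_gb = 6e9 * 2 / 1e9  # 2 bytes per param -> ~12 GB
fp32_gb = 6e9 * 4 / 1e9  # 4 bytes per param -> ~24 GB
```

The arithmetic explains the symptom: ~24 GB of fp32 weights cannot fit alongside activations and CUDA overhead in a 24 GB card, so offloading kicks in; ~12 GB of fp16 weights can.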