Model still loads into system RAM and runs inference on the CPU after a GPU is specified

WisdomShell / codeshell

A series of code large language models developed by PKU-KCL

http://se.pku.edu.cn/kcl


Model still loads into system RAM and runs inference on the CPU after a GPU is specified #50

Open yutong12 opened 10 months ago

yutong12 commented 10 months ago

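Environment: Tesla T4 16G

Problem description: We are using the CodeShell-7B-chat-int4 version. Running the official example takes too long to set up: excluding download time, loading on the GPU and printing the first example result took 5 minutes 41 seconds. How can inference be sped up? When running the bundled demos cli_demo.py and web_demo.py with only the model path changed, we found that the model was loaded onto the CPU rather than the GPU, even though --device defaults to "cuda:0".

Expected result: faster inference and normal output.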
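One way to confirm where the weights actually ended up is to count the model's tensors per device. The following is a minimal sketch, not code from the CodeShell repository: `summarize_devices` and `report_placement` are hypothetical helper names, and the sketch assumes the loaded model is a `torch.nn.Module`, so that `named_parameters()` and `named_buffers()` are available. Buffers are included because quantized weights are sometimes stored as buffers rather than parameters.

```python
from collections import Counter
from itertools import chain


def summarize_devices(named_tensors):
    """Count tensors per device, e.g. {'cuda:0': 290, 'cpu': 2}."""
    return dict(Counter(str(tensor.device) for _, tensor in named_tensors))


def report_placement(model):
    """Summarize device placement of all parameters and buffers.

    Assumption: `model` exposes named_parameters() and named_buffers(),
    as any torch.nn.Module does.
    """
    tensors = chain(model.named_parameters(), model.named_buffers())
    return summarize_devices(tensors)
```

Calling `report_placement(model)` right after the demo has loaded the model would show whether any tensors remain on `cpu`. It may also be worth checking that `torch.cuda.is_available()` returns `True` in the same environment before looking at the `--device` flag.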

yutong12 commented 10 months ago

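Follow-up: after the lengthy load, it still consumed 30 GB of system RAM and 6 GB of GPU memory. Is there some balance to be struck between the two?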
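For scale, the reported figures can be compared with a back-of-the-envelope estimate of weight size at different precisions. This is only a rough sketch: the parameter count of 7e9 is an assumption taken from the "7B" in the model name, `weight_footprint_gib` is a hypothetical helper, and the estimate ignores activations, the KV cache, and framework overhead.

```python
def weight_footprint_gib(n_params, bits_per_param):
    """Approximate size of the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 7e9  # assumption: roughly 7 billion parameters

for label, bits in [("fp32", 32), ("fp16", 16), ("int4", 4)]:
    print(f"{label}: {weight_footprint_gib(N_PARAMS, bits):.1f} GiB")
```

Under these assumptions, int4 weights come to a few GiB, which is in the same range as the reported 6 GB of GPU memory, while the 30 GB of system RAM is closer to a full-precision copy of the weights. Whether the loader really materializes such a copy in RAM before quantizing is a guess and would need to be checked against the loading code.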

shuaizai88 commented 7 months ago

Some parameters probably need to be adjusted. In my case, I just watched my RAM get exhausted.