Hardware requirements for deploying CodeShell-7B-Chat?

toohandsome commented 12 months ago

I want to set up CodeShell-7B-Chat inside my company for roughly 200-300 users. How much RAM and what kind of GPU would that need?

ironmanlj commented 12 months ago

For what it's worth, I deployed CodeShell-7B-Chat on a GPU, a V100. It used 18-19 GB of VRAM and barely touched the CPU.

qianma819 commented 12 months ago

I deployed with TGI on a 4070, since I can't afford a better card. Using the parameters from the documentation:

    docker run --gpus 'all' --shm-size 1g -p 9090:80 -v $HOME/models:/data \
        --env LOG_LEVEL="info,text_generation_router=debug" \
        ghcr.nju.edu.cn/huggingface/text-generation-inference:1.0.3 \
        --model-id /data/CodeShell-7B-Chat --num-shard 1 \
        --max-total-tokens 5000 --max-input-length 4096 \
        --max-stop-sequences 12 --trust-remote-code

it fails with:

    ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

1. Is the 4070's 12 GB of VRAM not enough?
2. How do I configure the host with this setup? Do I need extra parameters? For the CodeShell-7B-Chat-int4 model the command is ./server -m ./models/codeshell-chat-q4_0.gguf --host 127.0.0.1 --port 8080, where the configuration is obvious at a glance.

ironmanlj commented 12 months ago

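1. 12 GB of VRAM is not enough to run a 6B model. From my testing you need at least 16-18 GB.
2. Since this is a Docker deployment, the -p parameter is the port mapping: it maps port 80 inside the container to port 9090 on the server. As for why the container side is set to 80, that is presumably the default.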

qianma819 commented 12 months ago

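1. Judging by the error message, the GPU really is running out of VRAM. The message says max_split_size_mb can be changed, but I couldn't find where that is set. For a CUDA out-of-memory error you can also reduce the batch size. Do you know where the batch size is configured?
2. With the Docker deployment, I access the model through the VS Code plugin, so I need to configure the server's IP. Can that be set through the Docker parameters?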

ironmanlj commented 12 months ago

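I'm not sure about the batch size. The VS Code plugin works fine with a Docker deployment: in the plugin settings, change the IP to that of the machine hosting the model and set the port to your mapped port (9090 in the example above), and it will connect.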
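The port mapping described above can be checked from another machine by sending a request to the mapped port. The following is a minimal sketch, assuming the container exposes TGI's standard `/generate` route. The IP `192.168.1.10` is a placeholder, and `build_generate_request` is a hypothetical helper name, not part of any library.

```python
import json
import urllib.request


def build_generate_request(host: str, port: int, prompt: str,
                           max_new_tokens: int = 64) -> urllib.request.Request:
    """Build a POST request for TGI's /generate route on the mapped host port."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=f"http://{host}:{port}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # 192.168.1.10 is a placeholder for the server's IP (assumption);
    # 9090 is the host side of the `-p 9090:80` mapping from the thread.
    req = build_generate_request("192.168.1.10", 9090, "def quicksort(arr):")
    print(req.full_url)
    # Uncomment on a machine that can reach the server:
    # with urllib.request.urlopen(req, timeout=30) as resp:
    #     print(json.loads(resp.read())["generated_text"])
```

If this request succeeds from the client machine, the same IP and port should be what the plugin settings need.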

wxfvf commented 11 months ago

I deployed with Docker and it crashed outright with 24 GB. How can a 6B model use this much memory?

jump2 commented 11 months ago

With a Docker deployment, 8 GB of VRAM and 16 GB of RAM, I can't get it to run:

    2023-12-01T06:01:48.153557Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
    2023-12-01T06:01:58.169072Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
    2023-12-01T06:02:09.732462Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

    The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
    Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s] rank=0
    2023-12-01T06:02:09.732513Z ERROR shard-manager: text_generation_launcher: Shard process was signaled to shutdown with signal 9 rank=0
    2023-12-01T06:02:09.832288Z ERROR text_generation_launcher: Shard 0 failed to start
    2023-12-01T06:02:09.832412Z INFO text_generation_launcher: Shutting down shards

It ends like this every time. For those of you who got it running, how much VRAM and RAM do you have?
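The "signal 9" in this log is SIGKILL. On Linux, a process killed this way while loading weights has often been terminated by the kernel's out-of-memory killer because host RAM (not VRAM) ran out, although the log alone does not prove that. A minimal sketch for decoding the signal number, assuming a POSIX system; `describe_signal` is a hypothetical helper name.

```python
import signal


def describe_signal(signum: int) -> str:
    """Return the symbolic name for a signal number on this platform."""
    return signal.Signals(signum).name


if __name__ == "__main__":
    # 9 is the signal number reported in the shard-manager log above.
    print(describe_signal(9))
```

Checking the host's kernel log around the same timestamp is one way to see whether the out-of-memory killer was involved.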

MeJerry215 commented 11 months ago

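@wxfvf Check the place where the model is loaded and see whether it is loading as float32 or float16. A 6B model in fp16 takes 2 bytes per parameter, so it needs at least 12 GB just to load. In fp32 it takes 4 bytes per parameter, so it needs at least 24 GB just to load, which is why your memory blew up immediately. This model seems to default to fp32, unbelievably.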

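After I set torch_dtype=torch.float16 at the load call, memory usage dropped. If that isn't it, maybe your inputs are too long at inference time and the KV cache is taking up too much memory.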
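The arithmetic above can be written out as a small sketch. It counts the weights only (no activations, KV cache, or framework overhead) and uses 1 GB = 1e9 bytes; the 6e9 parameter count is the round figure used in this thread, and `weight_memory_gb` is a hypothetical helper name.

```python
# Bytes needed to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # 6e9 is the approximate parameter count quoted in the thread.
    for dtype in ("float32", "float16"):
        print(dtype, weight_memory_gb(6e9, dtype), "GB")
```

With 7e9 parameters the same function gives 28 GB for float32 and 14 GB for float16, so the conclusion does not change: a full-precision load cannot fit on a 24 GB card.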

wxfvf commented 11 months ago

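At first it would not run at all with the official parameters given for the VS Code plugin. Adding --dtype bfloat16 still crashed. I then also reduced the token lengths with --max-total-tokens 4098 --max-input-length 2048, and it finally ran, using about 18-19 GB of VRAM.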
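Why lowering the token limits helps can be estimated from the usual KV cache formula: 2 (keys and values) × layers × KV heads × head dimension × bytes per value, per token. The sketch below assumes that formula and uses hypothetical shape values (32 layers, 32 KV heads, head dimension 128), which are placeholders and not CodeShell's confirmed configuration. It covers a single sequence only; a server that batches requests holds caches for several sequences at once.

```python
def kv_cache_bytes_per_token(num_layers: int, num_kv_heads: int,
                             head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes of KV cache per token; the factor 2 covers keys and values."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


def kv_cache_bytes(total_tokens: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_value: int = 2) -> int:
    """Bytes of KV cache for one sequence of `total_tokens` tokens."""
    per_token = kv_cache_bytes_per_token(num_layers, num_kv_heads,
                                         head_dim, bytes_per_value)
    return total_tokens * per_token


if __name__ == "__main__":
    # Hypothetical model shape (assumption, not taken from the thread).
    shape = dict(num_layers=32, num_kv_heads=32, head_dim=128)
    # 5000 and 4098 are the --max-total-tokens values mentioned in the thread.
    for tokens in (5000, 4098):
        print(tokens, "tokens:", kv_cache_bytes(tokens, **shape) / 1e9, "GB")
```

The cache grows linearly with the token limit, so reducing --max-total-tokens and --max-input-length shrinks what the server has to reserve on top of the weights.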

zpjmj commented 8 months ago

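@wxfvf @MeJerry215 How do I run this on multiple cards? On a single card it worked with your parameters. With the official --gpus 'all' parameter and 2 GPUs, it runs out of VRAM right away.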

WisdomShell / codeshell

Hardware requirements for deploying CodeShell-7B-Chat? #56
