PKU-YuanGroup / ChatLaw

ChatLaw: A Powerful LLM Tailored for Chinese Legal. A large language model for Chinese law.

https://chatlaw.cloud/

GNU Affero General Public License v3.0

6.92k stars 544 forks

How much GPU memory does the 13B model need to run? #45

Open Jonsun-N opened 1 year ago

Jonsun-N commented 1 year ago

How much GPU memory does the 13B model need to run?

liupengfei0324 commented 1 year ago

A GPU with roughly 30 GB of VRAM will support it.

Jonsun-N commented 1 year ago

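Just to confirm: it is enough for the memory of several cards to add up to that, right? A single card does not need more than 30 GB by itself?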

ffabbwl commented 1 year ago

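> Just to confirm: it is enough for the memory of several cards to add up to that, right? A single card does not need more than 30 GB by itself?

I believe a single card needs the full 30 GB; VRAM does not seem to add up across cards. You could consider quantizing to int8.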
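The numbers in this thread can be sanity-checked with simple arithmetic: weight memory is roughly the parameter count times the bytes per parameter. The sketch below assumes exactly 13 billion parameters and counts weights only; activations, the KV cache, and framework overhead come on top, which is presumably why about 30 GB is quoted for fp16. `weight_memory_gb` is a hypothetical helper for illustration, not part of ChatLaw.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    params = 13e9  # assumption: "13B" taken as exactly 13 billion parameters
    for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
        print(f"{label}: ~{weight_memory_gb(params, bits):.1f} GB of weights")
```

By this estimate the fp16 weights alone come to about 26 GB, int8 to about 13 GB, and 4-bit to about 6.5 GB, so quantization is what brings the model within reach of a single 24 GB card.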

ImmNaruto commented 1 year ago

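Can the model be split across multiple cards for deployment? I tested locally and could not deploy it on a single 24 GB 3090, so I would like to try multiple cards.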

Mewral commented 1 year ago

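> Can the model be split across multiple cards for deployment? I tested locally and could not deploy it on a single 24 GB 3090, so I would like to try multiple cards.

See DeepSpeed ZeRO stage 3.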

jinfengfeng commented 1 year ago

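> Can the model be split across multiple cards for deployment? I tested locally and could not deploy it on a single 24 GB 3090, so I would like to try multiple cards.

You could try llama.cpp; it is faster and supports multiple GPUs.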

nuaabuaa07 commented 1 year ago

I have two A10 cards and also get an error saying multi-GPU is not supported. Could you explain in detail how to use multiple cards?
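One general way to spread a model over several GPUs with Hugging Face Transformers is `device_map="auto"` together with a per-device `max_memory` budget (this relies on the `accelerate` package being installed). The following is a minimal sketch under assumptions, not a confirmed fix for the error reported here: `build_max_memory` and `load_sharded` are hypothetical helpers, the 2 GiB headroom is an arbitrary choice, and whether ChatLaw's own scripts work with this loading path is not established in this thread.

```python
def build_max_memory(num_gpus: int, per_gpu_gib: int, headroom_gib: int = 2) -> dict:
    """Build a max_memory mapping that leaves some headroom on every GPU."""
    usable = per_gpu_gib - headroom_gib
    if usable <= 0:
        raise ValueError("headroom must be smaller than the per-GPU memory")
    return {gpu_index: f"{usable}GiB" for gpu_index in range(num_gpus)}


def load_sharded(model_path: str, num_gpus: int, per_gpu_gib: int):
    """Load a LLaMA-style checkpoint with its layers placed across several GPUs."""
    # Imported lazily so the helper above can be used without torch installed.
    import torch
    from transformers import LlamaForCausalLM

    return LlamaForCausalLM.from_pretrained(
        model_path,
        torch_dtype=torch.float16,
        device_map="auto",  # let accelerate assign layers to devices
        max_memory=build_max_memory(num_gpus, per_gpu_gib),
    )
```

For two 24 GB cards, `build_max_memory(2, 24)` yields a budget of 22 GiB per card. Note that this places different layers on different GPUs, so the cards pool their memory but work one after another rather than in parallel.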

nuaabuaa07 commented 1 year ago

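Is this the right way to configure loading the model with 8-bit quantization?

```python
import torch
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    ziya_model_path,
    torch_dtype=torch.float16,
    load_in_8bit=True,
    device_map="auto",
)
```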

nuaabuaa07 commented 1 year ago

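> Is this the right way to configure loading the model with 8-bit quantization? `model = LlamaForCausalLM.from_pretrained(ziya_model_path,`

Adding `load_in_8bit=True` directly raises an error. It needs to be done like this:

```python
from transformers import BitsAndBytesConfig, LlamaForCausalLM

# Note: this config quantizes to 4-bit NF4, not 8-bit.
nf4_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
model = LlamaForCausalLM.from_pretrained(
    ziya_model_path,
    quantization_config=nf4_config,
    device_map="auto",
)
```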
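The snippet in this comment uses 4-bit NF4, while the original question was about 8-bit. The same `BitsAndBytesConfig` class also accepts `load_in_8bit=True`, so both variants can go through `quantization_config`. Below is a minimal sketch, assuming `transformers` and `bitsandbytes` are installed; `quant_kwargs` and `load_quantized` are hypothetical helper names introduced here for illustration.

```python
def quant_kwargs(bits: int) -> dict:
    """Return BitsAndBytesConfig keyword arguments for 8-bit or 4-bit loading."""
    if bits == 8:
        return {"load_in_8bit": True}
    if bits == 4:
        return {"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"}
    raise ValueError("only 8-bit and 4-bit quantization are handled in this sketch")


def load_quantized(model_path: str, bits: int):
    """Load a LLaMA-style checkpoint quantized with bitsandbytes."""
    # Imported lazily so quant_kwargs can be inspected without transformers installed.
    from transformers import BitsAndBytesConfig, LlamaForCausalLM

    config = BitsAndBytesConfig(**quant_kwargs(bits))
    return LlamaForCausalLM.from_pretrained(
        model_path,
        quantization_config=config,
        device_map="auto",
    )
```

Calling `load_quantized(ziya_model_path, 8)` would then request 8-bit weights and `load_quantized(ziya_model_path, 4)` the NF4 variant.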

nuaabuaa07 commented 1 year ago

To use a single GPU: `export CUDA_VISIBLE_DEVICES=0 && python main.py`
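The same restriction can also be applied from inside Python, as long as the variable is set before CUDA is initialized (in practice, before `torch` is imported or first touches the GPU). A minimal sketch; `restrict_to_gpus` is a hypothetical helper, and where it would go in ChatLaw's `main.py` is an assumption.

```python
import os


def restrict_to_gpus(gpu_ids: str) -> None:
    """Limit the GPUs visible to this process, e.g. "0" or "0,1"."""
    os.environ["CUDA_VISIBLE_DEVICES"] = gpu_ids


# Call this at the very top of the script, before importing torch.
restrict_to_gpus("0")
```

Setting the variable after CUDA has already been initialized has no effect on the running process, which is why the call sits ahead of any `torch` import.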

chiugui commented 8 months ago

Can this 13B model be run on a CPU?

chiugui commented 8 months ago

> Is this the right way to configure loading the model with 8-bit quantization? `model = LlamaForCausalLM.from_pretrained(ziya_model_path,`
>
> Adding `load_in_8bit=True` directly raises an error. It needs to be done like this:
>
> ```python
> nf4_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
> model = LlamaForCausalLM.from_pretrained(ziya_model_path, quantization_config=nf4_config, device_map='auto')
> ```

Which configuration file should this be added to?