Facico / Chinese-Vicuna

Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model. A low-resource Chinese LLaMA + LoRA recipe, with a structure modeled on Alpaca.
https://github.com/Facico/Chinese-Vicuna
Apache License 2.0

Fine-tuning the 13B model with finetune_chat.py runs out of GPU memory; how much VRAM does fine-tuning 13B need? #202

Open tanglaoya321 opened 1 year ago

tanglaoya321 commented 1 year ago

OutOfMemoryError: CUDA out of memory. Tried to allocate 2.38 GiB (GPU 0; 31.75 GiB total capacity; 24.78 GiB already allocated; 1.49 GiB free; 29.21 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

1. I'm using the finetune_chat.py script, with the model switched to the Vicuna 13B model.
2. The PyTorch version is 2.0.0; the GPUs are V100s, 4 cards with 32 GB each, of which I configured 3.

The other parameters are mostly left at their defaults. I see the code already loads the model in 8-bit, and the docs say even a 3090 (24 GB) can fine-tune 13B. Is it because I'm fine-tuning the chat variant? If the defaults really do need more VRAM than this, what can I change so it doesn't need so much?
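The error text itself points at one knob: when reserved memory is much larger than allocated memory, the caching allocator may be fragmenting, and `max_split_size_mb` can help. A minimal sketch of setting it; the value 128 is an illustrative guess, not something from this thread:

```python
import os

# PYTORCH_CUDA_ALLOC_CONF must be set before the first CUDA allocation,
# which in practice means before importing torch in the training script.
# max_split_size_mb caps the block size the caching allocator will split,
# which reduces fragmentation when reserved >> allocated memory.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # noqa: E402  (deliberately imported after setting the env var)
```

Equivalently it can be set on the command line, e.g. `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python finetune_chat.py ...`.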

stevenkwong commented 1 year ago

It may be related to the training script's CUTOFF_LEN; try reducing it.
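For context: activation memory grows with sequence length, so halving CUTOFF_LEN roughly halves the per-sample activation footprint. A hypothetical sketch of where the truncation typically happens in scripts like this; the names and model path here are illustrative, not copied from finetune_chat.py:

```python
from transformers import LlamaTokenizer

CUTOFF_LEN = 1024  # e.g. halved from 2048; activation memory scales with length

# "path/to/llama-13b" is a placeholder for whatever base model the script loads.
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-13b")

def tokenize(prompt: str):
    # Each training sample is truncated to CUTOFF_LEN tokens, so this one
    # constant bounds the sequence length (and per-sample memory) the model sees.
    return tokenizer(prompt, truncation=True, max_length=CUTOFF_LEN, padding=False)
```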

tanglaoya321 commented 1 year ago

After lowering CUTOFF_LEN and batch_size it works. The docs say even 24 GB is enough to fine-tune, though; I wonder whether people really ran it on 24 GB with the default configuration unchanged.

Facico commented 1 year ago

The 13B run was done on a 3090 with CUTOFF_LEN=2048; you can try a smaller batch size. Note, though, that the 13B run used the finetune script. You can align the TARGET_MODULES parameter in this script with the one in the finetune code, which will reduce memory consumption a bit.
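The alignment matters because every entry in TARGET_MODULES adds a pair of LoRA adapter matrices to each matching layer, along with their gradients and optimizer state. A minimal sketch using the peft library, assuming the two-module set from the finetune config quoted later in this thread (the chat script's actual module list may differ):

```python
from peft import LoraConfig

# Restricting LoRA to the attention query/value projections, as in the
# finetune config below. Fewer target modules means fewer trainable
# parameters and therefore less gradient and optimizer memory.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```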

machengyan commented 1 year ago

Why can't my 3090 run finetune on the 13B model any more? It worked before. The configuration, unchanged, is below. After loading, 14 GB of VRAM is used with 10 GB free, and it overflows the moment training starts.

```python
MICRO_BATCH_SIZE = 4  # this could actually be 5 but i like powers of 2
BATCH_SIZE = 128
MAX_STEPS = None
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3  # we don't always need 3 tbh
LEARNING_RATE = 3e-4  # the Karpathy constant
CUTOFF_LEN = 256  # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = args.test_size  # 2000
TARGET_MODULES = [
    "q_proj",
    "v_proj",
]
```
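A note on how these numbers interact: the effective batch size is MICRO_BATCH_SIZE × GRADIENT_ACCUMULATION_STEPS, and only MICRO_BATCH_SIZE (together with CUTOFF_LEN) drives peak activation memory. A small sketch of the trade-off, not taken from the repo:

```python
MICRO_BATCH_SIZE = 4
BATCH_SIZE = 128
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE  # 128 // 4 = 32

# Lowering MICRO_BATCH_SIZE cuts per-step memory; the accumulation steps rise
# in proportion, so the effective batch size (and training dynamics) is unchanged:
MICRO_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE  # now 128 steps
assert MICRO_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS == BATCH_SIZE  # still 128
```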

machengyan commented 1 year ago

Reinstalled CUDA, and it works now.
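For anyone hitting the same thing, a quick sanity check that the PyTorch build matches the installed CUDA toolkit; these are standard torch calls, nothing specific to this repo:

```python
import torch

print(torch.__version__)          # e.g. 2.0.0
print(torch.version.cuda)         # CUDA version this torch build was compiled against
print(torch.cuda.is_available())  # False usually means a broken driver/toolkit pairing
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. NVIDIA GeForce RTX 3090
```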

focusaibuilder commented 1 year ago

> Reinstalled CUDA, and it works now.

How did you change CUDA, and what configuration does it need? And roughly how long does training take?