Facico / Chinese-Vicuna

Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model. A low-resource Chinese LLaMA + LoRA recipe, with a structure modeled on Alpaca.
https://github.com/Facico/Chinese-Vicuna
Apache License 2.0

Fine-tuning the 13B model with finetune_chat.py runs out of GPU memory; how much VRAM does fine-tuning 13B need? #202

Open tanglaoya321 opened 1 year ago

tanglaoya321 commented 1 year ago

OutOfMemoryError: CUDA out of memory. Tried to allocate 2.38 GiB (GPU 0; 31.75 GiB total capacity; 24.78 GiB already allocated; 1.49 GiB free; 29.21 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

1. I'm using the finetune_chat.py script, with the model switched to the Vicuna 13B model.
2. The PyTorch version is 2.0.0; the GPUs are V100s, 4 cards with 32 GB each, of which I configured 3.

The other parameters are mostly left at their defaults. I see the code already loads the model in 8-bit, and the docs say even a 3090 (24 GB) can fine-tune 13B. Is it because I'm fine-tuning the chat variant? If the defaults really do need more VRAM than this, what can I change so it doesn't need so much?
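The error text itself points at one knob: when reserved memory is much larger than allocated memory, the caching allocator may be fragmenting, and `max_split_size_mb` can help. A minimal sketch of setting it; the value 128 is an illustrative guess, not something from this thread:

```python
import os

# PYTORCH_CUDA_ALLOC_CONF must be set before the first CUDA allocation,
# which in practice means before importing torch in the training script.
# max_split_size_mb caps the block size the caching allocator will split,
# which reduces fragmentation when reserved >> allocated memory.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # noqa: E402  (deliberately imported after setting the env var)
```

Equivalently it can be set on the command line, e.g. `PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python finetune_chat.py ...`.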

stevenkwong commented 1 year ago

It may be related to the training script's CUTOFF_LEN; try reducing it.
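For context: activation memory grows with sequence length, so halving CUTOFF_LEN roughly halves the per-sample activation footprint. A hypothetical sketch of where the truncation typically happens in scripts like this; the names and model path here are illustrative, not copied from finetune_chat.py:

```python
from transformers import LlamaTokenizer

CUTOFF_LEN = 1024  # e.g. halved from 2048; activation memory scales with length

# "path/to/llama-13b" is a placeholder for whatever base model the script loads.
tokenizer = LlamaTokenizer.from_pretrained("path/to/llama-13b")

def tokenize(prompt: str):
    # Each training sample is truncated to CUTOFF_LEN tokens, so this one
    # constant bounds the sequence length (and per-sample memory) the model sees.
    return tokenizer(prompt, truncation=True, max_length=CUTOFF_LEN, padding=False)
```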

tanglaoya321 commented 1 year ago

After lowering CUTOFF_LEN and batch_size it works. The docs say even 24 GB is enough to fine-tune, though; I wonder whether people really ran it on 24 GB with the default configuration unchanged.

Facico commented 1 year ago

The 13B run was done on a 3090 with CUTOFF_LEN=2048; you can try a smaller batch size. Note, though, that the 13B run used the finetune script. You can align the TARGET_MODULES parameter in this script with the one in the finetune code, which will reduce memory consumption a bit.
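The alignment matters because every entry in TARGET_MODULES adds a pair of LoRA adapter matrices to each matching layer, along with their gradients and optimizer state. A minimal sketch using the peft library, assuming the two-module set from the finetune config quoted later in this thread (the chat script's actual module list may differ):

```python
from peft import LoraConfig

# Restricting LoRA to the attention query/value projections, as in the
# finetune config below. Fewer target modules means fewer trainable
# parameters and therefore less gradient and optimizer memory.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
```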

machengyan commented 1 year ago

Why can't my 3090 run finetune on the 13B model any more? It worked before. The configuration, unchanged, is below. After loading, 14 GB of VRAM is used with 10 GB free, and it overflows the moment training starts.

```python
MICRO_BATCH_SIZE = 4  # this could actually be 5 but i like powers of 2
BATCH_SIZE = 128
MAX_STEPS = None
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 3  # we don't always need 3 tbh
LEARNING_RATE = 3e-4  # the Karpathy constant
CUTOFF_LEN = 256  # 256 accounts for about 96% of the data
LORA_R = 8
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
VAL_SET_SIZE = args.test_size  # 2000
TARGET_MODULES = [
    "q_proj",
    "v_proj",
]
```
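A note on how these numbers interact: the effective batch size is MICRO_BATCH_SIZE × GRADIENT_ACCUMULATION_STEPS, and only MICRO_BATCH_SIZE (together with CUTOFF_LEN) drives peak activation memory. A small sketch of the trade-off, not taken from the repo:

```python
MICRO_BATCH_SIZE = 4
BATCH_SIZE = 128
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE  # 128 // 4 = 32

# Lowering MICRO_BATCH_SIZE cuts per-step memory; the accumulation steps rise
# in proportion, so the effective batch size (and training dynamics) is unchanged:
MICRO_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE  # now 128 steps
assert MICRO_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS == BATCH_SIZE  # still 128
```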

machengyan commented 1 year ago

Reinstalled CUDA, and it works now.
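For anyone hitting the same thing, a quick sanity check that the PyTorch build matches the installed CUDA toolkit; these are standard torch calls, nothing specific to this repo:

```python
import torch

print(torch.__version__)          # e.g. 2.0.0
print(torch.version.cuda)         # CUDA version this torch build was compiled against
print(torch.cuda.is_available())  # False usually means a broken driver/toolkit pairing
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. NVIDIA GeForce RTX 3090
```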

focusaibuilder commented 1 year ago

> Reinstalled CUDA, and it works now.

How did you change CUDA, and what configuration does it need? And roughly how long does training take?