HarderThenHarder / transformers_tasks

⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SFT etc.
https://www.zhihu.com/column/c_1451236880973426688

RuntimeError: CUDA out of memory. Tried to allocate 42.00 MiB (GPU 0; 22.02 GiB total capacity; 20. #60

Open chenchaoac opened 1 year ago

chenchaoac commented 1 year ago

How much GPU memory does single-GPU LLM LoRA fine-tuning need? On a 23 GB A100 with batch_size=1, max_source_seq_len=4, and max_target_seq_len=2, I still get an out-of-memory error.

wuguangshuo commented 1 year ago

Did you manage to solve this?

kbwzy commented 1 year ago

Same failure on a 24 GB GPU with batch_size=1 and max_source_seq_len=50. Has this been solved?

a6225301 commented 1 year ago

@kbwzy @chenchaoac @wuguangshuo Has any of you solved this?

hsauod commented 1 year ago

24 GB of VRAM here, same problem. Has this been solved?

rainkin1993 commented 1 year ago

My GPU has 16 GB, and training also failed with an OOM error.

In theory 16 GB is enough for LoRA fine-tuning. Looking at the code, the cause is that after fine-tuning finishes, when the model is saved, the original code merges the base model and the fine-tuned LoRA parameters into a single model and writes it out as one model file (the merge code makes a deep copy of the trained model, which roughly doubles the memory required).

The fix: modify the save_model function in train.py so that it does not merge the parameters and instead saves only the fine-tuned LoRA parameters. Since the parameters are no longer merged into one model, the inference code also needs a small change so that it loads the original model plus the LoRA parameters.

Saving only the LoRA parameters:

diff --git a/LLM/finetune/train.py b/LLM/finetune/train.py
index 4483fc0..53dc4e9 100644
--- a/LLM/finetune/train.py
+++ b/LLM/finetune/train.py
@@ -155,12 +155,13 @@ def save_model(
     Args:
         cur_save_path (str): 存储路径。
     """
-    if args.use_lora:                       # merge lora params with origin model
-        merged_model = copy.deepcopy(model)
-        merged_model = merged_model.merge_and_unload()
-        merged_model.save_pretrained(cur_save_dir)
-    else:
-        model.save_pretrained(cur_save_dir)
+    # if args.use_lora:                       # merge lora params with origin model
+    #     merged_model = copy.deepcopy(model)
+    #     merged_model = merged_model.merge_and_unload()
+    #     merged_model.save_pretrained(cur_save_dir)
+    # else:
+    #     model.save_pretrained(cur_save_dir)
+    model.save_pretrained(cur_save_dir)
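
With this change, save_pretrained is called on the peft-wrapped model, which writes only the small adapter files (adapter_config.json plus the adapter weights) instead of a merged copy of the full model. A minimal standalone sketch of that behavior, using an illustrative small base model and LoRA config rather than the repo's chatglm-6b setup:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative base model and target modules; the repo fine-tunes chatglm-6b instead.
base = AutoModelForCausalLM.from_pretrained("gpt2")
lora_cfg = LoraConfig(r=8, lora_alpha=16, target_modules=["c_attn"])
model = get_peft_model(base, lora_cfg)

# Saving the PeftModel writes only adapter_config.json and the adapter weights,
# so no deep copy of the full model is made at save time.
model.save_pretrained("checkpoints/adapter_only")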

At inference time, loading the LoRA parameters separately:

diff --git a/LLM/finetune/inference.py b/LLM/finetune/inference.py
index f7d1311..183241a 100644
--- a/LLM/finetune/inference.py
+++ b/LLM/finetune/inference.py
@@ -1,3 +1,4 @@
+# coding: utf8
 # !/usr/bin/env python3
 """
 ==== No Bugs in code, just some Random Unexpected FEATURES ====
@@ -23,6 +24,7 @@ Date: 2023/03/17
 import time
 import torch

+from peft import PeftModel
 from transformers import AutoTokenizer, AutoModel
 torch.set_default_tensor_type(torch.cuda.HalfTensor)

@@ -64,18 +66,21 @@ if __name__ == '__main__':

     device = 'cuda:0'
     max_new_tokens = 300
-    model_path = "checkpoints/model_1000"
+    lora_model_path = "checkpoints/finetune/model_1000"

     tokenizer = AutoTokenizer.from_pretrained(
-        model_path,
+        "D:\\software\\chatglm-6b\\chatglm-6b", # change this to the path of the original chatglm-6b model
         trust_remote_code=True
     )

     model = AutoModel.from_pretrained(
-        model_path,
+        "D:\\software\\chatglm-6b\\chatglm-6b", # change this to the path of the original chatglm-6b model
         trust_remote_code=True
     ).half().to(device)

+    model = PeftModel.from_pretrained(model, lora_model_path, adapter_name="lora")
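
After PeftModel.from_pretrained the LoRA layers are attached to the loaded base model, so inference can proceed as in the original script. If a single merged checkpoint is still wanted for deployment, the merge can be done offline on CPU, where the extra copy no longer competes for GPU memory. A rough sketch, with illustrative paths:

from transformers import AutoModel
from peft import PeftModel

# Load the base model on CPU (needs enough system RAM), then attach the adapter.
base = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
model = PeftModel.from_pretrained(base, "checkpoints/finetune/model_1000")

# merge_and_unload folds the LoRA weights into the base layers and returns a
# plain model that can later be loaded without peft.
merged = model.merge_and_unload()
merged.save_pretrained("checkpoints/finetune/model_1000_merged")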