Facico / Chinese-Vicuna

Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model (a low-resource Chinese LLaMA + LoRA recipe, structured after alpaca)
https://github.com/Facico/Chinese-Vicuna
Apache License 2.0

Running generate.py for inference fails with ValueError: We need an `offload_dir` to dispatch this model #225

Open kakuibeyond opened 1 year ago

kakuibeyond commented 1 year ago

Error description

Running via the Colab link provided in the README, fine-tuning completed successfully and produced the output folder shown below. (screenshot)

In the next step, running generate.py for inference, I used the following commands (the two lines below use, respectively, the officially released LoRA weights and the LoRA model path produced in the previous step); both give the same error:

!python ./Chinese-Vicuna/generate.py --model_path decapoda-research/llama-7b-hf --lora_path Facico/Chinese-Vicuna-lora-7b-3epoch-belle-and-guanaco --use_local 0
# !python ./Chinese-Vicuna/generate.py --model_path decapoda-research/llama-7b-hf --lora_path lora-Vicuna --use_local 0

Full error output:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
Namespace(model_path='decapoda-research/llama-7b-hf', lora_path='Facico/Chinese-Vicuna-lora-7b-3epoch-belle-and-guanaco', use_typewriter=1, use_local=0)
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. 
The class this function is called from is 'LlamaTokenizer'.
normalizer.cc(51) LOG(INFO) precompiled_charsmap is empty. use identity normalization.
Facico/Chinese-Vicuna-lora-7b-3epoch-belle-and-guanaco/adapter_model.bin
Loading checkpoint shards: 100% 33/33 [01:08<00:00,  2.09s/it]
Downloading (…)/adapter_config.json: 100% 370/370 [00:00<00:00, 248kB/s]
Downloading adapter_model.bin: 100% 16.8M/16.8M [00:00<00:00, 103MB/s]
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /content/./Chinese-Vicuna/generate.py:61 in <module>                         │
│                                                                              │
│    58 │   │   torch_dtype=torch.float16,                                     │
│    59 │   │   device_map="auto", #device_map={"": 0},                        │
│    60 │   )                                                                  │
│ ❱  61 │   model = StreamPeftGenerationMixin.from_pretrained(                 │
│    62 │   │   model, LORA_WEIGHTS, torch_dtype=torch.float16, device_map="au │
│    63 │   )                                                                  │
│    64 elif device == "mps":                                                  │
│                                                                              │
│ /content/Chinese-Vicuna/utils.py:770 in from_pretrained                      │
│                                                                              │
│   767 │   │   │   model._reorder_cache = model.base_model._reorder_cache     │
│   768 │   │   │   return model                                               │
│   769 │   │   else:                                                          │
│ ❱ 770 │   │   │   return cls.from_pretrained_old_peft_version(model, model_i │
│   771 │                                                                      │
│   772 │                                                                      │
│   773 │   @classmethod                                                       │
│                                                                              │
│ /content/Chinese-Vicuna/utils.py:821 in from_pretrained_old_peft_version     │
│                                                                              │
│   818 │   │   │   │   │   max_memory=max_memory,                             │
│   819 │   │   │   │   │   no_split_module_classes=no_split_module_classes,   │
│   820 │   │   │   │   )                                                      │
│ ❱ 821 │   │   │   model = dispatch_model(model, device_map=device_map)       │
│   822 │   │   │   hook = AlignDevicesHook(io_same_device=True)               │
│   823 │   │   │   if model.peft_config.peft_type == PeftType.LORA:           │
│   824 │   │   │   │   add_hook_to_module(model.base_model.model, hook)       │
│                                                                              │
│ /usr/local/lib/python3.10/dist-packages/accelerate/big_modeling.py:264 in    │
│ dispatch_model                                                               │
│                                                                              │
│   261 │                                                                      │
│   262 │   disk_modules = [name for name, device in device_map.items() if dev │
│   263 │   if offload_dir is None and offload_index is None and len(disk_modu │
│ ❱ 264 │   │   raise ValueError(                                              │
│   265 │   │   │   "We need an `offload_dir` to dispatch this model according │
│   266 │   │   │   f"need to be offloaded: {', '.join(disk_modules)}."        │
│   267 │   │   )                                                              │
╰──────────────────────────────────────────────────────────────────────────────╯
ValueError: We need an `offload_dir` to dispatch this model according to this 
`device_map`, the following submodules need to be offloaded: 
base_model.model.model.layers.27, base_model.model.model.layers.28, 
base_model.model.model.layers.29, base_model.model.model.layers.30, 
base_model.model.model.layers.31, base_model.model.model.norm, 
base_model.model.lm_head.

Environment

Google Colab, T4 15G (screenshot)

kakuibeyond commented 1 year ago

Following the hint here, I tried reinstalling a different version of peft, but the same error persists.

1MLightyears commented 1 year ago

Running into the same issue.

Facico commented 1 year ago

Change the line `device_map="auto", #device_map={"": 0}` to `device_map={"": 0}` (the former is for multi-GPU inference, which the old pinned dependencies do not support). Alternatively, upgrading the relevant packages according to requirement_4bit.txt should also fix this (upgrading transformers and peft should be enough).

However, the newer dependency versions have problems saving checkpoints when training in 8-bit, which is why the old pins were kept at the time.
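To see why the `device_map` change works, here is a minimal standalone sketch of the check that raises this ValueError (paraphrasing the condition visible in the traceback at accelerate's big_modeling.py:264). The device maps below are illustrative stand-ins, not the real maps accelerate infers for llama-7b on a T4:

```python
# Sketch of accelerate's dispatch check: any module mapped to "disk"
# requires an offload_dir, otherwise dispatch_model raises ValueError.
def check_dispatch(device_map, offload_dir=None):
    disk_modules = [name for name, dev in device_map.items() if dev == "disk"]
    if offload_dir is None and len(disk_modules) > 0:
        raise ValueError(
            "We need an `offload_dir` to dispatch this model according to "
            "this `device_map`, the following submodules need to be "
            f"offloaded: {', '.join(disk_modules)}."
        )

# With device_map="auto" on a 15G T4, the inferred map spills the last
# layers to disk (as in the error message), so the check fails:
auto_like_map = {
    "model.layers.30": 0,
    "model.layers.31": "disk",
    "lm_head": "disk",
}

# With device_map={"": 0}, the entire model is pinned to GPU 0; there are
# no "disk" entries, so dispatch never needs an offload_dir:
single_gpu_map = {"": 0}
check_dispatch(single_gpu_map)  # passes silently
```

Note that `{"": 0}` only works if the whole 8-bit model actually fits on the GPU; otherwise you would need to pass an offload directory instead.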