SCIR-HI / Huatuo-Llama-Med-Chinese

Repo for BenTsao (original name: HuaTuo, 华驼): Instruction-tuning Large Language Models with Chinese Medical Knowledge.
Apache License 2.0

ValueError: We need an `offload_dir` to dispatch this model according to this `device_map`... #79


Pingmin commented 9 months ago

Hi everyone!

I tried running Huatuo-Llama-Med-Chinese today (the full procedure is described below) and ran into this error:

$ bash scripts/infer.sh 
/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.0.4) or chardet (4.0.0) doesn't match a supported version!
  warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 5.0
CUDA SETUP: Detected CUDA version 122
/home/tcmai/.local/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
CUDA SETUP: Loading binary /home/tcmai/.local/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda122_nocublaslt.so...
/home/tcmai/.local/lib/python3.10/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
The tokenizer class you load from this checkpoint is not the same type as the class this function is called from. It may result in unexpected tokenization. 
The tokenizer class you load from this checkpoint is 'LLaMATokenizer'. 
The class this function is called from is 'LlamaTokenizer'.
The model weights are not tied. Please use the `tie_weights` method before using the `infer_auto_device` function.
Loading checkpoint shards: 100%|█████████████████████████████████████| 33/33 [01:52<00:00,  3.40s/it]
using lora ./lora-llama-med
Traceback (most recent call last):
  File "/data/source/medical-llm/Huatuo-Llama-Med-Chinese-git/infer.py", line 125, in <module>
    fire.Fire(main)
  File "/home/tcmai/.local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/tcmai/.local/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/tcmai/.local/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/data/source/medical-llm/Huatuo-Llama-Med-Chinese-git/infer.py", line 47, in main
    model = PeftModel.from_pretrained(
  File "/home/tcmai/.local/lib/python3.10/site-packages/peft/peft_model.py", line 181, in from_pretrained
    model.load_adapter(model_id, adapter_name, **kwargs)
  File "/home/tcmai/.local/lib/python3.10/site-packages/peft/peft_model.py", line 406, in load_adapter
    dispatch_model(
  File "/home/tcmai/.local/lib/python3.10/site-packages/accelerate/big_modeling.py", line 345, in dispatch_model
    raise ValueError(
ValueError: We need an `offload_dir` to dispatch this model according to this `device_map`, the following submodules need to be offloaded: base_model.model.model.layers.4, base_model.model.model.layers.5, base_model.model.model.layers.6, base_model.model.model.layers.7, base_model.model.model.layers.8, base_model.model.model.layers.9, base_model.model.model.layers.10, base_model.model.model.layers.11, base_model.model.model.layers.12, base_model.model.model.layers.13, base_model.model.model.layers.14, base_model.model.model.layers.15, base_model.model.model.layers.16, base_model.model.model.layers.17, base_model.model.model.layers.18, base_model.model.model.layers.19, base_model.model.model.layers.20, base_model.model.model.layers.21, base_model.model.model.layers.22, base_model.model.model.layers.23, base_model.model.model.layers.24, base_model.model.model.layers.25, base_model.model.model.layers.26, base_model.model.model.layers.27, base_model.model.model.layers.28, base_model.model.model.layers.29, base_model.model.model.layers.30, base_model.model.model.layers.31, base_model.model.model.norm, base_model.model.lm_head.
$

The whole procedure was roughly as follows:

(1) Cloned this Huatuo-Llama-Med-Chinese repo, downloaded the four sets of model weights mentioned in the README, and installed the dependencies with pip.
(2) Ran $ bash scripts/infer.sh; following the error message, manually compiled and installed a cuda122 build of bitsandbytes matching my GPU.
(3) Ran $ bash scripts/infer.sh again; following the error message, cloned the base model weights from https://huggingface.co/decapoda-research/llama-7b-hf.
(4) Ran $ bash scripts/infer.sh again and hit the "ValueError: We need an `offload_dir` to dispatch this model according to this `device_map`" error shown above (my reading of the failing call is sketched below).
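
As far as I can tell from the traceback, the loading path in infer.py looks roughly like this (a sketch reconstructed from the traceback, not the exact source; the model paths are the ones from my run):

from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    device_map="auto",  # on a small GPU, accelerate's auto device_map places most layers on "disk"
)
# infer.py line 47 -- this call re-dispatches the model via accelerate's dispatch_model
# and is where the ValueError is raised
model = PeftModel.from_pretrained(model, "./lora-llama-med")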

Right now I'm stuck at step (4) and don't understand what causes this error. Is it that the recent llama-7b-hf weights are incompatible, or something else?
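
Judging from the error message, dispatch_model wants an offload directory whenever the device_map assigns submodules to disk, which seems to be the case here given my GPU's small memory. Would passing an offload folder through, as in the sketch below, be the right direction? (Using offload_folder as the kwarg that peft forwards to dispatch_model's offload_dir is my assumption, not something I've verified.)

from peft import PeftModel
from transformers import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    device_map="auto",
    offload_folder="./offload",  # gives accelerate a place to put disk-offloaded layers
)
model = PeftModel.from_pretrained(
    model,
    "./lora-llama-med",
    offload_folder="./offload",  # assumption: peft forwards this to dispatch_model as offload_dir
)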

Thanks!