Facico / Chinese-Vicuna

Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model - a low-resource Chinese llama+lora recipe, structured after alpaca
https://github.com/Facico/Chinese-Vicuna
Apache License 2.0

Error when running finetune: KeyError: 'models.llama' #50

Closed: simonqian closed this issue 1 year ago

simonqian commented 1 year ago
  1. OS: Ubuntu
  2. GPU: 3090
  3. Python: 3.8
  4. Other Python library versions:
    pytorch-mutex             1.0                        cuda    pytorch
    torch                     2.0.0                    pypi_0    pypi
    torchaudio                0.12.1               py38_cu113    pytorch
    torchvision               0.13.1               py38_cu113    pytorch
    cudatoolkit               11.3.1               h2bc3f7f_2    defaults
    nvidia-cuda-cupti-cu11    11.7.101                 pypi_0    pypi
    nvidia-cuda-nvrtc-cu11    11.7.99                  pypi_0    pypi
    nvidia-cuda-runtime-cu11  11.7.99                  pypi_0    pypi
    transformers              4.27.4                   pypi_0    pypi
    tokenizers                0.13.3                   pypi_0    pypi
    sentencepiece             0.1.97                   pypi_0    pypi
  5. nvidia-smi
    NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4  
  6. nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2017 NVIDIA Corporation
    Built on Fri_Nov__3_21:07:56_CDT_2017
    Cuda compilation tools, release 9.1, V9.1.85
  7. The upstream model, the lora model, and the dataset are all downloaded locally
  8. The finetune script is as follows:

    DATA_PATH="./sample/merge.json" #"../dataset/instruction/guanaco_non_chat_mini_52K-utf8.json" #"./sample/merge_sample.json"
    OUTPUT_PATH="my-lora-Vicuna"
    MODEL_PATH="../llama-13b-hf/"
    lora_checkpoint="../Chinese-Vicuna-lora-13b-belle-and-guanaco/"
    TEST_SIZE=2000

    python finetune.py \
      --data_path $DATA_PATH \
      --output_path $OUTPUT_PATH \
      --model_path $MODEL_PATH \
      --eval_steps 200 \
      --save_steps 200 \
      --test_size $TEST_SIZE


### Running the finetune script fails with the following error:
```bash
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /home/doudou/miniconda3/envs/chinese-vicuna/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /home/doudou/miniconda3/envs/chinese-vicuna/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
Traceback (most recent call last):
  File "finetune.py", line 13, in <module>
    "LlamaTokenizer" in transformers._import_structure["models.llama"]
KeyError: 'models.llama'
```

Could someone help me figure out what's causing this?

simonqian commented 1 year ago

I uninstalled the pip-installed transformers and reinstalled it with conda install -c huggingface transformers. The resulting version is:

transformers              4.27.4                     py_0    huggingface

Running finetune still fails with the same error.

Facico commented 1 year ago

You can't install transformers that way. As in requirements.txt, you have to pull it straight from GitHub: llama support isn't in the latest transformers release yet, it only exists in the GitHub repo. After pulling, the version will be 4.28.0.dev0.
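
For reference, a minimal sketch of that install plus a sanity check that the llama module is now registered (the `python -c` probe is just an illustrative check, not from the repo's docs):

```bash
# Pull transformers from the GitHub main branch, as requirements.txt does,
# since LlamaTokenizer was not yet in a stable release at the time:
pip install git+https://github.com/huggingface/transformers

# Quick sanity check that the LLaMA classes resolve:
python -c "from transformers import LlamaTokenizer, LlamaForCausalLM; import transformers; print(transformers.__version__)"
```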

simonqian commented 1 year ago

@Facico Thanks. I'm now installing from source:

git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .

The installed version:

transformers              4.28.0.dev0              pypi_0    pypi

After installing it, I went back to the Chinese-Vicuna directory, ran finetune, and hit a different error:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: Required library version not found: libsbitsandbytes_cpu.so. Maybe you need to compile it from source?
CUDA SETUP: Defaulting to libbitsandbytes_cpu.so...
/home/doudou/miniconda3/envs/chinese-vicuna/lib/python3.8/site-packages/bitsandbytes/cextension.py:31: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
Traceback (most recent call last):
  File "finetune.py", line 8, in <module>
    import transformers
  File "/home/doudou/projects/github/transformers-main/src/transformers/__init__.py", line 26, in <module>
    from . import dependency_versions_check
  File "/home/doudou/projects/github/transformers-main/src/transformers/dependency_versions_check.py", line 17, in <module>
    from .utils.versions import require_version, require_version_core
  File "/home/doudou/projects/github/transformers-main/src/transformers/utils/__init__.py", line 57, in <module>
    from .hub import (
  File "/home/doudou/projects/github/transformers-main/src/transformers/utils/hub.py", line 32, in <module>
    from huggingface_hub import (
  File "/home/doudou/miniconda3/envs/chinese-vicuna/lib/python3.8/site-packages/huggingface_hub/__init__.py", line 278, in __getattr__
    submod = importlib.import_module(submod_path)
  File "/home/doudou/miniconda3/envs/chinese-vicuna/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/doudou/miniconda3/envs/chinese-vicuna/lib/python3.8/site-packages/huggingface_hub/file_download.py", line 21, in <module>
    from filelock import FileLock
ModuleNotFoundError: No module named 'filelock'

Is something still wrong with my environment?
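
The traceback bottoms out in a plain missing package, so the obvious fix (not from the repo's docs, but filelock is a known dependency of huggingface_hub) is a targeted install:

```bash
# huggingface_hub imports FileLock from the filelock package; installing it
# directly resolves the ModuleNotFoundError above:
pip install filelock
```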

simonqian commented 1 year ago

From inside mainland China, pip install git+https://github.com/huggingface/transformers fails with:

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting git+https://github.com/huggingface/transformers
  Cloning https://github.com/huggingface/transformers to /tmp/pip-req-build-lxrdj0gq
  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /tmp/pip-req-build-lxrdj0gq
  fatal: unable to access 'https://github.com/huggingface/transformers/': gnutls_handshake() failed: The TLS connection was non-properly terminated.
  error: subprocess-exited-with-error

  × git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /tmp/pip-req-build-lxrdj0gq did not run successfully.
  │ exit code: 128
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet https://github.com/huggingface/transformers /tmp/pip-req-build-lxrdj0gq did not run successfully.
│ exit code: 128
╰─> See above for output.

Probably a network issue. So I downloaded the source zip and installed it manually; that install succeeded.

Facico commented 1 year ago

That still looks like a dependency problem. When setting up, did you follow our instructions and run "pip install -r requirements.txt"?

"The TLS connection was non-properly terminated" is clearly a network problem. There's not much to be done about that; we usually just run the terminal behind a proxy.
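
For reference, running the terminal behind a proxy usually just means exporting the standard environment variables; the address and port below are placeholders for whatever proxy you actually have:

```bash
# git and pip both honor http_proxy/https_proxy; substitute your own proxy:
export http_proxy=http://127.0.0.1:7890
export https_proxy=http://127.0.0.1:7890
pip install git+https://github.com/huggingface/transformers
```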

simonqian commented 1 year ago

I did run pip install -r requirements.txt, but it kept failing because of the network, so I dropped the https everywhere. The other dependencies installed fine; only transformers and peft had to be installed manually.

Facico commented 1 year ago

You can refer to the dependency list provided here, or, whenever you hit a ModuleNotFoundError: No module named ..., just try installing that module.

simonqian commented 1 year ago

OK, I'll give that a try, thanks!

simonqian commented 1 year ago

After retrying pip install git+https://github.com/huggingface/transformers several times, it finally installed! I can run finetune now, thanks a lot! @Facico

But I still have a question: I already downloaded merge.json from https://huggingface.co/datasets/Chinese-Vicuna/guanaco_belle_merge_v1.0, so why is it still downloading data? Did I get my script wrong?

[screenshot]

My script:

DATA_PATH="./sample/merge.json" # 这是下载好的merge.json
OUTPUT_PATH="my-lora-Vicuna"
MODEL_PATH="../llama-13b-hf/" # 下载好的llama-13b模型
lora_checkpoint="../Chinese-Vicuna-lora-13b-belle-and-guanaco/" # 下载到本地的
TEST_SIZE=2000

python finetune.py \
--data_path $DATA_PATH \
--output_path $OUTPUT_PATH \
--model_path $MODEL_PATH \
--eval_steps 200 \
--save_steps 200 \
--test_size $TEST_SIZE

simonqian commented 1 year ago

@Facico I'd like to ask: how much GPU memory does finetune need? I'm now hitting this error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 136.00 MiB (GPU 0; 23.70 GiB total capacity; 13.51 GiB already allocated; 95.56 MiB free; 13.94 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Facico commented 1 year ago

For 7b, the defaults in the code need roughly 9 GB. For 13b it should take close to 20 GB; you can try setting MICRO_BATCH_SIZE to something smaller.
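
Besides lowering MICRO_BATCH_SIZE, the OOM message itself suggests a fragmentation workaround. A hedged sketch of both knobs (assuming MICRO_BATCH_SIZE is a constant near the top of finetune.py, as the comment above implies):

```bash
# The PyTorch error above suggests max_split_size_mb; set it through the
# documented PYTORCH_CUDA_ALLOC_CONF variable to reduce fragmentation:
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

# Then lower MICRO_BATCH_SIZE inside finetune.py (e.g. 4 -> 2) and rerun:
python finetune.py \
  --data_path "$DATA_PATH" \
  --output_path "$OUTPUT_PATH" \
  --model_path "$MODEL_PATH"
```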

simonqian commented 1 year ago

OK, thanks!

molyswu commented 1 year ago

Running python finetune.py --data_path merge.json --test_size 2000 gives the following error:

CUDA SETUP: CUDA runtime path found: /root/anaconda3/lib/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.6
CUDA SETUP: Detected CUDA version 113
CUDA SETUP: Loading binary /root/anaconda3/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda113.so...
Chinese-Vicuna-lora-13b-belle-and-guanaco
Overriding torch_dtype=None with torch_dtype=torch.float16 due to requirements of bitsandbytes to enable model loading in mixed int8. Either pass torch_dtype=torch.float16 or don't pass this argument at all to remove this warning.
Traceback (most recent call last):
  File "/home/Chinese-Vicuna/finetune.py", line 64, in <module>
    model = LlamaForCausalLM.from_pretrained(
  File "/root/anaconda3/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2405, in from_pretrained
    raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory Chinese-Vicuna-lora-13b-belle-and-guanaco.

parser = argparse.ArgumentParser()
parser.add_argument("--wandb", action="store_true", default=False)
parser.add_argument("--data_path", type=str, default="./sample/merge.json")
parser.add_argument("--output_path", type=str, default="out")
parser.add_argument("--model_path", type=str, default="Chinese-Vicuna-lora-13b-belle-and-guanaco")
parser.add_argument("--eval_steps", type=int, default=200)
parser.add_argument("--save_steps", type=int, default=200)
parser.add_argument("--test_size", type=int, default=200)
parser.add_argument("--resume_from_checkpoint", type=str, default=None)
parser.add_argument("--ignore_data_skip", type=str, default="False")
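
Note that the default --model_path above points at the LoRA adapter directory, which holds no pytorch_model.bin, which is exactly what the OSError complains about. A sketch of the likely fix, passing a full HF-format base model instead (the ../llama-13b-hf/ path is a placeholder for wherever your base model lives):

```bash
# --model_path must be a full base model in HF format (pytorch_model.bin
# etc.), not the LoRA checkpoint directory:
python finetune.py \
  --data_path merge.json \
  --test_size 2000 \
  --model_path ../llama-13b-hf/
```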

molyswu commented 1 year ago

The model was downloaded from huggingface: https://huggingface.co/Chinese-Vicuna/Chinese-Vicuna-lora-13b-belle-and-guanaco/tree/main

molyswu commented 1 year ago

Downloading the llama-7b model from huggingface fixed it.

96005900 commented 10 months ago

> After retrying pip install git+https://github.com/huggingface/transformers several times, it finally installed! I can run finetune now, thanks a lot! @Facico
> But I still have a question: I already downloaded merge.json from https://huggingface.co/datasets/Chinese-Vicuna/guanaco_belle_merge_v1.0, so why is it still downloading data? Did I get my script wrong?
> (the rest of simonqian's earlier comment, including the script, quoted verbatim)

How did you get it working in the end? I'm running into the same problem.

96005900 commented 10 months ago

pip install git+https://github.com/huggingface/transformers keeps failing with network errors for me, and using a proxy isn't really an option here.
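
One workaround that succeeded earlier in this thread is installing from a source archive rather than cloning over git; a sketch using GitHub's standard zip-archive URL for the main branch:

```bash
# Download the source as a zip (as simonqian did above) and install it
# editable, sidestepping the failing git clone:
wget https://github.com/huggingface/transformers/archive/refs/heads/main.zip -O transformers.zip
unzip transformers.zip
cd transformers-main
pip install -e .
```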