THUDM / VisualGLM-6B

Chinese and English multimodal conversational language model | 多模态中英双语对话语言模型
Apache License 2.0

Error when loading a fine-tuned model: object of type 'QuantState' has no len() #324

Open KinokoY opened 7 months ago

KinokoY commented 7 months ago

I fine-tuned with the official fine-tuning script and dataset, using the QLoRA method.

Running

```
!python cli_demo.py --from_pretrained 'checkpoints/finetune-visualglm-6b-11-28-13-22'
```

produces the following error:

```
[2023-11-28 15:04:33,198] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-11-28 15:04:36.555529: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-28 15:04:36.555582: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-28 15:04:36.555609: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-28 15:04:37.678579: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2023-11-28 15:04:38,981] [INFO] building FineTuneVisualGLMModel model ...
[2023-11-28 15:04:38,982] [INFO] [RANK 0] > initializing model parallel with size 1
[2023-11-28 15:04:38,983] [INFO] [RANK 0] You are using model-only mode. For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
/usr/local/lib/python3.10/dist-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
  warnings.warn("Initializing zero-element tensors is a no-op")
[2023-11-28 15:04:52,802] [INFO] [RANK 0] replacing layer 0 attention with lora
[2023-11-28 15:04:53,393] [INFO] [RANK 0] replacing layer 14 attention with lora
[2023-11-28 15:04:53,983] [INFO] [RANK 0] replacing chatglm linear layer with 4bit
[2023-11-28 15:05:42,515] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 7802848768
[2023-11-28 15:05:50,837] [INFO] [RANK 0] global rank 0 is loading checkpoint checkpoints/finetune-visualglm-6b-11-28-13-22/300/mp_rank_00_model_states.pt
Traceback (most recent call last):
  File "/content/VisualGLM-6B/cli_demo.py", line 103, in <module>
    main()
  File "/content/VisualGLM-6B/cli_demo.py", line 30, in main
    model, model_args = AutoModel.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/sat/model/base_model.py", line 337, in from_pretrained
    return cls.from_pretrained_base(name, args=args, home_path=home_path, url=url, prefix=prefix, build_only=build_only, overwrite_args=overwrite_args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sat/model/base_model.py", line 331, in from_pretrained_base
    load_checkpoint(model, args, load_path=model_path, prefix=prefix)
  File "/usr/local/lib/python3.10/dist-packages/sat/training/model_io.py", line 241, in load_checkpoint
    missing_keys, unexpected_keys = module.load_state_dict(sd['module'], strict=False)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1657, in load_state_dict
    load(self, state_dict)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1645, in load
    load(child, child_state_dict, child_prefix)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1645, in load
    load(child, child_state_dict, child_prefix)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1645, in load
    load(child, child_state_dict, child_prefix)
  [Previous line repeated 3 more times]
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1639, in load
    module._load_from_state_dict(
  File "/usr/local/lib/python3.10/dist-packages/sat/model/finetune/lora2.py", line 49, in _load_from_state_dict
    copy_nested_list(state_dict[prefix+'quant_state'], self.weight.quant_state)
  File "/usr/local/lib/python3.10/dist-packages/sat/model/finetune/lora2.py", line 37, in copy_nested_list
    for i in range(len(dst)):
TypeError: object of type 'QuantState' has no len()
```

I tried pip install bitsandbytes==0.39.0, but it did not help. Could someone please take a look?
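For context, the failure mode can be sketched as follows. In newer bitsandbytes releases (roughly 0.41 and later; the exact version boundary is an assumption), the per-layer quantization metadata is a QuantState object rather than the nested list that the copy helper in sat expects, so calling len() on it raises exactly this TypeError. The QuantState class below is a stand-in, not the real bitsandbytes class, and copy_nested_list is a simplified approximation of the helper in sat/model/finetune/lora2.py:

```python
class QuantState:
    """Stand-in for bitsandbytes' QuantState: plain attributes, no __len__."""
    def __init__(self, absmax, code):
        self.absmax = absmax
        self.code = code

def copy_nested_list(src, dst):
    """Simplified approximation of sat/model/finetune/lora2.py's helper:
    it assumes dst is a (possibly nested) list it can index into."""
    for i in range(len(dst)):       # raises TypeError when dst is a QuantState
        if isinstance(dst[i], list):
            copy_nested_list(src[i], dst[i])
        else:
            dst[i] = src[i]

old_style = [[1.0], [2.0]]          # list-shaped state from older bitsandbytes
copy_nested_list([[9.0], [8.0]], old_style)
print(old_style)                    # [[9.0], [8.0]]

new_style = QuantState(absmax=[1.0], code=[2.0])
try:
    copy_nested_list([[9.0], [8.0]], new_style)
except TypeError as e:
    print(e)                        # object of type 'QuantState' has no len()
```

This is why upgrading or downgrading bitsandbytes changes whether fine-tuning runs, while checkpoint loading still fails: the checkpoint loader and the installed bitsandbytes disagree about the shape of quant_state.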

wwlaoxi commented 7 months ago

> 'QuantState' has no len()

Has this been resolved? I ran the official fine-tuning project with the QLoRA method and hit the same error when loading the model.

hahaha111111 commented 7 months ago

I ran into the same problem. Has anyone solved it?

Caro-zll commented 7 months ago

Same problem here. How can it be fixed?

drenched9 commented 7 months ago

Same here. Could someone kindly help?

1049451037 commented 7 months ago

Change this line of code directly to device='cuda':

https://github.com/THUDM/VisualGLM-6B/blob/main/cli_demo.py#L36
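For readers unsure where to apply this, a hypothetical sketch of what the loading call around that line of cli_demo.py might look like after the change. The surrounding argument names here are assumptions for illustration; the one suggested edit is passing device='cuda':

```python
# Hypothetical sketch only: the real argument list in cli_demo.py may differ.
model, model_args = AutoModel.from_pretrained(
    args.from_pretrained,
    args=argparse.Namespace(
        fp16=True,        # assumption: typical inference settings
        skip_init=True,   # assumption
        device='cuda',    # the suggested change: force loading onto the GPU
    ),
)
```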

drenched9 commented 7 months ago

> Change this line of code directly to device='cuda':
>
> https://github.com/THUDM/VisualGLM-6B/blob/main/cli_demo.py#L36

Tried that; still the same error.

KinokoY commented 7 months ago

> Change this line of code directly to device='cuda':
>
> https://github.com/THUDM/VisualGLM-6B/blob/main/cli_demo.py#L36

After the change I still get the same error. In the last couple of days I also hit a new problem: fine-tuning with QLoRA reports `Build 4bit layer failed. You need to install the latest bitsandbytes. Try pip install bitsandbytes.` (still on bitsandbytes==0.39.0). After upgrading bitsandbytes as the message suggests, QLoRA fine-tuning runs through, but loading the fine-tuned model still raises `TypeError: object of type 'QuantState' has no len()`.

Guojunwei888 commented 6 months ago

> Change this line of code directly to device='cuda': https://github.com/THUDM/VisualGLM-6B/blob/main/cli_demo.py#L36
>
> After the change I still get the same error. In the last couple of days I also hit a new problem: fine-tuning with QLoRA reports `Build 4bit layer failed. You need to install the latest bitsandbytes. Try pip install bitsandbytes.` (still on bitsandbytes==0.39.0). After upgrading bitsandbytes as the message suggests, QLoRA fine-tuning runs through, but loading the fine-tuned model still raises `TypeError: object of type 'QuantState' has no len()`.

Look in the checkpoints/finetune-visualglm-6b-01-12-09-56 directory and check the size of the fine-tuned weights: is it 7 GB or 15 GB? Delete the 7 GB fine-tuned weights and run bash finetune/finetune_visualglm.sh --quant; you will get a new 15 GB weight file, and that one loads correctly. I am not sure of the exact cause, hopefully an expert can explain, but it is probably related to quantization.
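The workaround above as shell commands (a sketch: the checkpoint directory name is the one from this thread and will differ per run, and the rm step deletes the old fine-tuned weights, so back them up first if you want to keep them):

```shell
# Check the size of the fine-tuned weight file (directory name is from this
# thread; substitute your own run's timestamped directory).
du -sh checkpoints/finetune-visualglm-6b-01-12-09-56/*/mp_rank_00_model_states.pt

# If it is the ~7 GB variant, remove it and re-run fine-tuning with --quant;
# the resulting ~15 GB weights load without the QuantState error.
rm -r checkpoints/finetune-visualglm-6b-01-12-09-56
bash finetune/finetune_visualglm.sh --quant
```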