THUDM / VisualGLM-6B

Chinese and English multimodal conversational language model | 多模态中英双语对话语言模型
Apache License 2.0

Error when loading a fine-tuned model: object of type 'QuantState' has no len() #324

Open KinokoY opened 7 months ago

KinokoY commented 7 months ago

I fine-tuned with the official fine-tuning script and dataset, using the QLoRA method.

Running

```
!python cli_demo.py --from_pretrained 'checkpoints/finetune-visualglm-6b-11-28-13-22'
```

produces the following error:

```
[2023-11-28 15:04:33,198] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2023-11-28 15:04:36.555529: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-28 15:04:36.555582: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-28 15:04:36.555609: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-28 15:04:37.678579: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2023-11-28 15:04:38,981] [INFO] building FineTuneVisualGLMModel model ...
[2023-11-28 15:04:38,982] [INFO] [RANK 0] > initializing model parallel with size 1
[2023-11-28 15:04:38,983] [INFO] [RANK 0] You are using model-only mode. For torch.distributed users or loading model parallel models, set environment variables RANK, WORLD_SIZE and LOCAL_RANK.
/usr/local/lib/python3.10/dist-packages/torch/nn/init.py:405: UserWarning: Initializing zero-element tensors is a no-op
  warnings.warn("Initializing zero-element tensors is a no-op")
[2023-11-28 15:04:52,802] [INFO] [RANK 0] replacing layer 0 attention with lora
[2023-11-28 15:04:53,393] [INFO] [RANK 0] replacing layer 14 attention with lora
[2023-11-28 15:04:53,983] [INFO] [RANK 0] replacing chatglm linear layer with 4bit
[2023-11-28 15:05:42,515] [INFO] [RANK 0] > number of parameters on model parallel rank 0: 7802848768
[2023-11-28 15:05:50,837] [INFO] [RANK 0] global rank 0 is loading checkpoint checkpoints/finetune-visualglm-6b-11-28-13-22/300/mp_rank_00_model_states.pt
Traceback (most recent call last):
  File "/content/VisualGLM-6B/cli_demo.py", line 103, in <module>
    main()
  File "/content/VisualGLM-6B/cli_demo.py", line 30, in main
    model, model_args = AutoModel.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/sat/model/base_model.py", line 337, in from_pretrained
    return cls.from_pretrained_base(name, args=args, home_path=home_path, url=url, prefix=prefix, build_only=build_only, overwrite_args=overwrite_args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/sat/model/base_model.py", line 331, in from_pretrained_base
    load_checkpoint(model, args, load_path=model_path, prefix=prefix)
  File "/usr/local/lib/python3.10/dist-packages/sat/training/model_io.py", line 241, in load_checkpoint
    missing_keys, unexpected_keys = module.load_state_dict(sd['module'], strict=False)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1657, in load_state_dict
    load(self, state_dict)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1645, in load
    load(child, child_state_dict, child_prefix)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1645, in load
    load(child, child_state_dict, child_prefix)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1645, in load
    load(child, child_state_dict, child_prefix)
  [Previous line repeated 3 more times]
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1639, in load
    module._load_from_state_dict(
  File "/usr/local/lib/python3.10/dist-packages/sat/model/finetune/lora2.py", line 49, in _load_from_state_dict
    copy_nested_list(state_dict[prefix+'quant_state'], self.weight.quant_state)
  File "/usr/local/lib/python3.10/dist-packages/sat/model/finetune/lora2.py", line 37, in copy_nested_list
    for i in range(len(dst)):
TypeError: object of type 'QuantState' has no len()
```

I tried pip install bitsandbytes==0.39.0, but it did not help. Could someone please take a look?
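For context, the failure mode can be sketched as follows. In newer bitsandbytes releases (roughly 0.41 and later; the exact version boundary is an assumption), the per-layer quantization metadata is a QuantState object rather than the nested list that the copy helper in sat expects, so calling len() on it raises exactly this TypeError. The QuantState class below is a stand-in, not the real bitsandbytes class, and copy_nested_list is a simplified approximation of the helper in sat/model/finetune/lora2.py:

```python
class QuantState:
    """Stand-in for bitsandbytes' QuantState: plain attributes, no __len__."""
    def __init__(self, absmax, code):
        self.absmax = absmax
        self.code = code

def copy_nested_list(src, dst):
    """Simplified approximation of sat/model/finetune/lora2.py's helper:
    it assumes dst is a (possibly nested) list it can index into."""
    for i in range(len(dst)):       # raises TypeError when dst is a QuantState
        if isinstance(dst[i], list):
            copy_nested_list(src[i], dst[i])
        else:
            dst[i] = src[i]

old_style = [[1.0], [2.0]]          # list-shaped state from older bitsandbytes
copy_nested_list([[9.0], [8.0]], old_style)
print(old_style)                    # [[9.0], [8.0]]

new_style = QuantState(absmax=[1.0], code=[2.0])
try:
    copy_nested_list([[9.0], [8.0]], new_style)
except TypeError as e:
    print(e)                        # object of type 'QuantState' has no len()
```

This is why upgrading or downgrading bitsandbytes changes whether fine-tuning runs, while checkpoint loading still fails: the checkpoint loader and the installed bitsandbytes disagree about the shape of quant_state.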

wwlaoxi commented 7 months ago

> 'QuantState' has no len()

Has this been resolved? I ran the official fine-tuning project with the QLoRA method and hit the same error when loading the model.

hahaha111111 commented 7 months ago

I ran into the same problem. Has anyone solved it?

Caro-zll commented 7 months ago

Same problem here. How can it be fixed?

drenched9 commented 7 months ago

Same here. Could someone kindly help?

1049451037 commented 7 months ago

Change this line of code directly to device='cuda':

https://github.com/THUDM/VisualGLM-6B/blob/main/cli_demo.py#L36
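For readers unsure where to apply this, a hypothetical sketch of what the loading call around that line of cli_demo.py might look like after the change. The surrounding argument names here are assumptions for illustration; the one suggested edit is passing device='cuda':

```python
# Hypothetical sketch only: the real argument list in cli_demo.py may differ.
model, model_args = AutoModel.from_pretrained(
    args.from_pretrained,
    args=argparse.Namespace(
        fp16=True,        # assumption: typical inference settings
        skip_init=True,   # assumption
        device='cuda',    # the suggested change: force loading onto the GPU
    ),
)
```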

drenched9 commented 7 months ago

> Change this line of code directly to device='cuda':
>
> https://github.com/THUDM/VisualGLM-6B/blob/main/cli_demo.py#L36

Tried that; still the same error.

KinokoY commented 7 months ago

> Change this line of code directly to device='cuda':
>
> https://github.com/THUDM/VisualGLM-6B/blob/main/cli_demo.py#L36

After the change I still get the same error. In the last couple of days I also hit a new problem: fine-tuning with QLoRA reports `Build 4bit layer failed. You need to install the latest bitsandbytes. Try pip install bitsandbytes.` (still on bitsandbytes==0.39.0). After upgrading bitsandbytes as the message suggests, QLoRA fine-tuning runs through, but loading the fine-tuned model still raises `TypeError: object of type 'QuantState' has no len()`.

Guojunwei888 commented 6 months ago

> Change this line of code directly to device='cuda': https://github.com/THUDM/VisualGLM-6B/blob/main/cli_demo.py#L36
>
> After the change I still get the same error. In the last couple of days I also hit a new problem: fine-tuning with QLoRA reports `Build 4bit layer failed. You need to install the latest bitsandbytes. Try pip install bitsandbytes.` (still on bitsandbytes==0.39.0). After upgrading bitsandbytes as the message suggests, QLoRA fine-tuning runs through, but loading the fine-tuned model still raises `TypeError: object of type 'QuantState' has no len()`.

Look in the checkpoints/finetune-visualglm-6b-01-12-09-56 directory and check the size of the fine-tuned weights: is it 7 GB or 15 GB? Delete the 7 GB fine-tuned weights and run bash finetune/finetune_visualglm.sh --quant; you will get a new 15 GB weight file, and that one loads correctly. I am not sure of the exact cause, hopefully an expert can explain, but it is probably related to quantization.
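The workaround above as shell commands (a sketch: the checkpoint directory name is the one from this thread and will differ per run, and the rm step deletes the old fine-tuned weights, so back them up first if you want to keep them):

```shell
# Check the size of the fine-tuned weight file (directory name is from this
# thread; substitute your own run's timestamped directory).
du -sh checkpoints/finetune-visualglm-6b-01-12-09-56/*/mp_rank_00_model_states.pt

# If it is the ~7 GB variant, remove it and re-run fine-tuning with --quant;
# the resulting ~15 GB weights load without the QuantState error.
rm -r checkpoints/finetune-visualglm-6b-01-12-09-56
bash finetune/finetune_visualglm.sh --quant
```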