运行微调后的CodeShell-7B-Chat-int4模型失败：TypeError: cannot assign 'torch.BFloat16Tensor' as parameter 'weight' (torch.nn.Parameter or None expected)

我准备的微调数据如下：

[
  {
    "id": "identity_0",
    "conversations": [
      {
        "from": "human",
        "value": "你好"
      },
      {
        "from": "assistant",
        "value": "您好，我是CodeShell，请问有什么可以帮助您的吗？"
      }
    ]
  },
  {
    "id": "identity_1",
    "conversations": [
      {
        "from": "human",
        "value": "我想了解电动汽车和燃油汽车的区别"
      },
      {
        "from": "assistant",
        "value": "电动汽车主要依靠电池驱动，而燃油汽车依靠燃烧汽油来提供动力。电动汽车通常更环保，但续航距离和充电设施仍然是其发展的难点。"
      }
    ]
  },
  {
    "id": "identity_2",
    "conversations": [
      {
        "from": "human",
        "value": "天气怎么样？"
      },
      {
        "from": "assistant",
        "value": "我没有访问天气信息的能力，建议您查看最新的天气预报。"
      }
    ]
  },
  {
    "id": "identity_3",
    "conversations": [
      {
        "from": "human",
        "value": "你会讲其他语言吗？"
      },
      {
        "from": "assistant",
        "value": "我可以理解并回应多种语言，但我的主要语言是中文。"
      }
    ]
  },
  {
    "id": "identity_4",
    "conversations": [
      {
        "from": "human",
        "value": "学习编程需要多长时间？"
      },
      {
        "from": "assistant",
        "value": "学习编程的时间取决于多种因素，包括您投入的时间、学习材料的质量和您的学习效率。通常，达到熟练水平可能需要几个月到几年不等。"
      }
    ]
  },
  {
    "id": "identity_5",
    "conversations": [
      {
        "from": "human",
        "value": "你能为我推荐一些书吗？"
      },
      {
        "from": "assistant",
        "value": "当然可以。请问您对哪个领域的书籍感兴趣？"
      }
    ]
  }
]

我运行

./run_finetune.sh ../WisdomShell/CodeShell-7B-Chat-int4/ ../../data/train.json

后得到了微调后的模型我准备了一个test.py文件，该文件指定了模型为我刚才微调后的 CodeShell-7B-Chat-int4 模型（即output_models）：

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = AutoTokenizer.from_pretrained("output_models", local_files_only=True)
model = AutoModelForCausalLM.from_pretrained("output_models", trust_remote_code=True, local_files_only=True).to(device)

history = []
query = '你能为我推荐一些书吗？'
response = model.chat(query, history, tokenizer)
print(response)
history.append((query, response))

query = '我想了解电动汽车和燃油汽车的区别'
response = model.chat(query, history, tokenizer)
print(response)
history.append((query, response))

我运行

python3 test.py

后出现错误：

(.env) (base) [root@localhost finetune]# python3 test.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/opt/ai/WisdomShell/CodeShell-7B-Chat-int4/finetune/test.py", line 8, in <module>
    model = AutoModelForCausalLM.from_pretrained("output_models", trust_remote_code=True, local_files_only=True).to(device)
  File "/root/anaconda3/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 560, in from_pretrained
    return model_class.from_pretrained(
  File "/root/.cache/huggingface/modules/transformers_modules/output_models/modeling_codeshell.py", line 1045, in from_pretrained
    model = load_state_dict_for_qunantied_model(model, state_dict)
  File "/root/.cache/huggingface/modules/transformers_modules/output_models/quantizer.py", line 257, in load_state_dict_for_qunantied_model
    set_value(model, name, state_dict, is_4bit)
  File "/root/.cache/huggingface/modules/transformers_modules/output_models/quantizer.py", line 212, in set_value
    setattr(parent, keys[-1], state_dict[name])
  File "/root/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1715, in __setattr__
    raise TypeError(f"cannot assign '{torch.typename(value)}' as parameter '{name}' "
TypeError: cannot assign 'torch.BFloat16Tensor' as parameter 'weight' (torch.nn.Parameter or None expected)

我把test.py中指定的模型修改为CodeShell-7B-Chat-int后再运行就没问题：

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = 'cuda' if torch.cuda.is_available() else 'cpu'
tokenizer = AutoTokenizer.from_pretrained("/opt/ai/WisdomShell/CodeShell-7B-Chat-int4/WisdomShell/CodeShell-7B-Chat-int4/", local_files_only=True)
model = AutoModelForCausalLM.from_pretrained("/opt/ai/WisdomShell/CodeShell-7B-Chat-int4/WisdomShell/CodeShell-7B-Chat-int4/", trust_remote_code=True, local_files_only=True).to(device)

history = []
query = '你能为我推荐一些书吗？'
response = model.chat(query, history, tokenizer)
print(response)
history.append((query, response))

query = '我想了解电动汽车和燃油汽车的区别'
response = model.chat(query, history, tokenizer)
print(response)
history.append((query, response))

请问哪里出问题了？我的CUDA版本是11.8，torch版本是2.1.0，TensorFlow版本是2.13.0，python版本是3.9，我的系统是centos7 x86_64，内核版本是 3.10 请大神帮帮忙，感激不尽！

WisdomShell / codeshell

运行微调后的CodeShell-7B-Chat-int4模型失败：TypeError: cannot assign 'torch.BFloat16Tensor' as parameter 'weight' (torch.nn.Parameter or None expected) #47