Failure in loading QUANT4 model

System Info

CUDA==12.4, Transformers==4.32.0, torch==2.3.0, xformers==0.0.26.post1, triton==2.3.0. Device: 3090/4090

Who can help?

No response

Information

[X] The official example scripts
[X] My own modified scripts

Reproduction

import os

# Select the visible GPU before torch is imported
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
from PIL import Image
import json
import pickle
from tqdm import tqdm

from modelscope import AutoModelForCausalLM, AutoTokenizer
# from transformers import AutoModelForCausalLM, AutoTokenizer
# from accelerate import init_empty_weights, load_checkpoint_and_dispatch, infer_auto_device_map

MODEL_PATH = "/data3/lisibo/.cache/modelscope/hub/ZhipuAI/cogvlm2-llama3-chinese-chat-19B-int4"
# MODEL_PATH= "ZhipuAI/cogvlm2-llama3-chinese-chat-19B-int4"
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'
TORCH_TYPE = (
    torch.bfloat16
    if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8
    else torch.float16
)
print("TORCH_TYPE:", TORCH_TYPE)

tokenizer = AutoTokenizer.from_pretrained(
    MODEL_PATH,
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=TORCH_TYPE,
    trust_remote_code=True,
    low_cpu_mem_usage=True,
).eval()

The code above is the same as the example on the int4 model card on Hugging Face. I am using it to load the 4-bit checkpoint directly, which I expected would not require quantizing while the model loads, so it should be faster. However, an error occurs during loading. The log follows.

2024-09-03 20:35:38,687 - modelscope - INFO - PyTorch version 2.3.0 Found.
2024-09-03 20:35:38,689 - modelscope - INFO - Loading ast index from /data3/lisibo/.cache/modelscope/ast_indexer
2024-09-03 20:35:38,725 - modelscope - INFO - Loading done! Current index file version is 1.14.0, with md5 3753725eddbea1b58b893b7ccc61de0b and a total number of 976 components indexed
/data3/lisibo/.conda/envs/py310/lib/python3.10/site-packages/transformers/utils/generic.py:260: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
TORCH_TYPE: torch.bfloat16
Traceback (most recent call last):
  File "/data3/lisibo/euluc/CogVLM2/basic_demo/cli_demo_3.py", line 27, in <module>
    model = AutoModelForCausalLM.from_pretrained(
  File "/data3/lisibo/.conda/envs/py310/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 113, in from_pretrained
    module_obj = module_class.from_pretrained(model_dir, *model_args,
  File "/data3/lisibo/.conda/envs/py310/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 511, in from_pretrained
    return model_class.from_pretrained(
  File "/data3/lisibo/.conda/envs/py310/lib/python3.10/site-packages/modelscope/utils/hf_util.py", line 76, in from_pretrained
    return ori_from_pretrained(cls, model_dir, *model_args, **kwargs)
  File "/data3/lisibo/.conda/envs/py310/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3091, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/data3/lisibo/.conda/envs/py310/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3471, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/data3/lisibo/.conda/envs/py310/lib/python3.10/site-packages/transformers/modeling_utils.py", line 744, in _load_state_dict_into_meta_model
    set_module_quantized_tensor_to_device(
  File "/data3/lisibo/.conda/envs/py310/lib/python3.10/site-packages/transformers/utils/bitsandbytes.py", line 116, in set_module_quantized_tensor_to_device
    new_value = nn.Parameter(new_value, requires_grad=old_value.requires_grad)
  File "/data3/lisibo/.conda/envs/py310/lib/python3.10/site-packages/torch/nn/parameter.py", line 40, in __new__
    return torch.Tensor._make_subclass(cls, data, requires_grad)
RuntimeError: Only Tensors of floating point and complex dtype can require gradients

Expected behavior

I was using the same code in May, and it worked. However, when I needed to restart the project recently, it failed to load the model. The environment does not appear to have been modified since 2024.6.9.
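
To check whether the environment really is unchanged, a small script like the one below could dump the installed package versions for comparison against an older record. This is only a sketch: the package list is an assumption (the packages named in System Info plus modelscope, bitsandbytes and accelerate, which appear in or are implied by the script and traceback), and installed_versions is a hypothetical helper name, not part of any library.

```python
from importlib import metadata

# Assumed package list: System Info entries plus packages implied by the traceback.
PACKAGES = [
    "torch",
    "transformers",
    "modelscope",
    "bitsandbytes",
    "accelerate",
    "xformers",
    "triton",
]


def installed_versions(names):
    """Return a dict mapping each package name to its version or 'not installed'."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    # Print in pip-style "name==version" form so the output can be diffed.
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}=={version}")
```

Saving this output next to the project would make it possible to diff the environment between a working and a failing run instead of relying on memory.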

THUDM / CogVLM2