QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Apache License 2.0

OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory #520

Open bhavyajoshi-mahindra opened 6 days ago

bhavyajoshi-mahindra commented 6 days ago

How do you load and run inference on a custom GPTQ-quantized Qwen2-VL model (not the default one) using Qwen2VLForConditionalGeneration on Windows?

I used the following code.

from transformers import Qwen2VLForConditionalGeneration, AutoTokenizer, AutoProcessor
from qwen_vl_utils import process_vision_info
import torch
import time
# We recommend enabling flash_attention_2 for better acceleration and memory saving, especially in multi-image and video scenarios.
DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"
print(DEVICE)
model_path = r"D:\amar\qwen2-vl-2\models\vinplate2-3000-4bit"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16, device_map=DEVICE, attn_implementation="flash_attention_2")
model.to(DEVICE)
# default: Load the model on the available device(s)
# model = Qwen2VLForConditionalGeneration.from_pretrained(
#     "Qwen/Qwen2-VL-7B-Instruct-GPTQ-Int4", torch_dtype="auto", device_map="auto"
# )

# The default range for the number of visual tokens per image in the model is 4-16384. You can set min_pixels and max_pixels according to your needs, such as a token count range of 256-1280, to balance speed and memory usage.
min_pixels = 256 * 28 * 28
max_pixels = 1280 * 28 * 28
processor = AutoProcessor.from_pretrained(
    model_path, min_pixels=min_pixels, max_pixels=max_pixels
)

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "[https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg%22),
            },
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Preparation for inference
text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(DEVICE)
t1 = time.time()
# Inference: Generation of the output
generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
t2 = time.time()
print('Time Taken')
print(t2-t1)
print(output_text)

When I tried, I got the following error:

Traceback (most recent call last):
  File "D:\amar\qwen2-vl-2\infer_quan.py", line 9, in <module>
    model = Qwen2VLForConditionalGeneration.from_pretrained(
  File "C:\Users\amarg\anaconda3\envs\fresh\lib\site-packages\transformers\modeling_utils.py", line 3763, in from_pretrained
    raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory D:\amar\qwen2-vl-2\models\vinplate2-3000-4bit.
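For reference, what from_pretrained can actually see in that folder can also be checked with a quick directory listing; this is only a diagnostic sketch, with the path taken from the traceback above:

import os

# Path reported in the traceback above.
model_path = r"D:\amar\qwen2-vl-2\models\vinplate2-3000-4bit"

# from_pretrained looks for one of the weight files named in the error
# (or a *.index.json pointing at shards), so list what is actually present.
print(sorted(os.listdir(model_path)))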

These are the files that were generated during quantization:

(screenshot of the quantized model directory)

Here is my environment

tokenizers==0.20.3
torch==2.4.1+cu121
torchaudio==2.4.1+cu121
torchvision==0.19.1+cu121
transformers==4.46.2
accelerate==1.1.1
auto_gptq==0.7.1
CUDA==12.1

WYHZQ commented 3 days ago

I ran into this problem too. Because there is only a single weight file, no model.safetensors.index.json was generated. I wrote a model.safetensors.index.json file myself and that solved the problem.
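For a single-shard checkpoint, such an index file just maps every tensor name to the one weight file. A minimal sketch of generating it is below (the shard file name and path are assumptions for illustration; use the actual .safetensors file produced by your quantization):

import json
import os
from safetensors import safe_open

# Assumed paths/names for illustration only; adjust to the actual files
# in your quantized model directory.
model_dir = r"D:\amar\qwen2-vl-2\models\vinplate2-3000-4bit"
shard_name = "model.safetensors"  # replace with the real weight file name
shard_path = os.path.join(model_dir, shard_name)

# Map every tensor stored in the single shard to that shard's file name.
with safe_open(shard_path, framework="pt") as f:
    weight_map = {key: shard_name for key in f.keys()}

index = {
    # total_size is normally the summed byte size of all tensors; the file
    # size is used here as a close approximation.
    "metadata": {"total_size": os.path.getsize(shard_path)},
    "weight_map": weight_map,
}

with open(os.path.join(model_dir, "model.safetensors.index.json"), "w") as out:
    json.dump(index, out, indent=2)

Note that if the single weight file is already named model.safetensors, from_pretrained should find it without an index; the error above suggests the quantized weights were saved under a different name, which is exactly the case the index file covers.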

bhavyajoshi-mahindra commented 2 days ago

I ran into this problem too. Because there is only a single weight file, no model.safetensors.index.json was generated. I wrote a model.safetensors.index.json file myself and that solved the problem.

How exactly did you create your own model.safetensors.index.json file? Can you share the details so that I can create my own?