Efficient-Large-Model / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0

Running the AWQ models #51

Open signine opened 1 month ago

signine commented 1 month ago

Is it possible to run the AWQ models using the run_vila.py script?

I ran the following command:

python -W ignore llava/eval/run_vila.py \
  --model-path Efficient-Large-Model/VILA1.5-3b-AWQ \
  --conv-mode vicuna_v1 \
  --query "<video>\n Describe this video" \
  --video-file "tjx1PPFsa6A-Scene-049.mp4"

and got this error:

Traceback (most recent call last):
  File "/src/VILA/llava/eval/run_vila.py", line 160, in <module>
    eval_model(args)
  File "/src/VILA/llava/eval/run_vila.py", line 64, in eval_model
    tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, model_name, args.model_base)
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/VILA/llava/model/builder.py", line 177, in load_pretrained_model
    model = LlavaLlamaModel(
            ^^^^^^^^^^^^^^^^
  File "/src/VILA/llava/model/language_model/llava_llama.py", line 53, in __init__
    return self.init_vlm(config=config, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/VILA/llava/model/llava_arch.py", line 76, in init_vlm
    self.llm, self.tokenizer = build_llm_and_tokenizer(llm_cfg, config, *args, **kwargs)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/VILA/llava/model/language_model/builder.py", line 77, in build_llm_and_tokenizer
    llm = AutoModelForCausalLM.from_pretrained(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3706, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4091, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/transformers/modeling_utils.py", line 503, in load_state_dict
    with safe_open(checkpoint_file, framework="pt") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: No such file or directory: "/root/.cache/huggingface/hub/models--Efficient-Large-Model--VILA1.5-3b-AWQ/snapshots/5d37764f2ed919bae08637a3b380bfd53931475d/llm/model-00001-of-00002.safetensors"

How can I run inference with the checkpoint hosted here? https://huggingface.co/Efficient-Large-Model/VILA1.5-3b-AWQ/tree/main/llm
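(For reference, the traceback shows the failure at the weight-loading step: from_pretrained follows the shard index and then cannot find model-00001-of-00002.safetensors in the downloaded snapshot. A quick way to see what the AWQ repo actually ships under llm/ is to list its files with huggingface_hub; this is only a diagnostic sketch, not part of run_vila.py.)

# Diagnostic sketch (not part of run_vila.py): list what the AWQ repo ships
# under llm/. Uses huggingface_hub, which transformers already depends on.
from huggingface_hub import list_repo_files

files = list_repo_files("Efficient-Large-Model/VILA1.5-3b-AWQ")
print([f for f in files if f.startswith("llm/")])
# If the standard model-*-of-*.safetensors shards are not listed, the plain
# AutoModelForCausalLM.from_pretrained path used by llava/model/builder.py
# cannot load this checkpoint, which matches the traceback above.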

ys-2020 commented 1 month ago

Hi @signine, thanks for your interest in VILA and AWQ. The latest quantized VILA-1.5 models are supported through TinyChat. Please refer to this page for instructions; the usage and VLM sections should be the most helpful. Thank you!
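(For anyone else landing here, a TinyChat VLM demo invocation looks roughly like the sketch below. The flag names and the quantized checkpoint filename are assumptions and should be confirmed against the usage and VLM sections of the llm-awq/TinyChat README.)

# Illustrative sketch only -- the flags and checkpoint filename below are
# assumptions; confirm them against the TinyChat README.
cd llm-awq/tinychat
python vlm_demo_new.py \
  --model-path /path/to/VILA1.5-3b-AWQ \
  --quant-path /path/to/VILA1.5-3b-AWQ/llm/<quantized-awq-checkpoint>.pt \
  --precision W4A16 \
  --image-file /path/to/image.png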

hkunzhe commented 1 month ago

@ys-2020, the vlm_demo_new.py script in TinyChat does not support video files.

hkunzhe commented 1 month ago

@ys-2020, by referring to https://github.com/Efficient-Large-Model/VILA/blob/main/llava/eval/run_vila.py and https://github.com/mit-han-lab/llm-awq/blob/main/tinychat/serve/gradio_web_server.py, I managed to run inference with video input through https://github.com/mit-han-lab/llm-awq/blob/main/tinychat/vlm_demo_new.py. However, the output is noticeably worse than the online demo with the same settings (VILA1.5-40b-AWQ; temperature 1.0; top-p 1.0; num_frames 8). Do you have any ideas?
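(A minimal sketch of the kind of adaptation described above: uniformly sampling num_frames frames from the clip with OpenCV and handing them to the demo as a list of PIL images. VILA's own helper in llava.mm_utils may resize or preprocess differently, so treat this only as an illustration of the shape of the change.)

# Minimal sketch of uniform video-frame sampling, assuming opencv-python and Pillow.
import cv2
from PIL import Image


def sample_frames(video_path: str, num_frames: int = 8) -> list[Image.Image]:
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Pick num_frames indices spread evenly across the clip.
    indices = [int(i * max(total - 1, 0) / max(num_frames - 1, 1)) for i in range(num_frames)]
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            continue
        # OpenCV returns BGR; convert to RGB before building the PIL image.
        frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    cap.release()
    return frames


frames = sample_frames("tjx1PPFsa6A-Scene-049.mp4", num_frames=8)

When comparing against the online demo, it is also worth confirming that the same conversation template (conv-mode) and prompt wrapping are used, since run_vila.py and the TinyChat demo assemble their prompts independently.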