Running the AWQ models - Githubissues

signine commented 2 months ago

Is it possible to run the AWQ models using the run_vila.py script?

I ran the following command:

python -W ignore llava/eval/run_vila.py     \
  --model-path Efficient-Large-Model/VILA1.5-3b-AWQ \      
  --conv-mode vicuna_v1  \   
  --query "<video>\n Describe this video"   \  
  --video-file "tjx1PPFsa6A-Scene-049.mp4"

and got this error:

Traceback (most recent call last):
  File "/src/VILA/llava/eval/run_vila.py", line 160, in <module>
    eval_model(args)
  File "/src/VILA/llava/eval/run_vila.py", line 64, in eval_model
    tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, model_name, args.model_base)
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/VILA/llava/model/builder.py", line 177, in load_pretrained_model
    model = LlavaLlamaModel(
            ^^^^^^^^^^^^^^^^
  File "/src/VILA/llava/model/language_model/llava_llama.py", line 53, in __init__
    return self.init_vlm(config=config, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/VILA/llava/model/llava_arch.py", line 76, in init_vlm
    self.llm, self.tokenizer = build_llm_and_tokenizer(llm_cfg, config, *args, **kwargs)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/src/VILA/llava/model/language_model/builder.py", line 77, in build_llm_and_tokenizer
    llm = AutoModelForCausalLM.from_pretrained(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3706, in from_pretrained
    ) = cls._load_pretrained_model(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/transformers/modeling_utils.py", line 4091, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.pyenv/versions/3.11.8/lib/python3.11/site-packages/transformers/modeling_utils.py", line 503, in load_state_dict
    with safe_open(checkpoint_file, framework="pt") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: No such file or directory: "/root/.cache/huggingface/hub/models--Efficient-Large-Model--VILA1.5-3b-AWQ/snapshots/5d37764f2ed919bae08637a3b380bfd53931475d/llm/model-00001-of-00002.safetensors"

How can I run inference with the checkpoint in here? https://huggingface.co/Efficient-Large-Model/VILA1.5-3b-AWQ/tree/main/llm

ys-2020 commented 2 months ago

Hi @signine , thanks for your interest in VILA and AWQ. The newest quantized VILA-1.5 models are supported with TinyChat. Please refer to this page for instructions. You may find the usage section and VLM section helpful. Thank you!

hkunzhe commented 2 months ago

@ys-2020, vlm_demo_new.py in tinychat does not support video files.

hkunzhe commented 1 month ago

@ys-2020, By referring to https://github.com/Efficient-Large-Model/VILA/blob/main/llava/eval/run_vila.py and https://github.com/mit-han-lab/llm-awq/blob/main/tinychat/serve/gradio_web_server.py, I successfully ran the inference with the video input in https://github.com/mit-han-lab/llm-awq/blob/main/tinychat/vlm_demo_new.py, but the output was not as effective as the online demo with the same parameters (VILA1.5-40b-AWQ; temperature 1.0; top-p 1.0; num_frames 8). Do you have any ideas?

NVlabs / VILA

Running the AWQ models #51