NVlabs / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Apache License 2.0

Evaluation of AWQ models #150

Open surya00060 opened 1 week ago

surya00060 commented 1 week ago

When I try to evaluate the quantized AWQ models using the video evaluation script, I get a FileNotFoundError.

FileNotFoundError: No such file or directory: "/hfhub/hub/models--Efficient-Large-Model--VILA1.5-3b-AWQ/snapshots/f18f59ccac0b45f92e70a490e6f88ab5ebadef23/llm/model-00001-of-00002.safetensors"

Is there another way to run the AWQ models to get accuracy numbers?

Lyken17 commented 3 days ago

The checkpoint path was not set properly; it either points to a wrong location or the model was not downloaded. Please attach more details.
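
For example, a quick sketch (assuming huggingface_hub is installed) that lists the files the downloaded snapshot actually contains, to confirm whether the llm/model-*.safetensors shards the loader expects are present:

import os
from huggingface_hub import snapshot_download

# Download (or reuse from the local cache) the AWQ snapshot and list its files,
# to check whether llm/model-*.safetensors actually exists in that repo.
snapshot = snapshot_download("Efficient-Large-Model/VILA1.5-3b-AWQ")
for root, _, files in os.walk(snapshot):
    for name in files:
        print(os.path.relpath(os.path.join(root, name), snapshot))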

surya00060 commented 3 days ago

Thanks for the reply.

Following the instructions found here, the command below works: it downloads the model from Hugging Face and starts the video evaluation run (inference).

./scripts/v1_5/eval/video_chatgpt/run_all.sh Efficient-Large-Model/VILA1.5-3b VILA1.5-3b vicuna_v1

When I change the model to AWQ,

./scripts/v1_5/eval/video_chatgpt/run_all.sh Efficient-Large-Model/VILA1.5-3b-AWQ VILA1.5-3b-AWQ vicuna_v1

I get the following error:

Fetching 16 files:   0%|          | 0/16 [00:00<?, ?it/s]
Fetching 16 files: 100%|██████████| 16/16 [00:00<00:00, 12082.98it/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/depot/araghu/data/selvams/VILA/llava/eval/model_vqa_video.py", line 213, in <module>
    eval_model(args)
  File "/depot/araghu/data/selvams/VILA/llava/eval/model_vqa_video.py", line 126, in eval_model
    tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, model_name, args.model_base)
  File "/depot/araghu/data/selvams/VILA/llava/model/builder.py", line 151, in load_pretrained_model
    model = LlavaLlamaModel(config=config, low_cpu_mem_usage=True, **kwargs)
  File "/depot/araghu/data/selvams/VILA/llava/model/language_model/llava_llama.py", line 43, in __init__
    return self.init_vlm(config=config, *args, **kwargs)
  File "/depot/araghu/data/selvams/VILA/llava/model/llava_arch.py", line 76, in init_vlm
    self.llm, self.tokenizer = build_llm_and_tokenizer(llm_cfg, config, *args, **kwargs)
  File "/depot/araghu/data/selvams/VILA/llava/model/language_model/builder.py", line 71, in build_llm_and_tokenizer
    llm = AutoModelForCausalLM.from_pretrained(
  File "/depot/araghu/data/selvams/vila-env/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
  File "/depot/araghu/data/selvams/vila-env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3697, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/depot/araghu/data/selvams/vila-env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4080, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
  File "/depot/araghu/data/selvams/vila-env/lib/python3.10/site-packages/transformers/modeling_utils.py", line 497, in load_state_dict
    with safe_open(checkpoint_file, framework="pt") as f:
FileNotFoundError: No such file or directory: "/scratch/gilbreth/selvams/hfhub/hub/models--Efficient-Large-Model--VILA1.5-3b-AWQ/snapshots/f18f59ccac0b45f92e70a490e6f88ab5ebadef23/llm/model-00001-of-00002.safetensors"
./scripts/v1_5/eval/video_chatgpt/run_qa_msrvtt.sh: line 44: runs/eval/VILA1.5-3b-AWQ/MSRVTT_Zero_Shot_QA/merge.jsonl: No such file or directory
./scripts/v1_5/eval/video_chatgpt/run_qa_msrvtt.sh: line 48: runs/eval/VILA1.5-3b-AWQ/MSRVTT_Zero_Shot_QA/merge.jsonl: No such file or directory

I observed a similar error with the AWQ models when performing quick inference as well.

Lyken17 commented 3 days ago

AWQ support is separate from the main repo, which means functions such as training and evaluation do not come with AWQ support -- you have to use BF16 precision for those.
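
In other words, to get accuracy numbers from the evaluation scripts you point them at the non-quantized checkpoint. A minimal sketch of what that amounts to, assuming the load_pretrained_model signature shown in the traceback above (passing None for model_base is an assumption here, matching args.model_base left unset):

from llava.model.builder import load_pretrained_model

# Evaluate with the BF16 checkpoint; the AWQ repo does not ship the
# llm/*.safetensors shards this loader looks for.
model_path = "Efficient-Large-Model/VILA1.5-3b"  # not the -AWQ variant
model_name = "VILA1.5-3b"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, model_name, None  # model_base=None is an assumption
)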