Llama-3-VILA1.5-8B Inference error #39

Open joebradly opened 1 month ago

joebradly commented 1 month ago

Hello! Thanks for sharing such a nice project. I set up the environment following the instructions in the README. I then ran the inference example as follows (I copied run_vila.py from llava/eval/ to the project root):

'''bash
python run_vila.py \
    --model-path Efficient-Large_model/Llama-3-VILA1.5-8B \
    --conv-mode vicuna_v1 \
    --query "\n Please describe the traffic condition." \
    --image-file "./demo_images/av.png"
'''

I encounter the following error:

'''
['./demo_images/av.png']
Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards:  25%|██▌       | 1/4 [01:46<05:18, 106.09s/it]
Loading checkpoint shards:  50%|█████     | 2/4 [03:47<03:49, 114.88s/it]
Loading checkpoint shards:  75%|███████▌  | 3/4 [05:02<01:37, 97.03s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [05:13<00:00, 62.85s/it]
Loading checkpoint shards: 100%|██████████| 4/4 [05:13<00:00, 78.34s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:128001 for open-end generation.
input: \n Please describe the traffic condition.
[WARNING] the auto inferred conversation mode is llava_v0, while --conv-mode is vicuna_v1, using vicuna_v1
torch.Size([1, 3, 384, 384])
Traceback (most recent call last):
  File "/home/deping.zhang/code/llm/VILA/run_vila.py", line 153, in <module>
    eval_model(args)
  File "/home/deping.zhang/code/llm/VILA/run_vila.py", line 115, in eval_model
    output_ids = model.generate(
  File "/home/deping.zhang/.conda/envs/vila/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/deping.zhang/code/llm/VILA/llava/model/language_model/llava_llama.py", line 171, in generate
    outputs = self.llm.generate(
  File "/home/deping.zhang/.conda/envs/vila/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/deping.zhang/.conda/envs/vila/lib/python3.10/site-packages/transformers/generation/utils.py", line 1764, in generate
    return self.sample(
  File "/home/deping.zhang/.conda/envs/vila/lib/python3.10/site-packages/transformers/generation/utils.py", line 2924, in sample
    if stopping_criteria(input_ids, scores):
  File "/home/deping.zhang/.conda/envs/vila/lib/python3.10/site-packages/transformers/generation/stopping_criteria.py", line 132, in __call__
    return any(criteria(input_ids, scores) for criteria in self)
  File "/home/deping.zhang/.conda/envs/vila/lib/python3.10/site-packages/transformers/generation/stopping_criteria.py", line 132, in <genexpr>
    return any(criteria(input_ids, scores) for criteria in self)
  File "/home/deping.zhang/code/llm/VILA/llava/mm_utils.py", line 287, in __call__
    outputs.append(self.call_for_batch(output_ids[i].unsqueeze(0), scores))
  File "/home/deping.zhang/code/llm/VILA/llava/mm_utils.py", line 272, in call_for_batch
    if (output_ids[0, -keyword_id.shape[0] :] == keyword_id).all():
RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 0
'''

Lyken17 commented 1 month ago

Could @joebradly @SeanCraven314 share your environment? The code runs without error on my side.

SeanCraven314 commented 1 month ago

Hi, this is a dump of my environment.

I am launching from the CLI:

python llava/eval/run_vila.py \
  --model-path=Efficient-Large-Model/Llama-3-VILA1.5-8B \
  --query "What is this?"

Pip list

Running on an Intel CPU and an A100 GPU, on Ubuntu 22.04.

joebradly commented 1 month ago

> Could @joebradly @SeanCraven314 share your environment? The code runs without error on my side.

Package    Version    Editable project location
vila       1.0.0      /home/deping.zhang/code/llm/VILA

joebradly commented 1 month ago

I changed line 272 of llava/mm_utils.py to the following:

    if (output_ids[0, -keyword_id.shape[0] :, None] == keyword_id).all():
        return True

Then the inference runs through.
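For context, the error message suggests that the slice `output_ids[0, -keyword_id.shape[0] :]` held only 2 tokens while `keyword_id` held 3, which can happen when fewer tokens are available than the stop keyword is long. Adding the `None` index makes the two shapes broadcastable, so the comparison no longer raises, but it then compares every trailing token against every keyword token instead of position by position. Below is a minimal sketch of an alternative, assuming the intent is a plain suffix match. The helper name `ends_with_keyword` and the token ids are hypothetical, and plain Python sequences stand in for tensors; this is not the project's code.

```python
from typing import Sequence


def ends_with_keyword(output_ids: Sequence[int], keyword_ids: Sequence[int]) -> bool:
    """Return True only if output_ids ends with the full keyword_ids sequence."""
    n = len(keyword_ids)
    # Guard: with fewer tokens than the keyword has, no suffix match is
    # possible, and an elementwise tensor comparison would see mismatched sizes.
    if n == 0 or len(output_ids) < n:
        return False
    # Position-by-position comparison of the last n tokens.
    return list(output_ids[-n:]) == list(keyword_ids)


# Hypothetical 3-token stop keyword, for illustration only.
keyword = [27, 82, 29]
print(ends_with_keyword([27, 82], keyword))         # False: only 2 tokens so far
print(ends_with_keyword([5, 27, 82, 29], keyword))  # True: suffix matches
```

The same length guard could be placed in front of the tensor comparison on line 272, so that short outputs are skipped rather than compared.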

gaodianzhuo commented 1 month ago


Efficient-Large-Language-Model commented 1 month ago

Will verify and fix.

BTW, you need to use --conv-mode=llama_3 with Llama-3 models.

Efficient-Large-Language-Model commented 1 month ago

It seems when using the correct conv mode, there is no issue. Therefore, no code change is needed.

SeanCraven314 commented 1 month ago

Thanks very much for this. Sorry for the hassle.

hkunzhe commented 1 month ago

> It seems when using the correct conv mode, there is no issue. Therefore, no code change is needed.

Hi, run_vila.py with VILA1.5-40B (not Llama-3) encounters the same issue. Using the workaround from @joebradly fixes it.

Efficient-Large-Language-Model commented 1 month ago

For VILA1.5-40B, you should use --conv-mode hermes-2
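The two model-to-conv-mode pairings mentioned so far in this thread can be written down as a small lookup. This is only a sketch: `CONV_MODE_HINTS` and `suggest_conv_mode` are hypothetical names, not part of the VILA codebase, and the table contains nothing beyond the two pairings stated above. Other models return `None` because the thread does not say which mode they need.

```python
from typing import Optional

# Pairings taken from the maintainer replies in this thread; nothing else is known here.
CONV_MODE_HINTS = {
    "llama-3": "llama_3",       # e.g. Llama-3-VILA1.5-8B
    "vila1.5-40b": "hermes-2",  # VILA1.5-40B
}


def suggest_conv_mode(model_path: str) -> Optional[str]:
    """Return the --conv-mode suggested in this thread, or None if unknown."""
    name = model_path.lower()
    for marker, conv_mode in CONV_MODE_HINTS.items():
        if marker in name:
            return conv_mode
    return None


print(suggest_conv_mode("Efficient-Large-Model/Llama-3-VILA1.5-8B"))  # llama_3
print(suggest_conv_mode("VILA1.5-40B"))                               # hermes-2
```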

tp-nan commented 1 month ago

Hi, with the new version, running

python3 -W ignore llava/eval/run_vila.py \
  --model-path Efficient-Large-Model/VILA1.5-3B \
  --conv-mode vicuna_v1 \
  --query "<image>\n Please describe the traffic condition." \
  --image-file "demo_images/av.png"

gives:

ValueError: Keyword tensor should have 2 or 3 dimensions, got 1

How can I fix it?