LanceLeonhart opened this issue 2 months ago
Here is the whole error message:
"[2024-08-25 16:28:12,830] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████| 4/4 [00:09<00:00, 2.48s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
input: --conv-mode is llama_3, using llama_3
torch.Size([1, 3, 384, 384])
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Traceback (most recent call last):
File "/home/ny52/VILA/llava/eval/run_vila.py", line 157, in
Same.
Pretty sure this broke in 54c970676ce96a5f7df372b5d25b2b98057716c2, where the seq_len argument was removed from LlamaRotaryEmbedding.forward(self, seq_len).
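If you want to confirm that this is what is happening in your own environment, one quick check is to inspect the installed signature directly. This is only a diagnostic sketch (it relies on nothing beyond the public LlamaRotaryEmbedding class that transformers ships), not a fix:

```python
# Check whether the installed transformers still accepts seq_len in
# LlamaRotaryEmbedding.forward(); if it does not, the old-style call in the
# llama attention code will raise the TypeError reported above.
import inspect
from transformers.models.llama.modeling_llama import LlamaRotaryEmbedding

params = inspect.signature(LlamaRotaryEmbedding.forward).parameters
if "seq_len" in params:
    print("seq_len is still accepted; the old call sites should work.")
else:
    print("seq_len has been removed from LlamaRotaryEmbedding.forward(); "
          "expect the TypeError unless you use the transformers version the repo pins.")
```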
Same here; previously I was able to run inference on a V100 with no problem, but now it is broken.
Hello, I am trying to run the VILA model for inference, but I have encountered a couple of issues that I need help with.

(1) FlashAttention issue: initially, I faced a problem related to FlashAttention. After going through all the relevant issues on GitHub, I managed to resolve it by modifying the relevant code (lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py).

(2) TypeError encountered: after addressing the FlashAttention issue, I encountered the following error during model inference: TypeError: LlamaRotaryEmbedding.forward() got an unexpected keyword argument 'seq_len'

Could you please provide guidance on how to resolve this issue? Any help would be greatly appreciated!
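Since step (1) involved hand-editing modeling_llama.py inside site-packages, it is also worth double-checking which transformers version and which copy of that file are actually being imported when the error occurs (a newer transformers install is exactly what removes the seq_len argument). A small diagnostic sketch, using only the public transformers package:

```python
# Print the transformers version in use and the path of the modeling_llama.py
# that is actually imported, to verify the hand-edited copy is the one loaded.
import transformers
from transformers.models.llama import modeling_llama

print("transformers version:", transformers.__version__)
print("modeling_llama loaded from:", modeling_llama.__file__)
```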