LLaVA-VL / LLaVA-NeXT

Apache License 2.0

0.5b model work fine,7b model result is `['']` #186

Open rookiez7 opened 2 weeks ago

rookiez7 commented 2 weeks ago

I re-downloaded this repo and tried transformers versions 4.40.0.dev, 4.40.0, and 4.41.2; the result is still `['']`. What I did: all weights I use are local weights. Below are my changes.

  1. Meta-Llama-3-8B-Instruct: in `llava/conversation.py`, line 387, `tokenizer=AutoTokenizer.from_pretrained("local_path/LLaVA-NeXT/Meta-Llama-3-8B-Instruct")`

  2. siglip-so400m-patch14-384: in `llava-onevision-qwen2-7b-si/config.json`, line 176, `"vision_tower": "local_path/siglip-so400m-patch14-384"`. Then I got some error about a mismatch, and I used this to fix it: https://github.com/LLaVA-VL/LLaVA-NeXT/issues/148#issuecomment-2298549964
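For reference, a minimal sketch of the `config.json` fragment described in step 2 (the local path is this user's; adjust it to wherever you placed the vision tower weights):

```json
{
  "vision_tower": "local_path/siglip-so400m-patch14-384"
}
```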

After that the 0.5b model works fine, but the 7b model result is always `['']`. Below is the output for the 7b model:

```
(llava) root@sugon:~/work/project/LLaVA-NeXT# python demo_single_image.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loaded LLaVA model: /root/work/project/LLaVA-NeXT_bak/llava-onevision-qwen2-7b-si
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
You are using a model of type llava to instantiate a model of type llava_qwen. This is not supported for all configurations of models and can yield errors.
Loading vision tower: /root/work/project/LLaVA-NeXT/siglip-so400m-patch14-384
Loading checkpoint shards: 100%
Model Class: LlavaQwenForCausalLM
['']
```

I find 0.5b and 7b give different results in the code below:

```python
cont = model.generate(
    input_ids,
    images=image_tensor,
    image_sizes=image_sizes,
    do_sample=False,
    temperature=0,
    max_new_tokens=4096,
)
```

For 0.5b, `cont` is:

```
tensor([[ 785, 2168, 374, 264, 27508, 9487, 429, 4933, 279, 5068, 315, 2155, 25185, 476, 4119, 304, 264, 3151, 7947, 11, 1741, 438, 5662, 6832, 476, 5810, 4128, 8692, 13, 576, 9487, 702, 3807, 24745, 323, 9201, 18860, 279, 12111, 5036, 11, 862, 19511, 12205, 11, 323, 10767, 1008, 16734, 1075, 330, 9389, 3298, 12, 17, 1335, 330, 16664, 9389, 3298, 1335, 330, 48, 16948, 19625, 43, 12, 15672, 1335, 323, 330, 4086, 64, 12820, 12, 16, 13, 20, 1189, 8886, 12111, 594, 5456, 374, 15251, 553, 264, 1894, 86311, 1555, 389, 279, 9487, 11, 448, 6303, 14064, 14850, 3298, 12, 17, 11, 6176, 369, 29051, 9389, 3298, 11, 18575, 369, 1207, 16948, 19625, 43, 12, 15672, 11, 323, 2518, 369, 19504, 64, 12820, 12, 16, 13, 20, 13, 576, 9487, 7952, 311, 387, 26297, 279, 5068, 315, 1493, 25185, 3941, 5257, 30476, 476, 9079, 11, 892, 1410, 387, 5435, 311, 23850, 11, 58354, 11, 476, 1045, 1352, 315, 821, 6358, 13, 151645]])
```

For 7b, `cont` is:

```
tensor([[151645]], device='cuda:0')
```

How can I fix this? Please give me some advice.

Stefan-084 commented 2 weeks ago

For question 1, you can actually modify it like this: (1) set `tokenizer=None`, and (2) load it lazily:

```python
if self.tokenizer is None:
    self.tokenizer = AutoTokenizer.from_pretrained(self.tokenizer_id)

print(chat_template_messages)

return self.tokenizer.apply_chat_template(chat_template_messages, tokenize=False, add_generation_prompt=True)
```
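To make the suggestion above concrete, here is a self-contained sketch of the lazy-tokenizer pattern; the class and attribute names (`Conversation`, `tokenizer_id`, `get_prompt`) are assumptions for illustration, not the repo's exact code:

```python
class Conversation:
    def __init__(self, tokenizer_id, tokenizer=None):
        self.tokenizer_id = tokenizer_id
        self.tokenizer = tokenizer  # (1) default to None; defer loading

    def get_prompt(self, chat_template_messages):
        # (2) only call from_pretrained on first use, so a local path
        # configured at construction time is respected
        if self.tokenizer is None:
            from transformers import AutoTokenizer
            self.tokenizer = AutoTokenizer.from_pretrained(self.tokenizer_id)
        return self.tokenizer.apply_chat_template(
            chat_template_messages, tokenize=False, add_generation_prompt=True)
```

The point of deferring the load is that the hard-coded `AutoTokenizer.from_pretrained(...)` at module import time (line 387 in `llava/conversation.py`) is replaced by a lookup that happens only when a prompt is actually built.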

kristen-ding commented 4 days ago

The same issue here. The 7b model result is `[!!.....]`, many many `!`.