szbcasia opened this issue 6 months ago
Hello, I encountered the following error during the second stage of training:
```
File "/home/AI_project/LLaMA-VID/llamavid/model/language_model/llava_llama_vid.py", line 80, in forward
    input_ids, attention_mask, past_key_values, inputs_embeds, labels = self.prepare_inputs_labels_for_multimodal(input_ids, attention_mask, past_key_values, labels, images, prompts=prompts)
File "/home/AI_project/LLaMA-VID/llamavid/model/llamavid_arch.py", line 532, in prepare_inputs_labels_for_multimodal
    image_features = self.encode_images(images, prompts, long_video=long_video)
File "/home/AI_project/LLaMA-VID/llamavid/model/llamavid_arch.py", line 341, in encode_images
    image_features = self.vlm_attention(image_features,
File "/home/AI_project/LLaMA-VID/llamavid/model/llamavid_arch.py", line 350, in vlm_attention
    assert len(image_features) == len(
AssertionError: Size mismatch! image_features: 1, prompts: 8
```
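For context, here is a minimal sketch of the check that fails. Only the function name, the compared lengths, and the error message come from the traceback; everything else (the pairing logic, the example data) is an assumption for illustration, not LLaMA-VID's actual implementation:

```python
def vlm_attention(image_features, prompts):
    """Pair each sample's image features with its prompts.

    The assertion mirrors the one at llamavid_arch.py line 350: both lists
    are expected to have the same (batch) length, one entry per sample.
    """
    assert len(image_features) == len(prompts), (
        f"Size mismatch! image_features: {len(image_features)}, "
        f"prompts: {len(prompts)}"
    )
    # Hypothetical downstream use: process features together with prompts.
    return list(zip(image_features, prompts))


# Reproduces the reported situation: a batch of 1 image-feature entry
# paired against 8 prompt entries triggers the AssertionError.
features_batch = [[0.1, 0.2, 0.3]]        # batch size 1
prompts_batch = [["describe the image"]] * 8  # batch size 8
```

So the error indicates the dataloader handed the model 8 prompts but only 1 image-feature entry for that batch, which suggests the per-sample prompts and images are not being batched consistently.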
Is there any solution?