LLaVA-VL / LLaVA-NeXT

Apache License 2.0

Bugs in the inference phase #176

Closed josephzpng closed 2 months ago

josephzpng commented 2 months ago

Model name: LLaVA-NeXT-Video-7B

llava/model/llava_arch.py", line 309, in prepare_inputs_labels_for_multimodal
    image_feature = unpad_image(image_feature, image_sizes[image_idx])
TypeError: 'NoneType' object is not subscriptable

josephzpng commented 2 months ago

I solved the problem by passing modalities as a list (modalities=["video"]) instead of a string (modalities="video").

Before:

output_ids = model.generate(inputs=input_ids, images=video, attention_mask=attention_masks, modalities="video", do_sample=False, temperature=0.0, max_new_tokens=1024, top_p=0.1, num_beams=1, use_cache=True, stopping_criteria=[stopping_criteria])

After:

output_ids = model.generate(inputs=input_ids, images=video, attention_mask=attention_masks, modalities=["video"], do_sample=False, temperature=0.0, max_new_tokens=1024, top_p=0.1, num_beams=1, use_cache=True, stopping_criteria=[stopping_criteria])
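
A possible explanation for why changing modalities="video" to modalities=["video"] matters, sketched under an assumption: the model code appears to look up the modality per batch item by index and compare it to "video". That is not confirmed from the traceback alone. Both helpers below (video_indices and normalize_modalities) are hypothetical and are not part of the LLaVA-NeXT API. If the lookup works this way, indexing a bare string yields single characters, so no item is recognized as a video and the input would fall through to the image path, where image_sizes is None.

```python
def video_indices(modalities):
    """Hypothetical helper: positions in the batch that are marked as video.

    Mimics a per-item lookup of the form modalities[i] == "video".
    """
    return [i for i in range(len(modalities)) if modalities[i] == "video"]


def normalize_modalities(modalities):
    """Hypothetical guard: wrap a bare string so there is one entry per item."""
    if isinstance(modalities, str):
        return [modalities]
    return list(modalities)


# A bare string is indexed character by character: "v", "i", "d", "e", "o".
print(video_indices("video"))    # []

# A list holds one modality string per batch item.
print(video_indices(["video"]))  # [0]

# Normalizing before the call makes both spellings behave the same.
print(video_indices(normalize_modalities("video")))  # [0]
```

Wrapping the value before calling model.generate, as normalize_modalities does, would guard against the same mistake in calling code.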