LLaVA-NeXT-Interleave Training Details

friedrichor commented 1 month ago

Hello. Thanks for your excellent work!

I previously reproduced LLaVA-NeXT-Image training and got the expected performance, and I am now trying to reproduce LLaVA-NeXT-Interleave training. I have a few questions about its training details.

What values of image_aspect_ratio and mm_patch_merge_type were used? I noticed that the config.json in lmms-lab/llava-next-interleave-qwen-7b has the following settings:

```json
"image_aspect_ratio": "pad",
"mm_patch_merge_type": "flat",
```

Are these settings also used for training? Since the training data includes some single-image samples, I'm not sure whether single-image data should also use AnyRes.

FengLi-ust commented 1 month ago

Hi, you can set image_aspect_ratio to anyres. We use AnyRes training for single-image data, but not for multi-image data, to avoid excessive computation cost.
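To see why AnyRes is restricted to single-image samples, the policy in the reply above can be sketched as a rough visual-token budget. This is a minimal sketch under stated assumptions: TOKENS_PER_VIEW, ANYRES_GRID, and both helper functions are hypothetical illustrations, not values or code from the LLaVA-NeXT repository.

```python
from typing import List

# Illustrative assumptions, NOT values confirmed in this thread:
TOKENS_PER_VIEW = 729  # hypothetical visual tokens per encoded view (e.g. a 27 x 27 patch grid)
ANYRES_GRID = (2, 2)   # hypothetical AnyRes tile layout (rows, cols) for a single image


def views_per_image(num_images: int, anyres_for_multi: bool = False) -> List[int]:
    """Encoder views per image: AnyRes only for single-image samples by default."""
    rows, cols = ANYRES_GRID
    anyres_views = 1 + rows * cols  # one downsized global view + one view per tile
    if num_images == 1 or anyres_for_multi:
        return [anyres_views] * num_images
    # Multi-image sample without AnyRes: each image is one base-resolution view
    return [1] * num_images


def visual_token_count(num_images: int, anyres_for_multi: bool = False) -> int:
    """Total visual tokens a sample would contribute to the LLM context."""
    return sum(views_per_image(num_images, anyres_for_multi)) * TOKENS_PER_VIEW


if __name__ == "__main__":
    for n in (1, 4, 8):
        print(f"{n} image(s): {visual_token_count(n)} tokens "
              f"(with AnyRes everywhere: {visual_token_count(n, True)})")
```

With these assumed numbers, an 8-image sample costs 5832 visual tokens, versus 29160 if every image were tiled. Actual counts also depend on how patches are merged (mm_patch_merge_type), which this sketch ignores.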

friedrichor commented 1 month ago

Thank you, got it.

friedrichor commented 1 month ago

AnyRes provides a shared, flexible representation across images, videos, and multi-image inputs, and I would like to know exactly how this is achieved. Could you share the relevant code?
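For the single-image part only, the core AnyRes step can be sketched as: pick a grid resolution for the image, then split the resized canvas into encoder-sized tiles alongside one downsized global view. The scoring rule below (maximize effectively used pixels, then minimize padding) follows our understanding of the select_best_resolution helper in the public LLaVA codebase, but this is a standalone sketch, not the repository's code. The candidate list, the 384-pixel tile size, and tile_boxes are hypothetical, and video and multi-image handling are not shown.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (width, height)

# Hypothetical candidate resolutions and tile size; real models read theirs from config.
CANDIDATES: List[Size] = [(384, 768), (768, 384), (768, 768), (1152, 384), (384, 1152)]
BASE = 384


def select_best_resolution(original: Size, candidates: List[Size]) -> Size:
    """Pick the candidate that preserves the most original pixels, then wastes the fewest."""
    orig_w, orig_h = original
    best, best_effective, best_wasted = candidates[0], -1, float("inf")
    for cand_w, cand_h in candidates:
        # Aspect-preserving scale so the image fits inside the candidate canvas
        scale = min(cand_w / orig_w, cand_h / orig_h)
        down_w, down_h = int(orig_w * scale), int(orig_h * scale)
        # Upscaling adds no detail, so cap at the original pixel count
        effective = min(down_w * down_h, orig_w * orig_h)
        wasted = cand_w * cand_h - effective  # padding area
        if effective > best_effective or (
            effective == best_effective and wasted < best_wasted
        ):
            best, best_effective, best_wasted = (cand_w, cand_h), effective, wasted
    return best


def tile_boxes(resolution: Size, base: int = BASE) -> List[Tuple[int, int, int, int]]:
    """Crop boxes (left, top, right, bottom) splitting the canvas into base x base tiles."""
    width, height = resolution
    return [(x, y, x + base, y + base)
            for y in range(0, height, base)
            for x in range(0, width, base)]


if __name__ == "__main__":
    for size in [(1536, 768), (256, 768)]:
        res = select_best_resolution(size, CANDIDATES)
        # The image would be resized/padded to `res` and cut into tiles; with a
        # downsized global view added, the encoder sees 1 + len(tiles) views.
        print(size, "->", res, "tiles:", len(tile_boxes(res)))
```

Capping the effective pixel count at the original area is what keeps a small image from being matched to the largest grid: upscaling adds no detail, so a smaller canvas with less padding wins the tie.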

LLaVA-VL / LLaVA-NeXT, issue #103