Open waltonfuture opened 1 month ago
Yes, the code supports multi-image finetuning.
Thank you. How should I organize my data for multi-image SFT? And how do I run inference with multiple images?
Same problem here. Any update on multi-image sft?
@qyc-98 Hello! Can you provide some simple examples of in-context inference or SFT? Thanks a lot!
@qyc-98 I have encountered the same problem. Have you resolved it?
+1 also curious about this
Thanks for your great work! However, it seems that we can only use data that contains one image for SFT. Can we use in-context multimodal data (i.e., containing multiple images) for finetuning?