baaivision / Emu3

Next-Token Prediction is All You Need
Apache License 2.0
1.81k stars, 71 forks

Multi-image input or multi-image generation? #35

Open FanqingM opened 3 weeks ago

FanqingM commented 3 weeks ago

Hello, great work. Can Emu3 understand multiple images (interleaved image and text) or generate multiple images (interleaved image and text)? Can we run SFT on Emu3 for interleaved image-text generation or understanding?

ryanzhangfan commented 3 weeks ago

We did not intentionally construct interleaved data when training the Emu3 series models, so our released post-trained models cannot do interleaved understanding or generation well. You can try interleaved SFT on Emu3 yourself.