baaivision / Emu

Emu Series: Generative Multimodal Models from BAAI
https://baaivision.github.io/emu2/
Apache License 2.0

Generate from any prompt sequence #79

Closed: lzw-lzw closed this issue 8 months ago.

lzw-lzw commented 8 months ago

Thank you for your excellent work. I noticed that Emu2 can generate images from text, example images, and their corresponding positions all at once (for example, the "Generate from any prompt sequence" part of Figure 1 in the paper). However, the training data described in the paper does not seem to include this type of data. Does this ability of Emu2 emerge from other types of data, or have I overlooked some details in the paper? Thanks!

ryanzhangfan commented 8 months ago

Figure 1 has three main parts (separated by dashed-line boxes), which show the results of Emu2, Emu2-Chat, and Emu2-Gen respectively.

Emu2 can generate images based only on text and image inputs, while Emu2-Gen can generate from any prompt sequence. Please refer to Section 2.3.2 for details of the datasets used to train Emu2-Gen.