lzw-lzw closed this issue 10 months ago
Figure 1 has three main parts (separated by dashed boxes), which show the results of Emu2, Emu2-Chat, and Emu2-Gen respectively.
Emu2 can generate images based only on text and image inputs, while Emu2-Gen can generate from any prompt sequence. Please refer to Section 2.3.2 for more details on the datasets used to train Emu2-Gen.
Thank you for your excellent work. I notice that Emu2 can generate images given text, image examples, and corresponding positions at the same time (such as the "Generate from any prompt sequence" part of Figure 1 in the paper), but the training data described in the paper does not seem to include this type of data. I would like to know whether this ability of Emu2 emerges from training on other types of data, or whether I have overlooked some details in the paper. Thanks!