Open HongyanZhi opened 1 year ago
Hello. Thanks for your interest in our work. For each training example, we generate the embeddings only once. Note that for the text loss we also first generate the embeddings and then compute the classification (cross-entropy) loss. The image loss is computed at the same place in the forward pass, but with a regression objective instead of a classification one.
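To make the "one forward pass, two losses" idea concrete, here is a minimal NumPy sketch. All names, shapes, and the text/image position split are hypothetical illustrations, not taken from the Emu codebase: a single set of per-position embeddings is produced once, then cross-entropy is applied at text-token positions and a regression loss (MSE here) at image-embedding positions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; none of these names come from the Emu codebase.
batch, seq_len, hidden_dim, vocab = 2, 10, 16, 50

# One forward pass yields per-position embeddings; a text head maps them to logits.
hidden = rng.standard_normal((batch, seq_len, hidden_dim))
W_text = rng.standard_normal((hidden_dim, vocab))
logits = hidden @ W_text

# Illustrative split: positions 0-5 hold text tokens, the rest hold image embeddings.
is_text = np.zeros((batch, seq_len), dtype=bool)
is_text[:, :6] = True
is_image = ~is_text

text_targets = rng.integers(0, vocab, size=(batch, seq_len))
image_targets = rng.standard_normal((batch, seq_len, hidden_dim))

def cross_entropy(logits_2d, targets_1d):
    # Numerically stable softmax cross-entropy, averaged over positions.
    z = logits_2d - logits_2d.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets_1d)), targets_1d].mean()

# Text loss: classification (cross-entropy) at text positions only.
text_loss = cross_entropy(logits[is_text], text_targets[is_text])

# Image loss: regression (MSE here) on the same embeddings, at image positions only.
image_loss = np.mean((hidden[is_image] - image_targets[is_image]) ** 2)

# Both losses come from the same single set of embeddings.
total_loss = text_loss + image_loss
```

The key point the sketch illustrates is that no second generation pass is needed: both objectives read from the same `hidden` tensor, just at different positions.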
Thanks for your reply!
I have two more questions:
Many thanks for your reply!
Thanks for your great work first! I found that the code uses `emu_encoder.decoder.lm.generate()` to produce the text response and `emu_encoder.decoder.lm.model()` to produce the latent image embeddings. So how can I output both the text and the image embeddings to reproduce your training process? Or does training first use `emu_encoder.decoder.lm.generate()` to generate the text and then `emu_encoder.decoder.lm.model()` to generate the image embeddings? Thanks for your reply!