JingyuanYY / EmoGen

This is the official implementation of 2024 CVPR paper "EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models".

About inference #5

Closed hayakae closed 5 months ago

hayakae commented 6 months ago

Hello. Thank you for your wonderful work!

I have a question about generating emotional images.

As shown in training/inference.py and the supplementary material, you replace the EOS embedding with an embedding mapped from the emotion space.

I'm not familiar with Stable Diffusion; is this approach a type of textual inversion? Why did you choose it?

Thanks.

https://github.com/JingyuanYY/EmoGen/blob/f560012bf56ff68f5c6edc3dfb9728e9c856ad91/training/inference.py#L127-L131

fengjw0909 commented 6 months ago

Yes, this is how we implement textual inversion. Of course, Hugging Face's pipeline now has a more convenient method (`pipeline.load_textual_inversion`), but when we wrote this code, that method did not exist yet. So we injected the emotional embedding by replacing the input embeddings directly.
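The replacement described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function name, shapes, and the EOS position are hypothetical, and real usage would operate on the CLIP text encoder's token embeddings inside the diffusion pipeline.

```python
import numpy as np

def replace_eos_embedding(prompt_embeds: np.ndarray,
                          emotion_embed: np.ndarray,
                          eos_index: int) -> np.ndarray:
    """Return a copy of prompt_embeds (seq_len, dim) with the row at
    eos_index swapped for the learned emotion embedding (dim,)."""
    out = prompt_embeds.copy()
    out[eos_index] = emotion_embed  # inject the emotion-space embedding
    return out

# Toy example: 5 tokens with 4-dim embeddings, EOS assumed at position 3.
embeds = np.zeros((5, 4))
emotion = np.ones(4)          # stand-in for the MLP-mapped emotion embedding
new_embeds = replace_eos_embedding(embeds, emotion, eos_index=3)
```

In the actual pipeline, the modified embedding sequence would then be passed to the text encoder / UNet in place of the original prompt embeddings, which is what makes this a form of textual inversion without changing the model architecture.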

hayakae commented 6 months ago

I was surprised that, without changing the Stable Diffusion and CLIP architectures, simply training an MLP can capture both emotion and semantics. I suppose that is thanks to the scale of EmoSet. Thank you for your prompt reply! It helped me a lot.