Closed · hayakae closed this issue 5 months ago
hayakae (original question):
Hello, and thank you for your wonderful work!
I have a question about generating emotional images. As shown in training/inference.py and the supplementary material, you replace the EOS embedding with an embedding mapped from the emotion space. I'm not familiar with Stable Diffusion: is this approach a type of textual inversion? Why did you choose it?
Thanks.
https://github.com/JingyuanYY/EmoGen/blob/f560012bf56ff68f5c6edc3dfb9728e9c856ad91/training/inference.py#L127-L131

Reply (repository author):
Yes, this is how we implement textual inversion. Hugging Face's pipeline now offers a more convenient method (pipeline.load_textual_inversion), but that method did not exist when we wrote this code, so we inject the emotion embedding by replacing the input one.

hayakae (follow-up):
I was surprised that, without changing the Stable Diffusion or CLIP architecture, simply training an MLP can reflect emotion and semantics. That is indeed thanks to the scale of EmoSet. Thank you for your prompt reply; it helped me a lot.
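For readers unfamiliar with this trick, here is a minimal NumPy sketch of the idea being discussed: overwriting the EOS slot of the text encoder's output with a vector produced from the emotion space. All names, shapes, and the EOS position below are illustrative assumptions, not EmoGen's actual code or API.

```python
import numpy as np

def replace_eos_embedding(hidden_states, eos_idx, emotion_embedding):
    """Return a copy of the prompt embeddings with the EOS slot overwritten.

    hidden_states: (seq_len, dim) array, standing in for the CLIP text
    encoder's last hidden state for one prompt.
    eos_idx: position of the EOS token in the sequence (illustrative here;
    in practice it is found from the tokenizer's output).
    emotion_embedding: (dim,) vector, standing in for the output of an
    emotion-space MLP mapped into the text-embedding space.
    """
    out = hidden_states.copy()
    out[eos_idx] = emotion_embedding
    return out

rng = np.random.default_rng(0)
dim = 8
hidden = rng.normal(size=(77, dim))   # CLIP text encoders use 77 token slots
eos_idx = 5                           # assumed EOS position for this sketch
emo = rng.normal(size=(dim,))         # assumed emotion-MLP output

new_hidden = replace_eos_embedding(hidden, eos_idx, emo)
```

The resulting array would then be passed to the diffusion UNet as the prompt conditioning in place of the original encoder output. With a current diffusers pipeline, the more convenient route mentioned above would be `pipe.load_textual_inversion(...)` instead of manual replacement.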