Yuxinn-J / Scenimefy

[ICCV 2023] Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation
https://yuxinn-j.github.io/projects/Scenimefy.html

Can Not Find Pretrained CLIP Implementation #14

Open echelon2718 opened 4 days ago

echelon2718 commented 4 days ago

Hey there! Firstly, thank you for publishing this wonderful work. I am fascinated by your research and I'm trying to understand every component of Scenimefy. However, I could not find where the pretrained CLIP code for the PatchNCE loss is implemented (I read your paper, and you mention using a pretrained CLIP model to extract image features). I would greatly appreciate it if you could elaborate on this component based on your paper. Thank you very much!

Yuxinn-J commented 15 hours ago

Hi, thanks for the kind words, I'm glad you liked it. Regarding your question: yes, we use the pretrained CLIP model in the first stage to help preserve content while fine-tuning StyleGAN to generate pseudo paired data. As for the PatchNCE loss, we actually extract features directly from the GAN generator itself, rather than from CLIP. You can find the relevant code here.
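For readers tracing the code, here is a minimal sketch of what a PatchNCE-style loss over generator features can look like (in the spirit of CUT, which this loss follows). The class and function names are hypothetical and simplified; the repository's actual implementation differs in details such as multi-layer sampling and MLP projection heads applied to the sampled features.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchNCELoss(nn.Module):
    """InfoNCE over patch features: for each translated-image patch (query),
    the positive is the source-image patch at the same spatial location;
    the other sampled patches act as negatives."""
    def __init__(self, temperature: float = 0.07):
        super().__init__()
        self.temperature = temperature
        self.ce = nn.CrossEntropyLoss()

    def forward(self, feat_q: torch.Tensor, feat_k: torch.Tensor) -> torch.Tensor:
        # feat_q, feat_k: (num_patches, dim), L2-normalised
        logits = feat_q @ feat_k.t() / self.temperature            # (N, N)
        targets = torch.arange(feat_q.shape[0], device=feat_q.device)
        return self.ce(logits, targets)


def sample_patches(feat_map: torch.Tensor, num_patches: int = 256, ids=None):
    """Flatten a (B, C, H, W) feature map from a generator layer and sample
    patch locations. Passing the same `ids` for the source and translated
    images keeps query/positive pairs spatially aligned."""
    b, c, h, w = feat_map.shape
    flat = feat_map.permute(0, 2, 3, 1).reshape(-1, c)             # (B*H*W, C)
    if ids is None:
        ids = torch.randperm(flat.shape[0], device=feat_map.device)[:num_patches]
    return F.normalize(flat[ids], dim=1), ids

# Usage sketch: run the generator's encoder on the source photo x and on the
# translated output G(x), sample features at the same locations, and sum the
# loss over the chosen layers.
# feat_k, ids = sample_patches(encoder_feats_of_x)
# feat_q, _   = sample_patches(encoder_feats_of_Gx, ids=ids)
# loss = PatchNCELoss()(feat_q, feat_k)
```

As in CUT, the point is that the features come from the translation network's own encoder layers, not from an external pretrained model such as CLIP.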

Given the impressive generative capabilities of diffusion models nowadays, an alternative approach could be to use a diffusion model to generate pseudo paired data in the first stage instead of fine-tuning StyleGAN. This might yield better results for supervising GAN training.
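To illustrate that suggestion, a rough sketch of producing pseudo pairs with an off-the-shelf image-to-image diffusion pipeline could look like the following. This is only an illustration, not part of Scenimefy; the model id, prompt, and strength value are placeholders one would need to tune.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Any Stable Diffusion checkpoint works here; an anime-style fine-tune would be
# a natural choice. The model id below is just a placeholder.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def make_pseudo_pair(photo_path: str, strength: float = 0.45) -> Image.Image:
    """Translate a real photo into an anime-styled counterpart.

    A low `strength` keeps the layout of the source photo, so the output can
    serve as the anime half of a pseudo paired sample for supervising the
    second-stage translation network.
    """
    photo = Image.open(photo_path).convert("RGB").resize((512, 512))
    out = pipe(
        prompt="anime scenery, detailed painted background",
        image=photo,
        strength=strength,       # how far the result may drift from the photo
        guidance_scale=7.5,
    )
    return out.images[0]
```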