danielroich / PTI

Official Implementation for "Pivotal Tuning for Latent-based editing of Real Images" (ACM TOG 2022) https://arxiv.org/abs/2106.05744

Pivotal tuning for grey-scale (sketch) images #19

Closed · hyungkwonko closed this 3 years ago

hyungkwonko commented 3 years ago

Hi Daniel. Thank you for the great work. I have trained a StyleGAN2-ada model on sketch data, and it generates sketches quite well. After that, to manipulate real images, I tested PTI, but the quality was not good. When I remove the LPIPS loss (using only the L2 loss) during pivotal tuning, the reconstruction goes well. However, the manipulation still does not work very well. Could you please give me any tips on this? Should I train LPIPS on a different dataset, or is there any loss function you can recommend?
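For context, here is roughly how I combine the two terms during pivotal tuning. This is a minimal sketch; the weight names and the `lpips` package call are illustrative, not PTI's exact configuration:

```python
import torch
import torch.nn.functional as F
import lpips

# Minimal sketch of the pivotal-tuning reconstruction loss.
# Setting lpips_lambda = 0 gives the "L2 only" variant described above;
# the weight names are illustrative, not PTI's exact config keys.
l2_lambda = 1.0
lpips_lambda = 0.0  # disabled for my grayscale sketches

lpips_fn = lpips.LPIPS(net='alex')  # expects 3-channel inputs in [-1, 1]

def pt_loss(generated, target):
    loss = l2_lambda * F.mse_loss(generated, target)
    if lpips_lambda > 0:
        # grayscale (1-channel) images must be tiled to 3 channels for LPIPS
        g3 = generated.repeat(1, 3, 1, 1)
        t3 = target.repeat(1, 3, 1, 1)
        loss = loss + lpips_lambda * lpips_fn(g3, t3).mean()
    return loss
```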

danielroich commented 3 years ago

Hey @hyungkwonko, thanks for the feedback!

I think the problem with the overall editing you are trying to achieve lies in the editing phase. When you use GANs to edit images, there are three main steps that must be done:

The first one, which you have already done, is to train the GAN such that its latent space is semantically meaningful and has the disentanglement property. Most pre-trained StyleGANs have these properties.

Secondly, you must invert the image (say, a new sketch from the internet) to a latent vector that resides in the latent space of the StyleGAN you have trained. You want the inversion to be as close to the original image as possible, and you want the latent vector to reside in an editable region for future edits. This is where PTI shines: it enables you to get very good latents for the image you want to edit. Nevertheless, it does not give you editing capabilities on its own; that is what the third step is for.
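To make this second step concrete, the inversion is essentially a direct optimization over the latent. Here is a rough sketch; the `synthesis` callable and `w_avg` starting point are placeholders, not the repo's actual interface:

```python
import torch
import torch.nn.functional as F

# Rough sketch of the latent-optimization phase that precedes pivotal tuning.
# `synthesis` stands in for a frozen StyleGAN synthesis network and `w_avg`
# for its average latent; both are placeholders, not PTI's actual API.
def invert(synthesis, w_avg, target, num_steps=500, lr=0.01):
    w = w_avg.clone().requires_grad_(True)  # start from the average latent
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(num_steps):
        opt.zero_grad()
        img = synthesis(w)              # image from the current latent
        loss = F.mse_loss(img, target)  # plus LPIPS in the full objective
        loss.backward()
        opt.step()
    return w.detach()                   # this w becomes the pivot for tuning
```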

In the third step, you change the latent vector in a way that achieves semantically meaningful edits. Many techniques have been developed to do so, and their common ground is that each operates on a specific domain. Notice that all the examples I have uploaded are in the facial domain, and all the editing directions I have uploaded operate specifically on this domain. They will fail to edit other domains, such as cars, cats, and your new latent space.

I have successfully used PTI on other domains, such as animals. But in order to edit a latent code, you must first find the editing directions inside your new latent space; the ones I have uploaded from StyleCLIP, GANSpace, and InterFaceGAN won't work on your domain. For more information about how to create/find editing directions, please see #15.
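Once you do find a direction for your domain, applying it is just a move along the latent. A minimal sketch, where all names are illustrative:

```python
import torch

# Sketch of applying a semantic editing direction in W space. The
# `direction` must be discovered for your own domain (e.g. via SeFa,
# GANSpace, or InterFaceGAN); all names here are illustrative.
def edit_latent(w, direction, alpha):
    # move the pivot latent along the unit-norm direction by magnitude alpha
    return w + alpha * direction / direction.norm()

# e.g. sweep the edit strength to inspect its effect:
# for alpha in (-3.0, -1.0, 0.0, 1.0, 3.0):
#     img = synthesis(edit_latent(w_pivot, direction, alpha))
```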

Hope this clarifies your problem.

Daniel

hyungkwonko commented 3 years ago

Hi Daniel,

Thank you for the very kind and detailed answer. In my case, my training set is an anime face sketch dataset (similar to this, but grayscale). I used SeFa to find the meaningful vectors, but it does not work effectively even on the training data (not to mention OOD/real images). When I reconstruct an image using StyleGAN2 inversion, the generation quality is bad (the ID changes), but the manipulation quality is good. When I reconstruct the image using PTI, because the generator is fine-tuned, the ID is preserved, but the manipulation results (pose editing) are bad. I found that turning off the LPIPS loss during pivotal tuning helped it perform a little better, but the result is still not satisfactory, which is why I have asked for your help.
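For reference, I extract the SeFa directions with the standard closed-form factorization, roughly like this. This is only a sketch: the `modulation` key pattern follows rosinality's stylegan2-pytorch, and the path and key names may differ in other codebases:

```python
import torch

# Rough sketch of how I extract SeFa directions from my checkpoint. The
# "modulation" key pattern follows rosinality's stylegan2-pytorch; the
# checkpoint path and key names here are illustrative.
state_dict = torch.load("sketch_stylegan2.pt")["g_ema"]
weights = [v for k, v in state_dict.items() if "modulation.weight" in k]
W = torch.cat(weights, dim=0)        # stack the per-layer style projections
directions = torch.linalg.svd(W).Vh  # rows are candidate edit directions

# apply the strongest direction to the inverted pivot latent, e.g.:
# w_edit = w_pivot + 3.0 * directions[0]
```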

I was wondering: if I only use the L2 loss during StyleGAN2 inversion (before pivotal tuning), would it find a more meaningful latent? (I haven't tried this yet.) Maybe that could result in a better latent vector (with greater editability) once pivotal tuning is applied. Again, I really appreciate your sincere reply. Your work is really novel, and I believe it will be great groundwork for much future research. Have a good week ahead.

Hyung-Kwon

rmokady commented 3 years ago

Hi Hyung-Kwon,

Can you share your example?

I would guess that the StyleGAN2 inversion might not be working well enough, resulting in a bad pivot, which may cause PTI to fail. Is that the case?

hyungkwonko commented 3 years ago

Hi rmokady,

This is the StyleGAN2 inverted image + SeFa manipulation result:

[image attachment]

This is the PTI inverted image + SeFa manipulation result:

[image attachment]

When I run the PTI optimization for longer, or with a higher learning rate than the default setting, the inverted image looks fine. But the manipulation result is still bad.
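For the record, I change these through the repo's hyperparameter config, roughly as below; the names are from memory and worth verifying against the actual configs/hyperparameters.py:

```python
# configs/hyperparameters.py -- the knobs I touch (names from memory; verify
# against the actual file in the PTI repo before relying on them)
max_pti_steps = 350        # raise for a longer pivotal-tuning run
pti_learning_rate = 3e-4   # raise for a more aggressive fit
pt_lpips_lambda = 1        # set to 0 to drop the LPIPS term, as above
pt_l2_lambda = 1           # weight of the L2 reconstruction term
```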

Thank you very much rmokady for your help. Hope you have a great day.

Hyung-Kwon

rmokady commented 3 years ago

Thanks for sharing.

Indeed, your data is sparse, which might make it more challenging.

Looking at the images, I think the problem is that the StyleGAN2 inversion failed, which later causes PTI to fail. As far as I can see, the StyleGAN2 inversion struggles to keep the same rotation angles.

I would try to find an inversion that succeeds in maintaining the same rotation angles and face dimensions. Maybe a W+ inversion? Or an encoder like e4e or ReStyle?
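For the W+ option, the idea is to optimize a separate code per synthesis layer instead of a single shared w. A rough sketch with a placeholder interface, not any specific repo's API:

```python
import torch
import torch.nn.functional as F

# Rough sketch of a W+ inversion: one latent per synthesis layer instead of
# a single shared w, which gives extra freedom to match pose. `synthesis`,
# `w_avg`, and `num_layers` are placeholders, not a specific repo's API.
def invert_w_plus(synthesis, w_avg, target, num_layers, steps=500, lr=0.01):
    w_plus = w_avg.expand(1, num_layers, -1).clone().requires_grad_(True)
    opt = torch.optim.Adam([w_plus], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        img = synthesis(w_plus)         # each layer consumes its own code
        loss = F.mse_loss(img, target)  # plus LPIPS in the full objective
        loss.backward()
        opt.step()
    return w_plus.detach()
```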

hyungkwonko commented 3 years ago

Thank you very much for the prompt answer, rmokady. I appreciate your idea. Maybe I should test different inversion methods with different losses.

BTW, one quick question I want to ask: since I am kind of a newbie in generative models, is there any reason why sparse data is hard to manipulate in latent space? At first glance, I thought it should be easier than human faces, but it turns out that what you said is correct. Again, thank you for your help. I really appreciate it.

Hyung-Kwon

rmokady commented 3 years ago

I think a sparse image may behave differently than a regular image. For example, LPIPS may exhibit different behavior. Also, it is more likely that the inversion collapses into an average or something like that.

BTW, StyleGAN performs better on aligned data; did you manage to align those sketches?

hyungkwonko commented 3 years ago

Thank you for your answer, rmokady. Definitely, adapting LPIPS is required for the sketch data, and yes, I aligned the images for training. It seems I should do more research on this issue with diverse experiments myself. I will post if I find something new.

Thanks again @rmokady and @danielroich for the help.

All the best,
Hyung-Kwon