eladrich / pixel2style2pixel

Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework
https://eladrich.github.io/pixel2style2pixel/
MIT License

Train pSp: Anime to Face #132

Closed gudufengzhongyipilang closed 3 years ago

gudufengzhongyipilang commented 3 years ago

Thanks for the excellent work. I now want to use pSp to train an Anime-to-Face model, similar to the Sketch-to-Face task. I already have a dataset containing 140,000 anime faces, and I set the training parameters as follows:

```shell
python scripts/train.py \
  --dataset_type=celebs_sketch_to_face \
  --exp_dir=/path/to/experiment \
  --workers=8 \
  --batch_size=8 \
  --test_batch_size=8 \
  --test_workers=8 \
  --val_interval=2500 \
  --save_interval=5000 \
  --encoder_type=GradualStyleEncoder \
  --start_from_latent_avg \
  --lpips_lambda=0.8 \
  --l2_lambda=1 \
  --id_lambda=0 \
  --w_norm_lambda=0.005 \
  --label_nc=1 \
  --input_nc=1
```

Following the Additional Notes in the README, I added the file paths corresponding to the `--dataset_type=celebs_sketch_to_face` option. Can I achieve the desired effect with these settings, just like Sketch-to-Face?

yuval-alaluf commented 3 years ago

Hi @gudufengzhongyipilang, not quite. First, do you have 140,000 *paired* anime-real face images? The sketch-to-face task assumes you have paired data.
Second, you need to adjust the transforms to match your data. The transforms used in the sketch-to-face task assume a single-channel input, whereas I assume your input is RGB, so you will likely need to adjust them. The transforms you are looking for are something like this: https://github.com/eladrich/pixel2style2pixel/blob/72f0c04a890adb9c8a406f879980572ac600cf29/configs/transforms_config.py#L22-L37
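To make the needed change concrete, here is a rough sketch of what an RGB-input transforms class could look like, modeled on the classes in `configs/transforms_config.py`. The class name and exact resize/normalization values are illustrative assumptions, not part of the official repo:

```python
# Hypothetical transforms class for an RGB-to-RGB task, modeled on the
# style of configs/transforms_config.py. The name and values here are
# illustrative placeholders, not the repo's official configuration.
import torchvision.transforms as transforms


class AnimeToFaceTransforms:
    def __init__(self, opts):
        self.opts = opts

    def get_transforms(self):
        # Unlike the sketch task, the source here is a 3-channel RGB image,
        # so it is normalized with 3-channel statistics like the target.
        rgb = transforms.Compose([
            transforms.Resize((256, 256)),
            transforms.ToTensor(),
            transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])])
        return {
            'transform_gt_train': transforms.Compose([
                transforms.Resize((256, 256)),
                transforms.RandomHorizontalFlip(0.5),
                transforms.ToTensor(),
                transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])]),
            'transform_source': rgb,
            'transform_test': rgb,
            'transform_inference': rgb,
        }
```

Relatedly, the `--label_nc=1` / `--input_nc=1` flags in the command above assume a one-channel input; an RGB source would likely need `--input_nc=3` (and probably `--label_nc=0`).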

Going back to my question on paired data: if you do not have paired data, the parameters you have will most likely not work. Training with paired data and training with unpaired data require different loss lambdas, so you will need to play around with those a bit more if your data is unpaired.
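As a purely illustrative example of what tuning the lambdas means in practice: each loss weight is just a command-line flag on `scripts/train.py`, so experimenting typically means re-launching training with different values and comparing validation results. The values swept below are placeholders, not recommendations from the authors:

```shell
# Purely illustrative values -- not recommendations from the authors.
# Each loss weight is a flag, so "playing with the lambdas" means
# re-running training with different weights and comparing results.
for id_lambda in 0 0.1 0.5; do
  python scripts/train.py \
    --dataset_type=celebs_sketch_to_face \
    --exp_dir=/path/to/experiment_id_${id_lambda} \
    --encoder_type=GradualStyleEncoder \
    --start_from_latent_avg \
    --lpips_lambda=0.8 \
    --l2_lambda=1 \
    --id_lambda=${id_lambda} \
    --w_norm_lambda=0.005
done
```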

gudufengzhongyipilang commented 3 years ago

Thanks for the reply @yuval-alaluf. My data is not paired. I have read your answer carefully, and I also referred to the Additional Applications section of the README. From that, I understand that the Toonify process is trained on unpaired data, but it requires a pre-trained pSp encoder and a pre-trained StyleGAN generator. That means I would have to train an Anime-to-Face encoder with paired data before I can train with unpaired data. Am I right? (╥﹏╥)

yuval-alaluf commented 3 years ago

For the toonify task, you do not need to train a separate pSp encoder. You do however need a StyleGAN generator in your domain. The good news is that your target domain, realistic faces, already has a pre-trained StyleGAN generator!
Therefore, you don't need to train any additional models before training your anime-to-face task.
I will say in advance that although we found pSp to work decently well on the toonify task, training in an unpaired setting is particularly challenging, so your anime-to-face task may not work out of the box. Several other people have opened issues with similar questions; one issue that I think may be of use to you is the following: https://github.com/eladrich/pixel2style2pixel/issues/126 In the linked issue, you can see various steps you can take to try to improve pSp out of the box in an unpaired setting.
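To make the "no additional models" point concrete, the generator side of such a run can simply reuse the pre-trained FFHQ StyleGAN2 weights the repo already uses, passed via `--stylegan_weights`. The `--dataset_type` value and data paths below are hypothetical placeholders for a custom anime-to-face setup:

```shell
# Sketch of an anime-to-face run reusing the existing FFHQ generator.
# --dataset_type and the experiment path are hypothetical placeholders;
# --stylegan_weights points at the standard pre-trained StyleGAN2
# generator for real faces, so no new generator needs to be trained.
python scripts/train.py \
  --dataset_type=anime_to_face \
  --exp_dir=/path/to/experiment \
  --stylegan_weights=pretrained_models/stylegan2-ffhq-config-f.pt \
  --input_nc=3 \
  --label_nc=0 \
  --start_from_latent_avg
```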

Andre1998Shuvam commented 3 years ago

Hello @gudufengzhongyipilang! Have you been able to make progress on this? If you have any leads, please share them; it would be very helpful!

gudufengzhongyipilang commented 3 years ago

Thanks for your advice @yuval-alaluf. Given the difficulty and time cost of collecting paired training data, I instead used the pre-trained models to try the Anime-to-Face task after reading the suggestions carefully. Here are some interesting results.

yuval-alaluf commented 3 years ago

It is not surprising that none of the pSp models worked for you. Unlike the toonification task (which does work in an unpaired setting), your input data is very different from anything any of the pSp models have seen before and therefore you get unexpected results. Solving this task with no paired data is not trivial and will require a bit of brainstorming beyond our paper's scope.