eladrich / pixel2style2pixel

Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework
https://eladrich.github.io/pixel2style2pixel/
MIT License

Low resolution & Checkerboard effect in model's output #73

Closed rafaelbou closed 3 years ago

rafaelbou commented 3 years ago

Hi @yuval-alaluf, thanks for sharing (and maintaining) your great work!

After training the model on my own dataset, I found that its outputs have two unwanted characteristics:

  1. Low resolution - the outputs come out relatively blurry.
  2. Checkerboard effect - the images contain an artificial grid (each cell is 16x16 pixels). This phenomenon started only at a certain stage during training (after about 10K steps).

For clarification, after training the StyleGAN2 model (on which this training is based), its results come out sharp and without the checkerboard effect described above.

Have you experienced either of these during training? Do you have any idea what factors could affect the output in this way? Maybe tuning the loss weights?

More information

Training opts:

```json
{
  "batch_size": 4,
  "board_interval": 50,
  "checkpoint_path": null,
  "dataset_type": "sd14_encode",
  "encoder_type": "GradualStyleEncoder",
  "exp_dir": "./models/psp_sd14_256",
  "generator_image_size": 256,
  "id_lambda": 0.0,
  "image_interval": 100,
  "input_nc": 3,
  "l2_lambda": 1.0,
  "l2_lambda_crop": 0,
  "label_nc": 0,
  "learn_in_w": false,
  "learning_rate": 0.0001,
  "lpips_lambda": 0.8,
  "lpips_lambda_crop": 0,
  "max_steps": 500000,
  "optim_name": "ranger",
  "resize_factors": null,
  "save_interval": 5000,
  "start_from_latent_avg": false,
  "style_count": 14,
  "stylegan_weights": "./stylegan2-pytorch/checkpoint/310000.pt",
  "test_batch_size": 4,
  "test_workers": 8,
  "train_decoder": false,
  "val_interval": 2500,
  "w_norm_lambda": 0,
  "workers": 8
}
```

Note: `style_count` is a parameter I added myself; I use it to set `self.style_count = opts.style_count`.

Training metrics: [image]

Test metrics: [image]

yuval-alaluf commented 3 years ago

Hi @rafaelbou , I'm happy to hear you've enjoyed our work! What you came across is interesting, as we didn't encounter anything like this during our training.
I see that you've used your own StyleGAN with a resolution of 256x256, but since you said your StyleGAN's outputs are sharp, let's assume it's not because of the lower-resolution StyleGAN. Other than that, I can see a few potential causes for the effect you're seeing:

  1. You set start_from_latent_avg to False. We found that starting from the average latent code results in better initialization and better convergence, so try setting this flag to True and see if it helps (a short sketch of what the flag does is shown after this list).
  2. I see that you set id_lambda to 0. If you're working on face images, we found that this loss is very significant for getting high-quality inversion results.
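For reference, here is a rough sketch of what start_from_latent_avg does in the forward pass (simplified from models/psp.py; illustrative rather than the exact code):

```python
# Simplified view of pSp's forward pass when start_from_latent_avg is enabled:
# the encoder output is treated as an offset from the average W+ latent,
# which gives a much better starting point than using the raw codes directly.
codes = self.encoder(x)                                   # (B, n_styles, 512)
if self.opts.start_from_latent_avg:
    # latent_avg is read from the checkpoint in __load_latent_avg
    codes = codes + self.latent_avg.repeat(codes.shape[0], 1, 1)
images, _ = self.decoder([codes], input_is_latent=True, return_latents=True)
```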

If you wish, feel free to send an example of the artifacts you're seeing and maybe this can help to better understand the problem.

rafaelbou commented 3 years ago

Thanks for your answer.

As for the resolution of StyleGAN2, you are right, my wording was not clear enough. The images produced by pSp come out blurry only relative to the output of StyleGAN2 itself. This shows up as a lack of high-frequency detail.

Setting start_from_latent_avg to True

When I set start_from_latent_avg to True, I get the following error:

```
File "/pixel2style2pixel/models/psp.py", line 77, in forward
    codes = codes + self.latent_avg.repeat(codes.shape[0], 1, 1)
AttributeError: 'NoneType' object has no attribute 'repeat'
```

The source of the error is in the function __load_latent_avg:

```python
def __load_latent_avg(self, ckpt, repeat=None):
    if 'latent_avg' in ckpt:
        self.latent_avg = ckpt['latent_avg'].to(self.opts.device)
        if repeat is not None:
            self.latent_avg = self.latent_avg.repeat(repeat, 1)
    else:
        self.latent_avg = None
```

The ckpt I use for the StyleGAN2 generator does not include a latent_avg entry, so the function sets self.latent_avg = None.
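A workaround I'm considering is to compute the average latent myself and add it to the generator checkpoint so that __load_latent_avg can find it. A rough sketch, assuming the rosinality stylegan2-pytorch API (Generator.mean_latent) and my checkpoint layout:

```python
# Sketch: add a 'latent_avg' entry to the StyleGAN2 checkpoint so that
# pSp's __load_latent_avg no longer falls back to None.
import torch
from model import Generator  # rosinality stylegan2-pytorch

ckpt_path = './stylegan2-pytorch/checkpoint/310000.pt'
ckpt = torch.load(ckpt_path, map_location='cpu')

g = Generator(256, 512, 8)          # size, style_dim, n_mlp used in training
g.load_state_dict(ckpt['g_ema'])

with torch.no_grad():
    # Average the W latents of many random z samples.
    latent_avg = g.mean_latent(10000)[0]   # shape (512,)

ckpt['latent_avg'] = latent_avg
torch.save(ckpt, './stylegan2-pytorch/checkpoint/310000_with_avg.pt')  # example name
```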

id loss

I'm working in the fingerprint domain, so this loss function does not apply to my case. I will try to follow your motivation in the paper and build an analogous loss for my domain.
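Something along these lines is what I have in mind (a rough sketch only; the embedder stands for a hypothetical pretrained fingerprint feature extractor, not something from this repo):

```python
# Sketch of a domain-specific "identity" loss in the spirit of pSp's
# ArcFace-based ID loss, but driven by a fingerprint embedding network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FingerprintIDLoss(nn.Module):
    def __init__(self, embedder):
        super().__init__()
        # embedder: a frozen, pretrained fingerprint feature extractor
        self.embedder = embedder.eval()
        for p in self.embedder.parameters():
            p.requires_grad = False

    def forward(self, y_hat, y):
        # Compare normalized embeddings of the reconstruction and the input.
        feat_hat = F.normalize(self.embedder(y_hat), dim=1)
        feat = F.normalize(self.embedder(y), dim=1)
        # 1 - cosine similarity, averaged over the batch.
        return (1 - (feat_hat * feat).sum(dim=1)).mean()
```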

Output examples

pSp output images (left: original input, right: pSp output after 115K steps): [image]

StyleGAN2 output: [image]

Thanks again, Rafael.

yuval-alaluf commented 3 years ago

Interesting! Thanks for the clarifications. Indeed, the error with the average latent code occurs because it's not saved in your StyleGAN generator checkpoint. However, I don't think the average latent code is very meaningful in your domain. It seems that pSp is able to capture most of the details of the input image, but struggles when it comes to preserving the fine details.

> I will try to follow your motivation in the paper and build an analogous loss for my domain.

I believe that incorporating a loss function that explicitly handles the preservation of fine details will provide the most improvement in your case.
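For example, one simple option could be an L1 penalty on image gradients, which emphasizes high-frequency content such as ridge edges. This is just a sketch of the idea, not a loss we used in the paper:

```python
# Sketch: L1 loss on horizontal/vertical image gradients, encouraging the
# reconstruction y_hat to match the high-frequency structure of the input y.
import torch
import torch.nn.functional as F

def gradient_loss(y_hat, y):
    def grads(img):
        dx = img[:, :, :, 1:] - img[:, :, :, :-1]   # horizontal differences
        dy = img[:, :, 1:, :] - img[:, :, :-1, :]   # vertical differences
        return dx, dy

    dx_hat, dy_hat = grads(y_hat)
    dx, dy = grads(y)
    return F.l1_loss(dx_hat, dx) + F.l1_loss(dy_hat, dy)
```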

rafaelbou commented 3 years ago

Do you have any idea why the checkerboard effect appears, especially given that the StyleGAN2 output shows no such artifact?

yuval-alaluf commented 3 years ago

Good question. I'm not sure what exactly could be causing this, but if I had to guess, it is because the coarse feature maps output by our network are of size 16x16. These feature maps are then passed to the map2style block, which down-samples each one into a vector of size 512.
I am not sure exactly why this would cause the checkerboard effect, since we did not see anything like it in any of our experiments, but I hope that a loss that addresses the fine details can also address this.
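For reference, here is a rough, self-contained sketch of the map2style idea (the actual implementation is the GradualStyleBlock in models/encoders/psp_encoders.py, which uses slightly different building blocks):

```python
# Sketch: a 16x16 coarse feature map is repeatedly halved with stride-2
# convolutions until it collapses to a single 512-dim style vector.
import math
import torch
import torch.nn as nn

class Map2StyleSketch(nn.Module):
    def __init__(self, in_c=512, out_c=512, spatial=16):
        super().__init__()
        layers, c = [], in_c
        for _ in range(int(math.log2(spatial))):          # 16 -> 8 -> 4 -> 2 -> 1
            layers += [nn.Conv2d(c, out_c, 3, stride=2, padding=1), nn.LeakyReLU()]
            c = out_c
        self.convs = nn.Sequential(*layers)
        self.linear = nn.Linear(out_c, out_c)

    def forward(self, x):                                  # x: (B, 512, 16, 16)
        x = self.convs(x).flatten(1)                       # (B, 512)
        return self.linear(x)                              # one style vector
```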