Open vladradishevsky opened 3 years ago

Hello! Have you tested the model with high-resolution images (512, 1024, ...)? Are the generated images also high quality? What parameters do you recommend changing to train on 512x512 and 1024x1024 images?
Hi! DCLGAN is quite memory-hungry, since it needs to hold two generators and two discriminators on a single GPU. With a 16GB GPU you can train at 512^2 resolution, but probably not at 1024^2. Test time is much more flexible: you can train your model at 256 res and test it at 512^2, 512x1024, or 1024^2.

You may try this setting to train 1024-res translation (you will probably need to train longer than usual):
--load_size 800 --crop_size 512
And this for 512 res:
--load_size 572 --crop_size 512
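For concreteness, here is a minimal sketch of a full 512-res training command, assuming the CUT/CycleGAN-style train.py this repo is built on; the dataroot, experiment name, and --model value are placeholders to adapt:

```bash
# Hypothetical 512-res run; --load_size scales the image,
# --crop_size sets the square patch actually used for training.
python train.py \
  --dataroot ./datasets/my_dataset \
  --name my_experiment_512 \
  --model dcl \
  --load_size 572 --crop_size 512
```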
You may also check the papers below; they are designed for high-res translation:
- High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network
- Pix2pixHD
Hi, why are bigger patches needed for 512x512-res images? And what is the best iteration-per-image ratio, i.e. how many epochs are needed for a large dataset such as 60k images? Thanks.
Hi, usually, training at higher resolution gives better performance at test time. I have used 1680x800-resolution images before, and training with 512^2 patches gave a much better result than 256^2. Around 1 million iterations is enough for any kind of one-to-one image translation model; for 60k images that works out to roughly 20 epochs (at the default batch size of 1, 60,000 x 20 = 1.2M iterations). The setting --n_epochs 10 --n_epochs_decay 10 should work well.
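Putting that schedule together, a hedged sketch with placeholder paths and names; in CycleGAN-style codebases --n_epochs trains at the full learning rate and --n_epochs_decay then decays it linearly to zero:

```bash
# Hypothetical 20-epoch schedule: 10 epochs at constant LR + 10 epochs of linear LR decay.
python train.py \
  --dataroot ./datasets/my_60k_dataset \
  --name my_experiment_60k \
  --model dcl \
  --load_size 572 --crop_size 512 \
  --n_epochs 10 --n_epochs_decay 10
```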
Hi, thank you for your excellent work!
If the width and height of my images differ, how should I train? Do I need to change the network structure, or just modify the parameters for reading the images?
Hi! You may try one of these two choices:
1. Just crop the images into square patches for training. You can then test at the original resolution, since the network is fully convolutional (but width and height must be divisible by 4).
2. Set --preprocess none during training/testing (see the sketch below). But you need to experiment a little to see whether this setting fits your GPU memory.
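A hedged sketch of testing at the original resolution, assuming the standard test.py and flag set from the CUT/CycleGAN-style codebase (paths and names are placeholders):

```bash
# Hypothetical: test at the original (possibly non-square) resolution.
# --preprocess none skips resizing/cropping; height and width must be divisible by 4.
python test.py \
  --dataroot ./datasets/my_dataset \
  --name my_experiment \
  --model dcl \
  --preprocess none
```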
Hi, thanks for your reply regarding crop size and load size. How does the network react to changes in generator capacity? When facing an application with a large domain gap, which do you think is more effective:
* Using more blocks in the generator
* Increasing the number of filters in the generator

I tried a weaker discriminator and it worked out well, but the changes weren't dramatic. Do you have any suggestions on changing the training procedure? Thanks.
Hi,
1. Using more blocks in the generator: it may not be very helpful. Sometimes 6 blocks even give better results than 9.
2. Increasing the number of filters in the generator: increasing the channels may also not be very helpful.
3. A weaker discriminator: yes, this should be somewhat helpful, but it's usual that the changes are not very obvious. (The flags for all three knobs are sketched just below.)
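For reference, these three knobs map to command-line options in CycleGAN/CUT-style codebases; a hedged sketch, assuming the same option set carries over here:

```bash
# Hypothetical capacity experiments (one change at a time, not combined):
# fewer generator blocks
python train.py --dataroot ./datasets/my_dataset --name g6_run --model dcl --netG resnet_6blocks
# more generator filters (default ngf is 64)
python train.py --dataroot ./datasets/my_dataset --name ngf128 --model dcl --ngf 128
# weaker discriminator via fewer filters (default ndf is 64)
python train.py --dataroot ./datasets/my_dataset --name weak_d --model dcl --ndf 32
```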
I would think about changing the architecture of the generator, or moving to a paired setting. If the domain gap is very large, a supervised (paired) setting might be more effective, provided paired data is available. Also, the architecture in this paper, as in many recent papers, is ResNet-9, which was proposed in 2016. More recent papers are trying StyleGAN-based and transformer-based generators; these might provide better capacity. Cheers
Regarding the generator arch: you mean that using more blocks (such as 12-18) or even more filters (ngf = 80 or 128) in the ResNet doesn't reduce the chance of overfitting, or of saturating on the color domain (instead of changing texture and producing a more visible transformation)? I take your recommendation about an adaptive discriminator or a StyleGAN-based generator; however, those changes damage the semantic consistency that CycleGAN-based models preserve.
Hi, yes, simply adding more blocks or filters may not be very effective; the improvement can be quite limited. Also, adding more filters can slow down training a lot, whereas adding more blocks may not impact training speed much, so that is the cheaper thing to try. If you are doing something like semantic segmentation, where you need low-level features for the deconvolution stage, you may also consider adding skip connections, or employing multiple heads/branches followed by a fusion step, to capture both low-level and high-level features for upsampling.
For the arch, OK, if StyleGAN-based models are not a good fit, how about the architecture of SPADE [1], or even implicit neural representations [2]?
[1] https://github.com/NVlabs/SPADE
[2] Anycost GANs for Interactive Image Synthesis and Editing