XPixelGroup / BasicSR

Open-source image and video restoration toolbox for super-resolution, denoising, deblurring, etc. Currently, it includes EDSR, RCAN, SRResNet, SRGAN, ESRGAN, EDVR, BasicVSR, SwinIR, ECBSR, etc. It also supports StyleGAN2 and DFDNet.
https://basicsr.readthedocs.io/en/latest/
Apache License 2.0

Why is the input data randomly scaled in SFTGAN? Is it just for data augmentation? #41

Closed ackbar03 closed 5 years ago

ackbar03 commented 5 years ago

Hi,

Just for clarification, why is the data randomly scaled in addition to the x4 upscale in SFTGAN? Is it just for purposes of data augmentation?

Also, is there a simple way of pre-generating the low-res images using MATLAB? The way the logic is implemented (since it also randomly samples images from the DIV2K dataset), it seems impossible to simply put the LR images in a directory and fill in the "dataroot_LR" field in train_sftgan.json. Is that correct? I can follow the ESRGAN implementation, but the SFTGAN implementation is still a bit confusing to me.

Thanks

xinntao commented 5 years ago

Hi @ackbar03,

  1. Random scaling is for data augmentation.
  2. SFTGAN also uses segmentation maps. Thus, when we randomly crop patches, we would also need to crop the corresponding segmentation maps if the low-resolution images were pre-generated. Instead, I chose to generate the low-resolution images and segmentation maps on the fly.
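
As a rough illustration (not the repo's actual dataset code; the function and parameter values below are hypothetical), the on-the-fly pipeline is conceptually:

```python
import random
import cv2

def make_training_triplet(hr_img, seg_map, crop_size=96, scale=4):
    """Randomly rescale, crop the HR image and its segmentation map
    together, then generate the LR patch by bicubic downsampling."""
    # random scaling for data augmentation (point 1 above)
    s = random.uniform(0.5, 1.0)
    h, w = hr_img.shape[:2]
    new_h, new_w = max(int(h * s), crop_size), max(int(w * s), crop_size)
    hr_img = cv2.resize(hr_img, (new_w, new_h), interpolation=cv2.INTER_CUBIC)
    # nearest-neighbor keeps segmentation labels from being blended
    seg_map = cv2.resize(seg_map, (new_w, new_h), interpolation=cv2.INTER_NEAREST)

    # synchronized random crop: the same window for image and segmentation
    top = random.randint(0, new_h - crop_size)
    left = random.randint(0, new_w - crop_size)
    hr_patch = hr_img[top:top + crop_size, left:left + crop_size]
    seg_patch = seg_map[top:top + crop_size, left:left + crop_size]

    # bicubic x4 downsampling produces the LR input on the fly
    lr_patch = cv2.resize(hr_patch, (crop_size // scale, crop_size // scale),
                          interpolation=cv2.INTER_CUBIC)
    return lr_patch, hr_patch, seg_patch
```
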
ackbar03 commented 5 years ago

Hi @xinntao , thanks for the quick reply

Why is it that during training we only use image crops containing a single category? Wouldn't it make more sense to pass normal DIV2K images through the segmentation model and use the resulting segmentation probability maps to train SFTGAN? I'm confused about why training is done only on single-category images (plants, buildings, etc.); intuitively, the result would be undesirable for images with multiple categories (e.g., plant + building).

xinntao commented 5 years ago

Hi @ackbar03, I agree with you; what you describe is a better way. There are two considerations behind using image crops with a single category:

  1. For the general DIV2K dataset, the segmentation results are not so satisfactory, due to the complex scenes and objects. So we use a restricted training dataset, mainly focusing on some common and simple outdoor scenes.
  2. The labels in the DIV2K dataset are unbalanced. For example, the patches extracted from DIV2K during training are mostly sky, while categories like plant and mountain are rarely seen by the model. As a result, the model does not perform well on those categories.

We tried some experiments before and found the results were not so good (we have not fully explored this). Also, for simplicity, we train the model using images cropped from a single category. During testing, we found this strategy also works for images with multiple categories, and for their boundaries, because the segmentation map is spatial and the feature modulation is conditioned locally.
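
To make "conditioned locally" concrete, here is a minimal PyTorch sketch of a spatial feature transform (SFT) layer in the spirit of the paper; the channel sizes and exact layer layout are illustrative assumptions, not the repo's definitions:

```python
import torch
import torch.nn as nn

class SFTLayer(nn.Module):
    """The segmentation condition is mapped to per-pixel scale and shift
    maps, so every spatial location is modulated by its own local
    condition rather than by a single global label."""
    def __init__(self, feat_ch=64, cond_ch=32):
        super().__init__()
        self.scale_conv = nn.Sequential(
            nn.Conv2d(cond_ch, cond_ch, 1), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(cond_ch, feat_ch, 1))
        self.shift_conv = nn.Sequential(
            nn.Conv2d(cond_ch, cond_ch, 1), nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(cond_ch, feat_ch, 1))

    def forward(self, feat, cond):
        # gamma and beta share feat's spatial size, so the modulation
        # follows category boundaries in the segmentation map
        gamma = self.scale_conv(cond)
        beta = self.shift_conv(cond)
        return feat * (gamma + 1) + beta
```

Because gamma and beta vary per pixel, a patch containing both plant and building regions is modulated differently in each region, which is why single-category training can still generalize across category boundaries.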

I agree that using general images with multiple categories would further improve the results, as would using another discriminator that predicts the segmentation results rather than performing patch classification. These directions could be explored further.

ackbar03 commented 5 years ago

Hi @xinntao

So just to check my understanding is correct, the training process is conceptually:

  1. We first train a normal SRGAN model with the images.
  2. Using the parameters of that model, we essentially fine-tune it to perform better super-resolution on different textures by training on OST images with their category labels.
  3. Through this process, the end model still works for general images with multiple categories because "the segmentation map is spatial and the feature modulation is conditioned locally".

Thus, the training step in BasicSR for SFTGAN,

```
python train.py -opt options/train/train_sftgan.json
```

is essentially a fine-tuning step in the entire process.

Am I correct in understanding that sft_net_ini.pth in the pretrained models is just the SFTGAN architecture with weights taken from a pretrained SRGAN model? Theoretically, we could also do a similar process with ESRGAN, is that correct?
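
For concreteness, the kind of weight transfer I mean could be sketched like this (a simplified illustration, assuming the checkpoint stores a plain PyTorch state dict; transfer_params_sft.py in the repo may map keys differently):

```python
import torch

def init_sft_from_srgan(srgan_ckpt_path, sft_net, out_path):
    """Copy every SRGAN generator parameter whose name and shape also
    exist in the SFT network; SFT-specific layers (e.g. the condition
    branch) keep their fresh initialization."""
    srgan_state = torch.load(srgan_ckpt_path, map_location='cpu')
    sft_state = sft_net.state_dict()
    for name, param in srgan_state.items():
        if name in sft_state and sft_state[name].shape == param.shape:
            sft_state[name].copy_(param)  # reuse the pretrained weight
    torch.save(sft_state, out_path)  # e.g. sft_net_ini.pth
```
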

Thanks for your patience!

ackbar03 commented 5 years ago

And another add-on question:

Since the pretrained model SRGAN_bicx4_noBN_DIV2K.pth used by transfer_params_sft.py is not actually provided, it seems we need to train our own SRGAN model beforehand. Does it matter whether we set "model": "srragan" or "model": "srgan"? I would assume it depends on how the loss function of the SFTGAN architecture is defined, and since the paper defines it using the normal SRGAN discriminator losses, using "srgan" would be the closer implementation.

The same would also go for the network_G options:

```
"network_G": {
  "which_model_G": "sr_resnet",
  "norm_type": null,
  "mode": "CNA",
  "nf": 64,
  "nb": 16,
  "in_nc": 3,
  "out_nc": 3,
  "gc": 32,
  "group": 1
}
```

which should be set so they match the SFTGAN architecture, is that correct?

Thanks!

xinntao commented 5 years ago

Yes, your understanding is correct.