XPixelGroup / BasicSR

Open-source image and video restoration toolbox for super-resolution, denoising, deblurring, etc. Currently, it includes EDSR, RCAN, SRResNet, SRGAN, ESRGAN, EDVR, BasicVSR, SwinIR, ECBSR, etc. It also supports StyleGAN2 and DFDNet.
https://basicsr.readthedocs.io/en/latest/
Apache License 2.0

Any suggestions on the params to fine tune the ESRGAN_x4 model #305

Closed SevenLJY closed 4 years ago

SevenLJY commented 4 years ago

Hi,

I am working on fine-tuning the ESRGAN network with my own dataset. I tried learning rates of 1e-4, 1e-5, and 1e-6 for the generator and discriminator (for about 20,000 iterations each), but all of the results on the LR test image have an unexpected blurring effect. Do you have any suggestions or intuition on how to adjust the corresponding params? For example, where is a good place to start with the learning rate, or are there any other useful params to tune?

Here are the loss curves from two of my experiments. lr=1e-6: [image] lr=1e-5: [image]

yera217 commented 4 years ago

I think the reason must lie in the dataset and in the way you obtain the LR-HR training pairs.

  1. Are your HR images of high enough quality?
  2. How do you obtain the corresponding LR training images?
  3. Are the test images from the same distribution as your LR training images, or do you test the model on real LR images?

xinntao commented 4 years ago

@SevenLJY I think @yera217 has raised good points. Does your training data distribution align well with your testing data? (In other words, do they follow the same downsampling process?)

SevenLJY commented 4 years ago

@yera217 @xinntao Thank you so much for your quick reply! For those questions from @yera217

  1. My HR images are 4K images. (So I use the extract_subimages.py script to crop them into 480x480 sub-images.)
  2. I obtained my LR images by downsampling with bicubic interpolation. (I also use the extract_subimages.py script to crop them into 120x120 sub-images.)
  3. The test images are real, complete images (similar to the images that my training data was cropped from).

All of my LR images are obtained from downsampling with bicubic interpolation. @xinntao

I found that the performance on my validation set is pretty good, but my test results are not as good as the pre-trained model's. So I suspect that fine-tuning has hurt the generalization ability of the model. Does that make sense? If so, do you have any suggestions on how to maintain the generalization ability of the model when fine-tuning, or on how to avoid the blurring effect?

Many Thanks
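The bicubic LR-pair generation described above can be sketched as follows. `make_lr` is a hypothetical helper, not a BasicSR function, and Pillow's bicubic filter differs slightly from MATLAB's imresize, so this is illustrative rather than the exact pipeline:

```python
import numpy as np
from PIL import Image

def make_lr(hr_img: Image.Image, scale: int = 4) -> Image.Image:
    """Downsample an HR crop by `scale` with bicubic interpolation
    to produce the paired LR training image."""
    w, h = hr_img.size
    assert w % scale == 0 and h % scale == 0, "crop size must divide by scale"
    return hr_img.resize((w // scale, h // scale), Image.BICUBIC)

# e.g. a 480x480 HR sub-image becomes a 120x120 LR sub-image
hr = Image.fromarray(np.zeros((480, 480), dtype=np.uint8))
lr = make_lr(hr)
print(lr.size)  # (120, 120)
```

As the thread goes on to discuss, generating LR images purely by clean bicubic downsampling is exactly what can cause a train/test mismatch on real images.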

yera217 commented 4 years ago

@SevenLJY Hi, so are your test images actually from the same batch as your HR training images? Can you upload a sample HR training image and test LR image here? Also, I would suggest applying a Gaussian blur with sigma=1.5 or 2.0 and kernel_size=5 to the HR training images before downsampling. It helps the model generalize better to real image degradation.
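A minimal sketch of that blur-then-downsample step, assuming a grayscale numpy image. To keep the example dependency-free, strided subsampling stands in for the bicubic resize (in practice you would resize with bicubic interpolation, e.g. via OpenCV or Pillow):

```python
import numpy as np

def gaussian_kernel(ksize=5, sigma=1.5):
    """Build a normalized 2-D Gaussian kernel."""
    ax = np.arange(ksize) - ksize // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

def blur_and_downsample(hr, scale=4, ksize=5, sigma=1.5):
    """Blur an HR image with the suggested kernel_size=5, sigma=1.5
    settings, then subsample by `scale`."""
    k = gaussian_kernel(ksize, sigma)
    pad = ksize // 2
    padded = np.pad(hr, pad, mode="reflect")
    out = np.zeros_like(hr, dtype=np.float64)
    h, w = hr.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = (padded[i:i + ksize, j:j + ksize] * k).sum()
    return out[::scale, ::scale]
```

The blur widens the effective degradation kernel, so the model no longer assumes the clean bicubic kernel alone, which is the intuition behind the generalization claim later in the thread.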

SevenLJY commented 4 years ago

@yera217 Hi, here is a sample of my HR training data: 20181217_WeikaiChen_00_diffuse_albedo_s049. The test LR image is just the whole face image, which is not really a natural human face photo but more like a texture map. Unfortunately, it is not very convenient for me to post it here.

You mean it's better to blur the HR training data and then downsample? Could you please explain a little why blurring the HR images helps generalization?

yera217 commented 4 years ago

I read in some SR papers that Gaussian blur works better for real-world SR, and it actually works for me. Regarding your images: what is the resolution of the test images? Your test images should be similar to your LR training images for this to work.

SevenLJY commented 4 years ago

Thank you so much for your suggestions! I will try Gaussian blurring in my dataset.

My test data is 256x256. Actually, I intend to do two tasks: one is 256 -> 1K, the other is 1K -> 4K. Do I need to train two networks?

yera217 commented 4 years ago

So, if you want to do 256 -> 1K SR, you should train your model on high-res 1K images as HR and their corresponding downsampled 256 images as LR. The same goes for 1K -> 4K. Do you train on 1K or 4K HR images?

xinntao commented 4 years ago

@SevenLJY This is blind SR. The key is that your downsampling process should match the degradation of the real-world images as closely as possible. You can try using Gaussian kernels with different sigmas, as @yera217 suggests.

Also, you can read papers on blind super-resolution, such as SRMD, IKC, etc.
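The "different sigmas" idea can be sketched as sampling a fresh blur kernel for each training image, which widens the degradation distribution the generator sees. The sigma range below is an assumption for illustration, not a value taken from the SRMD or IKC papers:

```python
import numpy as np

def random_gaussian_kernel(ksize=21, sigma_range=(0.2, 3.0), rng=None):
    """Sample an isotropic Gaussian blur kernel with a random sigma
    drawn uniformly from `sigma_range` (an assumed range)."""
    if rng is None:
        rng = np.random.default_rng()
    sigma = rng.uniform(*sigma_range)
    ax = np.arange(ksize) - ksize // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()

# One kernel per HR image before downsampling, e.g.:
# kernel = random_gaussian_kernel()
# lr = downsample(convolve(hr, kernel), scale=4)
```

Randomizing the kernel per image is the simplest form of the degradation modeling that blind-SR methods like SRMD make explicit.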

SevenLJY commented 4 years ago

@yera217 You're right! I shouldn't test the model with 256-res images if I trained on 4K images. My fine-tuned model actually works well on 1K images. I really appreciate your help. It means a lot!

@xinntao Thank you so much for your explanation! I will look into it.