chaofengc / FeMaSR

PyTorch codes for "Real-World Blind Super-Resolution via Feature Matching with Implicit High-Resolution Priors", ACM MM2022 (Oral)

Training of LR stage #15

Open Mayongrui opened 1 year ago

Mayongrui commented 1 year ago

Hi there, I cannot get the network to converge to the metric values reported in the manuscript for the SR stage. All settings and experiments below are for x4 SR.

Setting A. As you commented in #11, I changed the corresponding code and retrained the network for the pretraining stage. The network converged as expected, and the validation PSNR was around 24.5 dB on the DIV2K validation set, which seemed reasonable. I then trained the network for the SR stage but could not reproduce the results reported in the paper: the best PSNR/SSIM/LPIPS was 21.85/0.5813/0.3724 at 350K iterations.

Setting B. To locate the problem, I trained the network for the SR stage with the default options file and the HRP pretrained weights from this repo. However, it also converged to numbers very similar to Setting A.

Would you mind giving me any suggestions or guidance about this issue?

Some information that may help:

  1. The code to generate the synthetic testing images: https://drive.google.com/file/d/1k4063h7KHKf5x5firP9FFzG0nIzGTv6s/view?usp=sharing
  2. The generated testing images: https://drive.google.com/drive/folders/1UDodF_0BcnU3KeCd7UqTQP0fHiufbrff?usp=sharing

Setting A:

  1. pretrain stage files and results: https://drive.google.com/drive/folders/15ser2Fvk0DFx-V0mv33Dj9hur92TJxBD?usp=sharing
  2. SR stage files and results: https://drive.google.com/drive/folders/1_wvSpzwgG4cT3uczlsFJDGPOB0AOg2En?usp=sharing

Setting B:

  1. pretrain weights: https://drive.google.com/drive/folders/1g3NsDoEnUvKIzw-fZPHtu5o4G7Znmx8A?usp=sharing
  2. SR stage files and results: https://drive.google.com/drive/folders/1aayOT3xDUicuCM5eGvChryrfO_1AGtbg?usp=sharing

Full files: https://drive.google.com/drive/folders/1MLPoIYXvWODhevk8ICSAmPHj0DP0PF-k?usp=sharing

chaofengc commented 1 year ago

Hi, I retrained the model with Setting B over the past two days and it works fine. With our generated test images, it reached its best PSNR/SSIM/LPIPS of 22.43/0.5909/0.3437 at 160k iterations, and it also achieves an LPIPS of 0.3557 on your test images. It should reach performance similar to the paper with longer training. The training log has been uploaded to wandb for your reference. There might be several reasons for your problem; see the comments below.

Mayongrui commented 1 year ago

Thanks for the response.

chaofengc commented 1 year ago

The training images are generated with degradation_bsrgan and the testing images with degradation_bsrgan_plus, using the provided script generate_dataset.py. We did not make any changes to this code. Please note that my retrained model also works fine on your test images, so data generation is unlikely to be the problem. If your model does not work well on your own test images (i.e., reach performance similar to the released model, around 0.342 LPIPS), it is unlikely to work on our test images either.
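For reference, here is a minimal sketch of how such a pair can be synthesized, assuming the BSRGAN-style helpers degradation_bsrgan / degradation_bsrgan_plus; the exact import path and default arguments used by generate_dataset.py may differ:

```python
# Sketch only: synthesize one LR/HR pair with the BSRGAN degradations
# (degradation_bsrgan for training data, degradation_bsrgan_plus for test data).
# The import path below is an assumption; use whatever generate_dataset.py imports.
import cv2
import numpy as np
from utils_blindsr import degradation_bsrgan, degradation_bsrgan_plus  # assumed path

def make_pair(hr_path, scale=4, train=True):
    img = cv2.imread(hr_path, cv2.IMREAD_COLOR)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    degrade = degradation_bsrgan if train else degradation_bsrgan_plus
    # The BSRGAN helpers degrade the image and crop a random patch,
    # returning (LR, HR) arrays in [0, 1].
    img_lq, img_hq = degrade(img, sf=scale)
    return img_lq, img_hq
```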

Another difference is that we generate the training dataset offline to speed up training. Since the degradation space of BSRGAN is quite large, generating the images online and training the model with a small batch size may cause problems.

You may try synthesizing the LR images offline first, which should make model training easier: https://github.com/chaofengc/FeMaSR/blob/497d3ee99fa93040d5236ff6e3f535a652ebb4d6/options/train_FeMaSR_LQ_stage.yml#L13-L17
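A rough sketch of what that offline generation could look like is below. The directory names and the import path are placeholders, and the repo's generate_dataset.py (which likely samples several patches per HR image) should be preferred:

```python
# Sketch only: pre-generate fixed LR/GT patch pairs on disk so the SR stage
# can read them directly instead of running the BSRGAN degradation online.
# Directory names are placeholders; point the dataroot entries in
# options/train_FeMaSR_LQ_stage.yml at whatever you generate.
import glob
import os
import cv2
import numpy as np
from utils_blindsr import degradation_bsrgan  # assumed import path, as above

hr_dir, lq_dir, gt_dir = 'datasets/DIV2K_train_HR', 'datasets/train_LQ', 'datasets/train_GT'
os.makedirs(lq_dir, exist_ok=True)
os.makedirs(gt_dir, exist_ok=True)

for hr_path in sorted(glob.glob(os.path.join(hr_dir, '*.png'))):
    img = cv2.imread(hr_path, cv2.IMREAD_COLOR)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    img_lq, img_hq = degradation_bsrgan(img, sf=4)  # degrade + random patch crop
    name = os.path.basename(hr_path)
    cv2.imwrite(os.path.join(lq_dir, name),
                cv2.cvtColor((img_lq * 255.0).round().astype(np.uint8), cv2.COLOR_RGB2BGR))
    cv2.imwrite(os.path.join(gt_dir, name),
                cv2.cvtColor((img_hq * 255.0).round().astype(np.uint8), cv2.COLOR_RGB2BGR))
```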

Mayongrui commented 1 year ago

Does the offline preprocessing for training set generation include any other augmentation, such as 0.5~1.0 scaling before passing the images to the degradation model, as described in the manuscript?

chaofengc commented 1 year ago

No. Resizing at the beginning would further enlarge the degradation space. This might also be the problem in the current online mode; you can try setting use_resize_crop to false when using BSRGANTrainDataset: https://github.com/chaofengc/FeMaSR/blob/497d3ee99fa93040d5236ff6e3f535a652ebb4d6/options/train_FeMaSR_LQ_stage.yml#L22

In fact, we did not verify whether such random scaling improves or degrades performance in offline mode either; we released the same setting as the paper so that our results can be reproduced. Since random scaling is already performed inside degradation_bsrgan, further scaling may not be necessary. You may try to verify its influence if you have enough GPUs.
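For clarity, the kind of pre-rescale being discussed is sketched below (not code from this repo): a random downscale of the HR image by a factor in [0.5, 1.0] applied before the BSRGAN degradation, which is what enlarges the degradation space further.

```python
# Illustration only: random 0.5~1.0 pre-rescale of the HR image before
# the BSRGAN degradation. As discussed above, this widens the degradation
# space, so it may be better left off when compute is limited.
import random
import cv2

def random_pre_rescale(img_hr, lo=0.5, hi=1.0):
    s = random.uniform(lo, hi)
    h, w = img_hr.shape[:2]
    return cv2.resize(img_hr, (int(w * s), int(h * s)), interpolation=cv2.INTER_CUBIC)
```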

In short, keeping a proper degradation space eases model training; otherwise, you may need much more computational resources, similar to the training of BSRGAN.