Train ESRGAN from scratch

a462428 commented 5 years ago

Hi, I am training ESRGAN from scratch on DIV2K800 (the 800 DIV2K training images). However, it is really hard to get a well-trained model.

Could you share the number of epochs, the learning rates for G and D, and the D_update_rate you used to train your pretrained model? Are there any other details I should pay attention to? Thanks!

xinntao commented 5 years ago

A pretrained PSNR-oriented model is needed.
To train that pretrained model, you can use the `train_SRResNet` config, modifying only the `network_G` part and keeping the rest unchanged.
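A minimal sketch of that suggestion, assuming the options have been loaded into plain Python dicts (for example from the YAML file). The RRDBNet values come from the config posted later in this thread; the helper name `make_psnr_options` is hypothetical, and the SRResNet entries below are abbreviated placeholders, not the actual contents of `train_SRResNet`.

```python
import copy

# Hypothetical, heavily abbreviated stand-in for the train_SRResNet options.
# The real file has many more keys (datasets, path, logger, ...); values are placeholders.
SRRESNET_OPTS = {
    "model": "sr",
    "scale": 4,
    "network_G": {"which_model_G": "MSRResNet", "in_nc": 3, "out_nc": 3, "nf": 64, "nb": 16},
    "train": {"lr_G": 2e-4, "pixel_criterion": "l1", "pixel_weight": 1.0},
}

# RRDBNet generator settings, as in the ESRGAN config posted in this thread.
RRDB_G = {"which_model_G": "RRDBNet", "in_nc": 3, "out_nc": 3, "nf": 64, "nb": 23}


def make_psnr_options(base_opts, network_g):
    """Return a copy of base_opts in which only network_G is replaced."""
    opts = copy.deepcopy(base_opts)  # leave the original options untouched
    opts["network_G"] = dict(network_g)
    return opts


psnr_opts = make_psnr_options(SRRESNET_OPTS, RRDB_G)
print(psnr_opts["network_G"]["which_model_G"])  # RRDBNet
```

Everything outside `network_G` (loss, learning-rate schedule, datasets) is carried over unchanged, which is the point of the advice: only the generator architecture differs from the SRResNet recipe.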

a462428 commented 5 years ago

Hi, thanks!
Because I changed the discriminator from `vgg128` to `vgg128_SN`, I have to train from scratch. But the model is hard to train.
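`vgg128_SN` presumably refers to the VGG-style discriminator with spectral normalization (SN). SN divides each layer's weight matrix by an estimate of its largest singular value, usually obtained with a few steps of power iteration. The pure-Python sketch below illustrates only that normalization step on a small matrix; it is not BasicSR's discriminator code. In PyTorch the same idea is available as `torch.nn.utils.spectral_norm`.

```python
import math
import random


def mat_vec(w, v):
    """Compute W @ v for a matrix stored as a list of rows."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def mat_t_vec(w, u):
    """Compute W^T @ u."""
    return [sum(w[i][j] * u[i] for i in range(len(w))) for j in range(len(w[0]))]


def normalize(x, eps=1e-12):
    n = math.sqrt(sum(xi * xi for xi in x))
    return [xi / (n + eps) for xi in x]


def spectral_normalize(w, n_iter=50, seed=0):
    """Return (W / sigma, sigma), where sigma estimates W's largest singular value."""
    rng = random.Random(seed)
    u = normalize([rng.gauss(0.0, 1.0) for _ in range(len(w))])
    for _ in range(n_iter):
        v = normalize(mat_t_vec(w, u))  # right singular vector estimate
        u = normalize(mat_vec(w, v))    # left singular vector estimate
    sigma = sum(ui * wvi for ui, wvi in zip(u, mat_vec(w, v)))  # u^T W v
    w_sn = [[wij / sigma for wij in row] for row in w]
    return w_sn, sigma


W = [[3.0, 0.0], [0.0, 1.0]]  # singular values are 3 and 1
W_sn, sigma = spectral_normalize(W)
print(round(sigma, 6))  # 3.0
```

After normalization the largest singular value of each layer is about 1, which bounds how sharply the discriminator can react to its input; in practice frameworks keep `u` between training steps and run only one iteration per step instead of 50.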

XiaotianM commented 5 years ago

1. A pretrained PSNR-oriented model is needed.

2. For the pretrained model, you can use the `train_SRResNet` config by modifying the `network_G` part while keeping the rest unchanged.

Hi Xintao, I also find ESRGAN hard to train. I have tried to train a PSNR-oriented model, but the loss oscillates instead of decreasing steadily.

The log is:

19-08-11 17:38:53.279 - INFO: Model [SRModel] is created.
19-08-11 17:38:53.279 - INFO: Start training from epoch: 0, iter: 0
19-08-11 17:44:05.314 - INFO: <epoch: 1, iter: 100, lr:2.000e-04> l_pix: 6.0063e-02
19-08-11 17:49:00.121 - INFO: <epoch: 3, iter: 200, lr:2.000e-04> l_pix: 5.9322e-02
19-08-11 17:54:00.926 - INFO: <epoch: 5, iter: 300, lr:2.000e-04> l_pix: 4.8096e-02
19-08-11 17:58:55.067 - INFO: <epoch: 7, iter: 400, lr:2.000e-04> l_pix: 4.7359e-02
19-08-11 18:03:34.735 - INFO: <epoch: 9, iter: 500, lr:2.000e-04> l_pix: 5.2223e-02
19-08-11 18:08:33.610 - INFO: <epoch: 11, iter: 600, lr:2.000e-04> l_pix: 5.1359e-02
19-08-11 18:13:39.305 - INFO: <epoch: 13, iter: 700, lr:2.000e-04> l_pix: 5.6984e-02
19-08-11 18:18:35.381 - INFO: <epoch: 15, iter: 800, lr:2.000e-04> l_pix: 3.9553e-02
19-08-11 18:23:31.628 - INFO: <epoch: 17, iter: 900, lr:2.000e-04> l_pix: 4.5792e-02
19-08-11 18:27:51.033 - INFO: <epoch: 19, iter: 1,000, lr:2.000e-04> l_pix: 5.3166e-02
19-08-11 18:27:51.730 - INFO: # Validation # PSNR: 2.3120e+01
19-08-11 18:27:51.730 - INFO: <epoch: 19, iter: 1,000> psnr: 2.3120e+01
19-08-11 18:27:51.731 - INFO: Saving models and training states.
19-08-11 18:32:05.085 - INFO: <epoch: 21, iter: 1,100, lr:2.000e-04> l_pix: 4.3067e-02
19-08-11 18:36:07.572 - INFO: <epoch: 23, iter: 1,200, lr:2.000e-04> l_pix: 4.8505e-02
19-08-11 18:40:11.608 - INFO: <epoch: 25, iter: 1,300, lr:2.000e-04> l_pix: 4.6968e-02
19-08-11 18:44:24.015 - INFO: <epoch: 27, iter: 1,400, lr:2.000e-04> l_pix: 3.5826e-02
19-08-11 18:48:46.472 - INFO: <epoch: 29, iter: 1,500, lr:2.000e-04> l_pix: 5.6939e-02
19-08-11 18:52:59.439 - INFO: <epoch: 31, iter: 1,600, lr:2.000e-04> l_pix: 4.9018e-02
19-08-11 18:57:19.229 - INFO: <epoch: 33, iter: 1,700, lr:2.000e-04> l_pix: 5.3659e-02
19-08-11 19:01:56.948 - INFO: <epoch: 35, iter: 1,800, lr:2.000e-04> l_pix: 3.7018e-02
19-08-11 19:06:40.123 - INFO: <epoch: 37, iter: 1,900, lr:2.000e-04> l_pix: 4.7776e-02
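Each logged `l_pix` is the loss of a single mini-batch at that iteration, so individual values are noisy and say little about the trend. A minimal sketch, assuming log lines in exactly the format shown above, that extracts `(iter, l_pix)` pairs and smooths them with a trailing moving average; the regex, the helper names and the window size are illustrative choices, not part of BasicSR.

```python
import re

# Matches e.g. "<epoch: 19, iter: 1,000, lr:2.000e-04> l_pix: 5.3166e-02"
LINE_RE = re.compile(r"iter: ([\d,]+), lr:[^>]*> l_pix: ([-+0-9.eE]+)")


def parse_l_pix(lines):
    """Return (iteration, l_pix) pairs; validation and other lines are skipped."""
    points = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            it = int(m.group(1).replace(",", ""))  # "1,000" -> 1000
            points.append((it, float(m.group(2))))
    return points


def moving_average(values, window=5):
    """Trailing moving average; the first window-1 entries average fewer values."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1) : i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


log = """\
19-08-11 17:44:05.314 - INFO: <epoch: 1, iter: 100, lr:2.000e-04> l_pix: 6.0063e-02
19-08-11 18:27:51.033 - INFO: <epoch: 19, iter: 1,000, lr:2.000e-04> l_pix: 5.3166e-02
19-08-11 18:27:51.730 - INFO: <epoch: 19, iter: 1,000> psnr: 2.3120e+01
19-08-11 19:06:40.123 - INFO: <epoch: 37, iter: 1,900, lr:2.000e-04> l_pix: 4.7776e-02
""".splitlines()

pts = parse_l_pix(log)
print(pts)  # [(100, 0.060063), (1000, 0.053166), (1900, 0.047776)]
print(moving_average([v for _, v in pts], window=2))
```

Plotting the smoothed curve over many thousands of iterations, together with the validation PSNR, gives a much clearer picture than comparing neighbouring log lines.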

The config:

#### general settings
name: 003_RRDB_ESRGANx4_DIV2K
use_tb_logger: true
model: sr
distortion: sr
scale: 4
gpu_ids: [0]

#### datasets
datasets:
  train:
    name: DIV2K
    mode: LQGT
    dataroot_GT: /data/DataSet/DIV2K_lmdb/DIV2K.lmdb
    dataroot_LQ: /data/DataSet/DIV2K_lmdb/DIV2K_h265_x4.lmdb

    use_shuffle: true
    n_workers: 6  # per GPU
    batch_size: 16
    GT_size: 128
    use_flip: true
    use_rot: true
    color: RGB
  val:
    name: val_set5
    mode: LQGT
    dataroot_GT: /data/DataSet/SR_testing_datasets/Set5_alignment
    dataroot_LQ: /data/DataSet/SR_testing_datasets/Set5x4_h265

#### network structures
network_G:
  which_model_G: RRDBNet
  in_nc: 3
  out_nc: 3
  nf: 64
  nb: 23

#### path
path:
  pretrain_model_G: ~ #../experiments/pretrained_models/RRDB_PSNR_x4.pth
  strict_load: true
  resume_state: ~

#### training settings: learning rate scheme, loss
train:
  lr_G: !!float 2e-4
  lr_scheme: CosineAnnealingLR_Restart
  beta1: 0.9
  beta2: 0.99
  niter: 1000000
  warmup_iter: -1  # no warm up
  T_period: [250000, 250000, 250000, 250000]
  restarts: [250000, 500000, 750000]
  restart_weights: [1, 1, 1]
  eta_min: !!float 1e-7

  pixel_criterion: l1
  pixel_weight: 1.0

  manual_seed: 10
  val_freq: !!float 1e3

#### logger
logger:
  print_freq: 100
  save_checkpoint_freq: !!float 1e3
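The `CosineAnnealingLR_Restart` settings above can be read as an SGDR-style schedule: within each period of `T_period` iterations the learning rate follows a half cosine from `lr_G` (times the restart weight) down to `eta_min`, and it jumps back up at each entry in `restarts`. The sketch below implements that standard closed form for illustration only; it is an assumption about the intended behavior, not a copy of BasicSR's scheduler, which updates the rate step by step and may differ in edge cases. The function name is hypothetical.

```python
import bisect
import math


def cosine_restart_lr(it, base_lr=2e-4, t_period=(250000, 250000, 250000, 250000),
                      restarts=(250000, 500000, 750000), weights=(1, 1, 1), eta_min=1e-7):
    """Closed-form SGDR-style learning rate at iteration `it` (illustrative sketch)."""
    k = bisect.bisect_right(restarts, it)  # number of restarts already reached
    start = 0 if k == 0 else restarts[k - 1]
    weight = 1.0 if k == 0 else weights[k - 1]
    peak = base_lr * weight
    progress = (it - start) / t_period[k]  # 0 at a (re)start, 1 at the end of the period
    return eta_min + (peak - eta_min) * (1 + math.cos(math.pi * progress)) / 2


for it in (0, 1900, 125000, 249999, 250000, 1000000):
    print(it, f"{cosine_restart_lr(it):.4e}")
```

Under this sketch the rate at iteration 1,900 is still about 1.9997e-4, consistent with the constant `lr:2.000e-04` in the log: with 250,000-iteration periods, the schedule has not yet started to decay noticeably, so early oscillation is not caused by the learning-rate scheme.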

xinntao commented 5 years ago

You may need to train it for a long time; I trained it for about a week.
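Because the options count training length in iterations (`niter`) rather than epochs, a rough conversion helps when comparing runs or answering "how many epochs". The sketch below assumes 800 training images and `batch_size: 16` as in the config above (50 iterations per epoch, which matches the log advancing two epochs every 100 iterations) and takes the throughput from the iter 100 and iter 1,900 timestamps in the log. The helper names are hypothetical; real throughput depends on hardware and data loading.

```python
import math
from datetime import datetime

LOG_TIME_FMT = "%y-%m-%d %H:%M:%S.%f"  # timestamp format used in the log above


def iters_per_epoch(n_images, batch_size):
    """Iterations needed to see every training image once."""
    return math.ceil(n_images / batch_size)


def seconds_per_iter(t0, it0, t1, it1):
    """Average wall-clock seconds per iteration between two log entries."""
    dt = datetime.strptime(t1, LOG_TIME_FMT) - datetime.strptime(t0, LOG_TIME_FMT)
    return dt.total_seconds() / (it1 - it0)


ipe = iters_per_epoch(800, 16)
spi = seconds_per_iter("19-08-11 17:44:05.314", 100, "19-08-11 19:06:40.123", 1900)
niter = 1_000_000
epochs = niter // ipe
days = niter * spi / 86400

print(ipe)     # 50
print(epochs)  # 20000
print(f"{spi:.2f} s/iter, about {days:.0f} days for niter={niter}")
```

At the throughput in this log, the full schedule corresponds to 20,000 epochs and roughly a month of wall-clock time, so the first 1,900 iterations cover less than 0.2% of training.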

e-271 commented 4 years ago

Hi @XiaotianM, did you have success with those configuration settings? I would also like to train the PSNR model from scratch.

XPixelGroup / BasicSR

Train ESRGAN from scratch #192