Mukosame / Zooming-Slow-Mo-CVPR-2020

Fast and Accurate One-Stage Space-Time Video Super-Resolution (accepted in CVPR 2020)
GNU General Public License v3.0

About the warning information of the DCNv2 module during training. #39

Closed CS-GangXu closed 4 years ago

CS-GangXu commented 4 years ago

Hi, thank you for creating such an innovative and wonderful space-time video super-resolution method.

When I follow the training settings in train_zsm.yml, the terminal sometimes shows the message "WARNING: Offset mean is XXX, larger than 100". I looked for a solution and found that resuming from the nearest checkpoint might help. However, after I resumed from the nearest checkpoint, the warning still appeared.

At the same time, I found that although the warning kept appearing, the performance (PSNR) continued to increase.

So I would like to know whether this happened during your training. If it happened, how did you deal with it to achieve the performance in the paper?

Here is my training config file:

name: zsm_official
use_tb_logger: false #true
model: VideoSR_base
distortion: sr
scale: 4
gpu_ids: [2, 3]

datasets:
  train:
    name: Vimeo7
    mode: Vimeo7
    interval_list: [1]
    random_reverse: true #false
    border_mode: false
    dataroot_GT: /home/lz/xg/vimeo7_train_GT.lmdb
    dataroot_LQ: /home/lz/xg/vimeo7_train_LR7.lmdb
    cache_keys: Vimeo7_train_keys.pkl 

    N_frames: 7
    use_shuffle: true
    n_workers: 12 # per GPU
    batch_size: 24
    GT_size: 128 
    LQ_size: 32
    use_flip: true
    use_rot: true
    color: RGB

network_G:
  which_model_G: LunaTokis
  nf: 64
  nframes: 7
  groups: 8
  front_RBs: 5
  mid_RBs: 0
  back_RBs: 40
  HR_in: false

path:
  pretrain_model_G: ~
  strict_load: true #true #
  resume_state:  ~

train:
  lr_G: !!float 4e-4
  lr_scheme: CosineAnnealingLR_Restart
  beta1: 0.9
  beta2: 0.99
  niter: 600000
  warmup_iter: -1 #4000  # -1: no warm up
  T_period: [150000, 150000, 150000, 150000]
  restarts: [150000, 300000, 450000]
  restart_weights: [1, 1, 1]
  eta_min: !!float 1e-7

  pixel_criterion: cb
  pixel_weight: 1.0
  val_freq: !!float 5e3

  manual_seed: 0

logger:
  print_freq: 1
  save_checkpoint_freq: !!float 5e3
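
For reference, the learning-rate settings in the `train` section above can be read as a cosine schedule with restarts and no warm-up. Below is a minimal sketch of that reading, assuming the standard closed-form cosine annealing formula; the helper name `cosine_restart_lr` is hypothetical, and the repository's own `CosineAnnealingLR_Restart` scheduler may differ in detail.

```python
import math


def cosine_restart_lr(it, base_lr=4e-4, eta_min=1e-7,
                      periods=(150000, 150000, 150000, 150000),
                      restarts=(150000, 300000, 450000),
                      restart_weights=(1, 1, 1)):
    """Approximate learning rate at iteration `it` (hypothetical helper).

    Assumes closed-form cosine annealing inside each period; defaults
    mirror lr_G, eta_min, T_period, restarts and restart_weights above.
    """
    start, weight, period = 0, 1.0, periods[0]
    # Find the most recent restart at or before `it`.
    for i, r in enumerate(restarts):
        if it >= r:
            start, weight, period = r, restart_weights[i], periods[i + 1]
    t = it - start                      # iterations since the last restart
    peak = base_lr * weight             # learning rate right after the restart
    return eta_min + (peak - eta_min) * (1 + math.cos(math.pi * t / period)) / 2


if __name__ == "__main__":
    for it in (0, 75000, 149999, 150000):
        print(it, cosine_restart_lr(it))
```

Under these assumptions, training starts directly at the full `lr_G` of 4e-4 because `warmup_iter` is -1, and the rate returns to that value at each restart.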

Mukosame commented 4 years ago

Hi @bhjihck, please refer to this issue: https://github.com/Mukosame/Zooming-Slow-Mo-CVPR-2020/issues/30. You can also try resuming from earlier checkpoints. It surprises me that the PSNR keeps increasing. Just a guess: does it happen at very early iterations?

CS-GangXu commented 4 years ago

Thanks for your reply! It does happen at very early iterations. Once training passes the early stage, the warning appears much less often.
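
The warning text quoted in this thread suggests a check on the magnitude of the predicted deformable-convolution offsets. Below is a minimal sketch of such a check, assuming the warning fires when the mean absolute offset exceeds 100. It uses plain Python lists instead of PyTorch tensors, and the helper names `mean_abs_offset` and `check_offsets` are hypothetical, not taken from the repository.

```python
def mean_abs_offset(offsets):
    """Mean absolute value of a flat sequence of offset values."""
    values = list(offsets)
    return sum(abs(v) for v in values) / len(values)


def check_offsets(offsets, threshold=100.0):
    """Return True if the offsets look healthy, False (with a warning) otherwise.

    The threshold of 100 is taken from the warning message in the thread;
    the use of the mean absolute value is an assumption.
    """
    mean = mean_abs_offset(offsets)
    if mean > threshold:
        print(f"WARNING: Offset mean is {mean}, larger than {threshold}")
        return False
    return True


if __name__ == "__main__":
    # Count how often the check fails over a few made-up iterations.
    fake_iterations = [[150.0, -250.0], [3.0, -4.0], [0.5, 1.5]]
    failures = sum(not check_offsets(o) for o in fake_iterations)
    print(f"{failures} of {len(fake_iterations)} iterations triggered the warning")
```

Counting failures per logging interval in this way is one option for confirming that the warning becomes rare after the early iterations, as described above.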