cszn / KAIR

Image Restoration Toolbox (PyTorch). Training and testing codes for DPIR, USRNet, DnCNN, FFDNet, SRMD, DPSR, BSRGAN, SwinIR
https://cszn.github.io/
MIT License
2.9k stars, 624 forks

Why is training so slow? #57

Closed zapplelove closed 2 years ago

zapplelove commented 3 years ago

My training log is:

21-04-08 09:20:55.248 :   task: srmd
  model: plain
  gpu_ids: [0]
  scale: 4
  n_channels: 3
  sigma: [0, 50]
  sigma_test: 0
  merge_bn: False
  merge_bn_startpoint: 400000
  path:[
    root: superresolution
    pretrained_netG: None
    task: superresolution/srmd
    log: superresolution/srmd
    options: superresolution/srmd/options
    models: superresolution/srmd/models
    images: superresolution/srmd/images
  ]
  datasets:[
    train:[
      name: train_dataset
      dataset_type: srmd
      dataroot_H: trainsets/trainH
      dataroot_L: None
      H_size: 96
      dataloader_shuffle: True
      dataloader_num_workers: 8
      dataloader_batch_size: 64
      phase: train
      scale: 4
      n_channels: 3
    ]
    test:[
      name: test_dataset
      dataset_type: srmd
      dataroot_H: testsets/set5
      dataroot_L: None
      phase: test
      scale: 4
      n_channels: 3
    ]
  ]
  netG:[
    net_type: srmd
    in_nc: 19
    out_nc: 3
    nc: 128
    nb: 12
    gc: 32
    ng: 2
    reduction: 16
    act_mode: R
    upsample_mode: pixelshuffle
    downsample_mode: strideconv
    init_type: orthogonal
    init_bn_type: uniform
    init_gain: 0.2
    scale: 4
  ]
  train:[
    G_lossfn_type: l1
    G_lossfn_weight: 1.0
    G_optimizer_type: adam
    G_optimizer_lr: 0.0001
    G_optimizer_clipgrad: None
    G_scheduler_type: MultiStepLR
    G_scheduler_milestones: [200000, 400000, 600000, 800000, 1000000, 2000000]
    G_scheduler_gamma: 0.5
    G_regularizer_orthstep: None
    G_regularizer_clipstep: None
    checkpoint_test: 5000
    checkpoint_save: 5000
    checkpoint_print: 200
  ]
  opt_path: options/train_srmd.json
  is_train: True

21-04-08 09:20:55.248 : loading PCA projection matrix...
21-04-08 09:20:55.248 : Random seed: 8094
21-04-08 09:20:55.380 : Number of train images: 3,550, iters: 56
21-04-08 09:20:57.633 : 
Networks name: SRMD
Params number: 1553200
Net structure:
SRMD(
  (model): Sequential(
    (0): Conv2d(19, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    (2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): ReLU(inplace=True)
    (4): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): ReLU(inplace=True)
    (6): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (13): ReLU(inplace=True)
    (14): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): ReLU(inplace=True)
    (16): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (17): ReLU(inplace=True)
    (18): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (19): ReLU(inplace=True)
    (20): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (21): ReLU(inplace=True)
    (22): Conv2d(128, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (23): PixelShuffle(upscale_factor=4)
  )
)

21-04-08 09:20:57.636 : 
 |  mean  |  min   |  max   |  std   || shape               
 | -0.000 | -0.058 |  0.064 |  0.015 | torch.Size([128, 19, 3, 3]) || model.0.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.0.bias
 |  0.000 | -0.024 |  0.025 |  0.006 | torch.Size([128, 128, 3, 3]) || model.2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.2.bias
 | -0.000 | -0.025 |  0.025 |  0.006 | torch.Size([128, 128, 3, 3]) || model.4.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.4.bias
 | -0.000 | -0.027 |  0.024 |  0.006 | torch.Size([128, 128, 3, 3]) || model.6.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.6.bias
 | -0.000 | -0.029 |  0.024 |  0.006 | torch.Size([128, 128, 3, 3]) || model.8.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.8.bias
 |  0.000 | -0.025 |  0.024 |  0.006 | torch.Size([128, 128, 3, 3]) || model.10.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.10.bias
 |  0.000 | -0.029 |  0.027 |  0.006 | torch.Size([128, 128, 3, 3]) || model.12.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.12.bias
 | -0.000 | -0.025 |  0.025 |  0.006 | torch.Size([128, 128, 3, 3]) || model.14.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.14.bias
 | -0.000 | -0.026 |  0.025 |  0.006 | torch.Size([128, 128, 3, 3]) || model.16.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.16.bias
 | -0.000 | -0.025 |  0.027 |  0.006 | torch.Size([128, 128, 3, 3]) || model.18.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.18.bias
 |  0.000 | -0.027 |  0.027 |  0.006 | torch.Size([128, 128, 3, 3]) || model.20.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.20.bias
 | -0.000 | -0.023 |  0.024 |  0.006 | torch.Size([48, 128, 3, 3]) || model.22.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([48]) || model.22.bias

21-04-08 10:12:27.138 : <epoch:  3, iter:     200, lr:1.000e-04> G_loss: 1.228e-01 
21-04-08 11:03:26.168 : <epoch:  7, iter:     400, lr:1.000e-04> G_loss: 1.004e-01 
21-04-08 11:55:42.532 : <epoch: 10, iter:     600, lr:1.000e-04> G_loss: 8.437e-02 
21-04-08 12:46:08.205 : <epoch: 14, iter:     800, lr:1.000e-04> G_loss: 7.818e-02 
21-04-08 13:38:28.291 : <epoch: 18, iter:   1,000, lr:1.000e-04> G_loss: 5.932e-02 
21-04-08 14:28:32.066 : <epoch: 21, iter:   1,200, lr:1.000e-04> G_loss: 6.853e-02 
21-04-08 15:20:37.527 : <epoch: 25, iter:   1,400, lr:1.000e-04> G_loss: 5.390e-02 
21-04-08 16:11:03.519 : <epoch: 29, iter:   1,600, lr:1.000e-04> G_loss: 5.861e-02 
21-04-08 17:01:57.788 : <epoch: 32, iter:   1,800, lr:1.000e-04> G_loss: 5.812e-02 
21-04-08 17:54:55.700 : <epoch: 36, iter:   2,000, lr:1.000e-04> G_loss: 4.487e-02 
21-04-08 18:45:43.054 : <epoch: 39, iter:   2,200, lr:1.000e-04> G_loss: 5.985e-02 
21-04-08 19:38:15.152 : <epoch: 43, iter:   2,400, lr:1.000e-04> G_loss: 6.035e-02 
21-04-08 20:28:46.777 : <epoch: 47, iter:   2,600, lr:1.000e-04> G_loss: 5.407e-02 
21-04-08 21:21:19.155 : <epoch: 50, iter:   2,800, lr:1.000e-04> G_loss: 5.800e-02 
21-04-08 22:12:26.084 : <epoch: 54, iter:   3,000, lr:1.000e-04> G_loss: 4.669e-02 
21-04-08 23:04:49.046 : <epoch: 58, iter:   3,200, lr:1.000e-04> G_loss: 5.707e-02 
21-04-08 23:55:42.746 : <epoch: 61, iter:   3,400, lr:1.000e-04> G_loss: 5.521e-02 
21-04-09 00:48:11.666 : <epoch: 65, iter:   3,600, lr:1.000e-04> G_loss: 5.583e-02 
21-04-09 01:39:08.950 : <epoch: 69, iter:   3,800, lr:1.000e-04> G_loss: 4.659e-02 
21-04-09 02:30:07.278 : <epoch: 72, iter:   4,000, lr:1.000e-04> G_loss: 6.075e-02 
21-04-09 03:22:56.870 : <epoch: 76, iter:   4,200, lr:1.000e-04> G_loss: 5.796e-02 
21-04-09 04:13:49.914 : <epoch: 79, iter:   4,400, lr:1.000e-04> G_loss: 4.472e-02 
21-04-09 05:06:26.278 : <epoch: 83, iter:   4,600, lr:1.000e-04> G_loss: 4.891e-02 
21-04-09 05:56:58.472 : <epoch: 87, iter:   4,800, lr:1.000e-04> G_loss: 5.581e-02 
21-04-09 06:49:25.905 : <epoch: 90, iter:   5,000, lr:1.000e-04> G_loss: 6.413e-02 
21-04-09 06:49:25.905 : Saving the model.
21-04-09 06:49:26.138 : ---1-->   baby.bmp | 26.81dB
21-04-09 06:49:26.158 : ---2-->   bird.bmp | 22.54dB
21-04-09 06:49:26.170 : ---3--> butterfly.bmp | 18.75dB
21-04-09 06:49:26.218 : ---4-->   head.bmp | 26.36dB
21-04-09 06:49:26.253 : ---5-->  woman.bmp | 22.48dB
21-04-09 06:49:26.303 : <epoch: 90, iter:   5,000, Average PSNR : 23.39dB

21-04-09 07:40:04.505 : <epoch: 94, iter:   5,200, lr:1.000e-04> G_loss: 5.548e-02 
21-04-09 08:32:23.308 : <epoch: 98, iter:   5,400, lr:1.000e-04> G_loss: 5.314e-02 
21-04-09 09:22:56.333 : <epoch:101, iter:   5,600, lr:1.000e-04> G_loss: 5.548e-02 

Apologies, writing in English is a hassle for me, so I will describe the problem in Chinese. I did the following calculation: training started at 09:20:57, and the first result came out at 06:49:26 the next morning, for a total of 22 hours 27 minutes over 90 epochs, i.e. about 15 minutes per epoch on average. The program is set to train 1,000,000 times, so the total time would be 10,393.5 days, that is, 28.5 years. That training speed is far too slow. Is there any way to speed it up? Also, GPU utilization stays extremely low the whole time and I don't know why. Is there a solution?

[screenshot showing low GPU utilization]
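As a cross-check on those numbers, one can extrapolate directly from the log's own timestamps. The checkpoint at iteration 5,000 arrived about 21.5 hours after the start, so, assuming the 1,000,000 figure counts iterations rather than epochs (as the scheduler milestones in the options suggest), the projection is on the order of months, not decades. A quick sketch:

```python
from datetime import datetime

FMT = "%y-%m-%d %H:%M:%S"
start = datetime.strptime("21-04-08 09:20:57", FMT)      # training start (from the log)
iter_5000 = datetime.strptime("21-04-09 06:49:25", FMT)  # iteration 5,000 checkpoint

sec_per_iter = (iter_5000 - start).total_seconds() / 5000
projected_days = sec_per_iter * 1_000_000 / 86400
print(f"{sec_per_iter:.1f} s/iter, ~{projected_days:.0f} days for 1,000,000 iterations")
```

Even so, roughly 15 s per iteration is still very slow for a 1.5M-parameter network on a GPU, which, together with the low GPU utilization, points at the data pipeline rather than the model itself.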

zapplelove commented 3 years ago

I just copied some image files into the folder and started training, but it stopped after only a few seconds each time. The training log is:

21-04-02 13:02:02.176 : [same options as in the first log above]

21-04-02 13:02:02.176 : calculating PCA projection matrix...
21-04-02 13:04:02.431 : done!
21-04-02 13:04:02.431 : Random seed: 3448
21-04-02 13:05:21.938 : [same options as in the first log above]
21-04-02 13:05:21.938 : loading PCA projection matrix...
21-04-02 13:05:21.938 : Random seed: 2573
21-04-02 13:16:42.167 : [same options as in the first log above]
21-04-02 13:16:42.168 : loading PCA projection matrix...
21-04-02 13:16:42.168 : Random seed: 9255
21-04-02 13:43:07.746 : [same options as in the first log above]
21-04-02 13:43:07.746 : loading PCA projection matrix...
21-04-02 13:43:07.746 : Random seed: 2330
21-04-02 13:46:37.295 : [same options as in the first log above]
21-04-02 13:46:37.296 : loading PCA projection matrix...
21-04-02 13:46:37.296 : Random seed: 2052
21-04-02 13:47:16.681 : [same options as in the first log above]
21-04-02 13:47:16.682 : loading PCA projection matrix...
21-04-02 13:47:16.682 : Random seed: 1695
21-04-02 13:59:06.591 : [same options as in the first log above]
21-04-02 13:59:06.591 : loading PCA projection matrix...
21-04-02 13:59:06.591 : Random seed: 7798
21-04-02 14:09:39.545 : [same options as in the first log above]
21-04-02 14:09:39.546 : loading PCA projection matrix...
21-04-02 14:09:39.546 : Random seed: 896
21-04-02 14:11:17.175 : [same options as in the first log above]
21-04-02 14:11:17.175 : loading PCA projection matrix...
21-04-02 14:11:17.175 : Random seed: 3606
21-04-02 14:11:17.294 : Number of train images: 70, iters: 2
21-04-02 14:11:20.422 : 
Networks name: SRMD
Params number: 1553200
Net structure:
SRMD: [identical to the structure printed in the first log above]

21-04-02 14:11:20.429 : 
 |  mean  |  min   |  max   |  std   || shape               
 |  0.000 | -0.057 |  0.059 |  0.015 | torch.Size([128, 19, 3, 3]) || model.0.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.0.bias
 |  0.000 | -0.027 |  0.026 |  0.006 | torch.Size([128, 128, 3, 3]) || model.2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.2.bias
 |  0.000 | -0.025 |  0.027 |  0.006 | torch.Size([128, 128, 3, 3]) || model.4.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.4.bias
 |  0.000 | -0.025 |  0.027 |  0.006 | torch.Size([128, 128, 3, 3]) || model.6.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.6.bias
 | -0.000 | -0.028 |  0.025 |  0.006 | torch.Size([128, 128, 3, 3]) || model.8.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.8.bias
 |  0.000 | -0.026 |  0.026 |  0.006 | torch.Size([128, 128, 3, 3]) || model.10.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.10.bias
 |  0.000 | -0.025 |  0.028 |  0.006 | torch.Size([128, 128, 3, 3]) || model.12.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.12.bias
 | -0.000 | -0.027 |  0.025 |  0.006 | torch.Size([128, 128, 3, 3]) || model.14.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.14.bias
 | -0.000 | -0.024 |  0.028 |  0.006 | torch.Size([128, 128, 3, 3]) || model.16.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.16.bias
 |  0.000 | -0.025 |  0.025 |  0.006 | torch.Size([128, 128, 3, 3]) || model.18.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.18.bias
 |  0.000 | -0.025 |  0.027 |  0.006 | torch.Size([128, 128, 3, 3]) || model.20.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([128]) || model.20.bias
 |  0.000 | -0.022 |  0.026 |  0.006 | torch.Size([48, 128, 3, 3]) || model.22.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | torch.Size([48]) || model.22.bias

/home/zpl/local/venv/py3_torch/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
/home/zpl/local/venv/py3_torch/lib/python3.6/site-packages/torch/optim/lr_scheduler.py:156: UserWarning: The epoch parameter in `scheduler.step()` was not necessary and is being deprecated where possible. Please use `scheduler.step()` to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.
  warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)

Is it because my computer is too slow?
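As an aside, the two `lr_scheduler` warnings in the log above are harmless here but easy to silence: since PyTorch 1.1 the expected order is `optimizer.step()` first, then `scheduler.step()` with no epoch argument. A minimal sketch with a toy one-parameter model and the same Adam + MultiStepLR combination as in the options (milestones shortened purely for illustration):

```python
import torch

# Toy one-parameter "model"; Adam + MultiStepLR as in train_srmd.json,
# but with an illustrative milestone so the decay is visible in 4 steps.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.Adam([param], lr=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[2], gamma=0.5)

for step in range(4):
    optimizer.zero_grad()
    loss = (param ** 2).sum()
    loss.backward()
    optimizer.step()     # update the weights first (PyTorch >= 1.1)...
    scheduler.step()     # ...then advance the schedule, with no epoch argument

print(optimizer.param_groups[0]["lr"])  # halved once after milestone 2
```

With this order neither warning fires, and the first learning-rate value is no longer skipped.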

UESTCrookieLI commented 3 years ago

Have you solved this yet? I have exactly the same situation, without changing any training parameters. I found that most of the time goes to the data loader at the start of each epoch: CPU usage spikes and it takes a very long time to finish loading. I don't know how to fix it.

sherlybe commented 3 years ago

> Have you solved this yet? I have exactly the same situation, without changing any training parameters. I found that most of the time goes to the data loader at the start of each epoch: CPU usage spikes and it takes a very long time to finish loading. I don't know how to fix it.

Same question, me too!!!

hbw945 commented 2 years ago

Please set num_workers to 0.
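For reference, this setting corresponds to the `dataloader_num_workers` key in `options/train_srmd.json` (fragment below; unrelated keys omitted):

```json
{
  "datasets": {
    "train": {
      "dataloader_num_workers": 0,
      "dataloader_batch_size": 64
    }
  }
}
```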

lucky-zwx commented 2 years ago

Setting num_workers to 0 does not help with the low GPU utilization!

JingyunLiang commented 2 years ago

Please refer to https://github.com/JingyunLiang/SwinIR/issues/57#issuecomment-1109407381 and crop the large images in the training set into small sub-images.
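The point of the linked comment: training crops are tiny (H_size is 96) while DIV2K-style source images are around 2K pixels wide, so every random crop forces a worker to decode a huge file, starving the GPU. Pre-cutting each image into overlapping sub-images fixes that. A minimal sketch of the tiling arithmetic (the 480/240 patch and step sizes are illustrative, not the repo's defaults):

```python
def tile_positions(width, height, patch=480, step=240):
    """Top-left corners of overlapping patch-by-patch tiles covering an image.

    Each (x, y) can then be cut out with any image library,
    e.g. PIL's img.crop((x, y, x + patch, y + patch)).
    """
    xs = list(range(0, max(width - patch, 0) + 1, step))
    ys = list(range(0, max(height - patch, 0) + 1, step))
    if xs[-1] + patch < width:   # make sure the right edge is covered
        xs.append(width - patch)
    if ys[-1] + patch < height:  # make sure the bottom edge is covered
        ys.append(height - patch)
    return [(x, y) for y in ys for x in xs]

# A 2040x1404 DIV2K-sized image yields 8 x 5 = 40 sub-images of 480x480.
print(len(tile_positions(2040, 1404)))  # 40
```

After pre-cutting, each dataloader worker only decodes a small file per sample, which is usually enough to bring GPU utilization back up.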