XPixelGroup / BasicSR

Open-source image and video restoration toolbox for super-resolution, denoising, deblurring, etc. It currently includes EDSR, RCAN, SRResNet, SRGAN, ESRGAN, EDVR, BasicVSR, SwinIR, ECBSR, etc., and also supports StyleGAN2 and DFDNet.
https://basicsr.readthedocs.io/en/latest/
Apache License 2.0

'NoneType' object has no attribute 'astype' #254

Closed · vova0108 closed this issue 4 years ago

vova0108 commented 4 years ago

After 2700 iterations, the process stops with an error.

2020-07-22 18:17:01,171 INFO: [wowa..][epoch:  1, iter:   2,700, lr:(1.000e-04,)] [eta: 13 days, 2:54:12, time (data): 2.895 (0.001)] l_g_pix: 5.4666e-04 l_g_percep: 1.1653e+00 l_g_gan: 1.7249e-02 l_d_real: 5.0157e-02 l_d_fake: 8.3456e-02 out_d_real: 4.3754e+01 out_d_fake: 4.0371e+01
libpng error: bad adaptive filter value
2020-07-22 18:21:51,493 INFO: [wowa..][epoch:  1, iter:   2,800, lr:(1.000e-04,)] [eta: 13 days, 3:01:13, time (data): 3.002 (0.001)] l_g_pix: 6.0713e-04 l_g_percep: 1.4170e+00 l_g_gan: 3.2586e-02 l_d_real: 2.2469e-03 l_d_fake: 3.0331e-03 out_d_real: 4.3671e+01 out_d_fake: 3.7156e+01
Traceback (most recent call last):
  File "basicsr/train.py", line 195, in <module>
    main()
  File "basicsr/train.py", line 145, in main
    for _, train_data in enumerate(train_loader):
  File "/home/vova/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
    data = self._next_data()
  File "/home/vova/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
    return self._process_data(data)
  File "/home/vova/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
    data.reraise()
  File "/home/vova/anaconda3/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 4.
Original Traceback (most recent call last):
  File "/home/vova/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/vova/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/vova/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/tmp/BasicSR/basicsr/data/paired_image_dataset.py", line 81, in __getitem__
    img_gt = mmcv.imfrombytes(img_bytes).astype(np.float32) / 255.
AttributeError: 'NoneType' object has no attribute 'astype'
xinntao commented 4 years ago

It seems the dataset has problems.

You may check your dataset.

Could you also provide your config yml file for more details?
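
For context, the failing line in the traceback decodes the raw bytes with mmcv.imfrombytes, which (like cv2.imdecode underneath) returns None when the bytes cannot be decoded, e.g. for a corrupted PNG like the one behind the libpng error above; the subsequent .astype call then fails. A minimal sketch of that decode step with an explicit check that surfaces the offending file path (a standalone helper for illustration, not the actual dataset code):

import numpy as np
import mmcv  # mmcv.imfrombytes wraps cv2.imdecode and returns None on failure

def load_image(path):
    """Read and decode an image, failing loudly with the offending file path."""
    with open(path, 'rb') as f:
        img_bytes = f.read()
    img = mmcv.imfrombytes(img_bytes)  # None if the bytes cannot be decoded
    if img is None:
        raise ValueError(f'Failed to decode image: {path}')
    return img.astype(np.float32) / 255.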

wolfam0108 commented 4 years ago

Yes, of course. No problem. Here is my config:

# general settings
name: wowa
model_type: ESRGANModel
scale: 4
num_gpu: 1

# dataset and data loader settings
datasets:
  train:
    name: DIV2K
    type: PairedImageDataset
    dataroot_gt: ./datasets/DIV2K_train_HR_sub
    dataroot_lq: ./datasets/DIV2K_train_LR_bicubicX4_sub
    io_backend:
      type: disk
      #server_list_cfg: /mnt/lustre/share/memcached_client/server_list.conf
      #client_cfg: /mnt/lustre/share/memcached_client/client.conf
      #sys_path: /mnt/lustre/share/pymc/py3

    gt_size: 128
    use_flip: true
    use_rot: true

    # data loader
    use_shuffle: true
    num_worker: 6  # per GPU
    batch_size: 16
    dataset_enlarge_ratio: 1000

  val:
    name: val_set14
    type: PairedImageDataset
    dataroot_gt: ./datasets/val_set14/Set14
    dataroot_lq: ./datasets/val_set14/Set14_bicLRx4
    io_backend:
      type: disk

# network structures
network_g:
  type: RRDBNet
  num_in_ch: 3
  num_out_ch: 3
  num_feat: 64
  num_block: 23

network_d:
  type: VGGStyleDiscriminator128
  num_in_ch: 3
  num_feat: 64

# path
path:
  pretrain_model_g: ./experiments/pretrained_models/ESRGAN_PSNR_SRx4_DF2K_official-150ff491.pth
  strict_load: true
  resume_state: ~

# training settings
train:
  optim_g:
    type: Adam
    lr: !!float 1e-4
    weight_decay: 0
    betas: [0.9, 0.99]
  optim_d:
    type: Adam
    lr: !!float 1e-4
    weight_decay: 0
    betas: [0.9, 0.99]

  scheduler:
    type: MultiStepLR
    milestones: [50000, 100000, 200000, 300000]
    gamma: 0.5

  niter: 400000
  warmup_iter: -1  # no warm up

  # losses
  pixel_opt:
    type: L1Loss
    loss_weight: !!float 1e-2
    reduction: mean
  perceptual_opt:
    type: PerceptualLoss
    layer_weights:
      'conv5_4': 1  # before relu
    vgg_type: vgg19
    use_input_norm: true
    perceptual_weight: 1.0
    style_weight: 0
    norm_img: false
    criterion: l1
  gan_opt:
    type: GANLoss
    gan_type: vanilla
    real_label_val: 1.0
    fake_label_val: 0.0
    loss_weight: !!float 5e-3

  net_d_iters: 1
  net_d_init_iters: 0

  manual_seed: 0

# validation settings
val:
  val_freq: !!float 5e3
  save_img: true

# logging settings
logger:
  print_freq: 100
  save_checkpoint_freq: !!float 1e3
  use_tb_logger: true

# dist training settings
dist_params:
  backend: nccl
  port: 29748
xinntao commented 4 years ago

The config file looks OK.

You may check the dataset. A simple way is to write a small script that reads all the data to see whether there are any corrupted images~
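
A minimal sketch of such a check, assuming the sub-image folders from the config above and PNG files; cv2.imread returns None for any file it cannot decode:

import glob
import os
import cv2

# Report every file that OpenCV cannot decode in the training folders.
for folder in ('./datasets/DIV2K_train_HR_sub',
               './datasets/DIV2K_train_LR_bicubicX4_sub'):
    for path in sorted(glob.glob(os.path.join(folder, '*.png'))):
        if cv2.imread(path, cv2.IMREAD_UNCHANGED) is None:
            print(f'Corrupted or unreadable image: {path}')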

wolfam0108 commented 4 years ago

Got it. I will try to recreate the dataset. I use DIV2K as the basis and downscale it by a factor of 4 myself.
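
For example, a minimal sketch of x4 bicubic downscaling with OpenCV, with folder names assumed; note that the official DIV2K LR images are generated with MATLAB's bicubic kernel, which differs slightly from OpenCV's INTER_CUBIC, and sub-images would still need to be re-extracted afterwards (e.g. with the toolbox's extract_subimages script):

import glob
import os
import cv2

hr_dir = './datasets/DIV2K_train_HR'            # assumed input folder (full HR images)
lr_dir = './datasets/DIV2K_train_LR_bicubicX4'  # assumed output folder
os.makedirs(lr_dir, exist_ok=True)

for path in sorted(glob.glob(os.path.join(hr_dir, '*.png'))):
    img = cv2.imread(path, cv2.IMREAD_UNCHANGED)
    if img is None:
        continue  # skip unreadable files instead of crashing
    h, w = img.shape[:2]
    lr = cv2.resize(img, (w // 4, h // 4), interpolation=cv2.INTER_CUBIC)
    cv2.imwrite(os.path.join(lr_dir, os.path.basename(path)), lr)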

wolfam0108 commented 4 years ago

I rebuilt the dataset, and training now runs without errors. Thanks for the help!

UESTCrookieLI commented 1 year ago

I ran into the same error, but even after rebuilding the dataset it still occurs. I have to resume training every time the error occurs... Can you give me some other advice on how to solve it?