ajbrock / BigGAN-PyTorch

The author's officially unofficial PyTorch BigGAN implementation.
MIT License

ValueError: __len__() should return >= 0 #71

Open JanineCHEN opened 3 years ago

JanineCHEN commented 3 years ago

Many thanks for open-sourcing the models. I have encountered the following issue when trying to resume training. I trained the model on my own dataset, and the previous training run and checkpoint saving completed without any errors:

Traceback (most recent call last):
  File "train.py", line 227, in <module>
    main()
  File "train.py", line 224, in main
    run(config)
  File "train.py", line 171, in run
    for i, (x, y) in enumerate(pbar):
  File "/home/projects/11002043/BIGGAN_archdaily_outdoor_128_bs110x237/utils.py", line 834, in progress
    total = total or len(items)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 316, in __len__
    return len(self._index_sampler)
  File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/sampler.py", line 212, in __len__
    return (len(self.sampler) + self.batch_size - 1) // self.batch_size
ValueError: __len__() should return >= 0

The configuration is as follows:

$ sh scripts/launch_BigGAN_bs110x237.sh
{'dataset': 'I128_hdf5', 'augment': False, 'num_workers': 4, 'pin_memory': True, 'shuffle': True, 'load_in_mem': True, 'use_multiepoch_sampler': True, 'model': 'BigGAN', 'G_param': 'SN', 'D_param': 'SN', 'G_ch': 32, 'D_ch': 32, 'G_depth': 1, 'D_depth': 1, 'D_wide': True, 'G_shared': True, 'shared_dim': 128, 'dim_z': 120, 'z_var': 1.0, 'hier': True, 'cross_replica': False, 'mybn': False, 'G_nl': 'inplace_relu', 'D_nl': 'inplace_relu', 'G_attn': '32', 'D_attn': '32', 'norm_style': 'bn', 'seed': 0, 'G_init': 'ortho', 'D_init': 'ortho', 'skip_init': False, 'G_lr': 0.0001, 'D_lr': 0.0004, 'G_B1': 0.0, 'D_B1': 0.0, 'G_B2': 0.999, 'D_B2': 0.999, 'batch_size': 110, 'G_batch_size': 0, 'num_G_accumulations': 237, 'num_D_steps': 1, 'num_D_accumulations': 237, 'split_D': False, 'num_epochs': 100, 'parallel': True, 'G_fp16': False, 'D_fp16': False, 'D_mixed_precision': False, 'G_mixed_precision': False, 'accumulate_stats': False, 'num_standing_accumulations': 16, 'G_eval_mode': True, 'save_every': 100, 'num_save_copies': 2, 'num_best_copies': 5, 'which_best': 'FID', 'no_fid': False, 'test_every': 100, 'num_inception_images': 50000, 'hashname': False, 'base_root': '', 'data_root': 'data', 'weights_root': 'weights', 'logs_root': 'logs', 'samples_root': 'samples', 'pbar': 'mine', 'name_suffix': '', 'experiment_name': 'BigGAN_I128_hdf5_seed0_Gch32_Dch32_bs110_nDa237_nGa237_Glr1.0e-04_Dlr4.0e-04_Gnlinplace_relu_Dnlinplace_relu_Ginitortho_Dinitortho_Gattn32_Dattn32_Gshared_hier_ema', 'config_from_name': False, 'ema': True, 'ema_decay': 0.9999, 'use_ema': True, 'ema_start': 300, 'adam_eps': 1e-06, 'BN_eps': 1e-05, 'SN_eps': 1e-06, 'num_G_SVs': 1, 'num_D_SVs': 1, 'num_G_SV_itrs': 1, 'num_D_SV_itrs': 1, 'G_ortho': 0.0, 'D_ortho': 0.0, 'toggle_grads': True, 'which_train_fn': 'GAN', 'load_weights': '', 'resume': True, 'logstyle': '%3.3e', 'log_G_spectra': False, 'log_D_spectra': False, 'sv_log_interval': 10}
Skipping initialization for training resumption...
Experiment name is BigGAN_I128_hdf5_seed0_Gch32_Dch32_bs110_nDa237_nGa237_Glr1.0e-04_Dlr4.0e-04_Gnlinplace_relu_Dnlinplace_relu_Ginitortho_Dinitortho_Gattn32_Dattn32_Gshared_hier_ema
Adding attention layer in G at resolution 32
Adding attention layer in D at resolution 32
Preparing EMA for G with decay of 0.9999
Adding attention layer in G at resolution 32
Initializing EMA parameters to be source parameters...
Generator(
  (activation): ReLU(inplace=True)
  (shared): Embedding(160, 128)
  (linear): SNLinear(in_features=20, out_features=8192, bias=True)
  (blocks): ModuleList(
    (0): ModuleList(
      (0): GBlock(
        (activation): ReLU(inplace=True)
        (conv1): SNConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
        (bn1): ccbn(
          out: 512, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=512, bias=False)
          (bias): SNLinear(in_features=148, out_features=512, bias=False)
        )
        (bn2): ccbn(
          out: 512, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=512, bias=False)
          (bias): SNLinear(in_features=148, out_features=512, bias=False)
        )
      )
    )
    (1): ModuleList(
      (0): GBlock(
        (activation): ReLU(inplace=True)
        (conv1): SNConv2d(512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
        (bn1): ccbn(
          out: 512, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=512, bias=False)
          (bias): SNLinear(in_features=148, out_features=512, bias=False)
        )
        (bn2): ccbn(
          out: 256, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=256, bias=False)
          (bias): SNLinear(in_features=148, out_features=256, bias=False)
        )
      )
    )
    (2): ModuleList(
      (0): GBlock(
        (activation): ReLU(inplace=True)
        (conv1): SNConv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(256, 128, kernel_size=(1, 1), stride=(1, 1))
        (bn1): ccbn(
          out: 256, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=256, bias=False)
          (bias): SNLinear(in_features=148, out_features=256, bias=False)
        )
        (bn2): ccbn(
          out: 128, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=128, bias=False)
          (bias): SNLinear(in_features=148, out_features=128, bias=False)
        )
      )
      (1): Attention(
        (theta): SNConv2d(128, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (phi): SNConv2d(128, 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (g): SNConv2d(128, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (o): SNConv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)
      )
    )
    (3): ModuleList(
      (0): GBlock(
        (activation): ReLU(inplace=True)
        (conv1): SNConv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(128, 64, kernel_size=(1, 1), stride=(1, 1))
        (bn1): ccbn(
          out: 128, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=128, bias=False)
          (bias): SNLinear(in_features=148, out_features=128, bias=False)
        )
        (bn2): ccbn(
          out: 64, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=64, bias=False)
          (bias): SNLinear(in_features=148, out_features=64, bias=False)
        )
      )
    )
    (4): ModuleList(
      (0): GBlock(
        (activation): ReLU(inplace=True)
        (conv1): SNConv2d(64, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(64, 32, kernel_size=(1, 1), stride=(1, 1))
        (bn1): ccbn(
          out: 64, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=64, bias=False)
          (bias): SNLinear(in_features=148, out_features=64, bias=False)
        )
        (bn2): ccbn(
          out: 32, in: 148, cross_replica=False
          (gain): SNLinear(in_features=148, out_features=32, bias=False)
          (bias): SNLinear(in_features=148, out_features=32, bias=False)
        )
      )
    )
  )
  (output_layer): Sequential(
    (0): bn()
    (1): ReLU(inplace=True)
    (2): SNConv2d(32, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  )
)
Discriminator(
  (activation): ReLU(inplace=True)
  (blocks): ModuleList(
    (0): ModuleList(
      (0): DBlock(
        (activation): ReLU(inplace=True)
        (downsample): AvgPool2d(kernel_size=2, stride=2, padding=0)
        (conv1): SNConv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(3, 32, kernel_size=(1, 1), stride=(1, 1))
      )
    )
    (1): ModuleList(
      (0): DBlock(
        (activation): ReLU(inplace=True)
        (downsample): AvgPool2d(kernel_size=2, stride=2, padding=0)
        (conv1): SNConv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(32, 64, kernel_size=(1, 1), stride=(1, 1))
      )
      (1): Attention(
        (theta): SNConv2d(64, 8, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (phi): SNConv2d(64, 8, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (g): SNConv2d(64, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (o): SNConv2d(32, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      )
    )
    (2): ModuleList(
      (0): DBlock(
        (activation): ReLU(inplace=True)
        (downsample): AvgPool2d(kernel_size=2, stride=2, padding=0)
        (conv1): SNConv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(64, 128, kernel_size=(1, 1), stride=(1, 1))
      )
    )
    (3): ModuleList(
      (0): DBlock(
        (activation): ReLU(inplace=True)
        (downsample): AvgPool2d(kernel_size=2, stride=2, padding=0)
        (conv1): SNConv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(128, 256, kernel_size=(1, 1), stride=(1, 1))
      )
    )
    (4): ModuleList(
      (0): DBlock(
        (activation): ReLU(inplace=True)
        (downsample): AvgPool2d(kernel_size=2, stride=2, padding=0)
        (conv1): SNConv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv_sc): SNConv2d(256, 512, kernel_size=(1, 1), stride=(1, 1))
      )
    )
    (5): ModuleList(
      (0): DBlock(
        (activation): ReLU(inplace=True)
        (conv1): SNConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (conv2): SNConv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      )
    )
  )
  (linear): SNLinear(in_features=512, out_features=1, bias=True)
  (embed): SNEmbedding(160, 512)
)
Number of params in G: 8451140 D: 9694562
Loading weights...
Loading weights from weights/BigGAN_I128_hdf5_seed0_Gch32_Dch32_bs110_nDa237_nGa237_Glr1.0e-04_Dlr4.0e-04_Gnlinplace_relu_Dnlinplace_relu_Ginitortho_Dinitortho_Gattn32_Dattn32_Gshared_hier_ema...
Inception Metrics will be saved to logs/BigGAN_I128_hdf5_seed0_Gch32_Dch32_bs110_nDa237_nGa237_Glr1.0e-04_Dlr4.0e-04_Gnlinplace_relu_Dnlinplace_relu_Ginitortho_Dinitortho_Gattn32_Dattn32_Gshared_hier_ema_log.jsonl
Training Metrics will be saved to logs/BigGAN_I128_hdf5_seed0_Gch32_Dch32_bs110_nDa237_nGa237_Glr1.0e-04_Dlr4.0e-04_Gnlinplace_relu_Dnlinplace_relu_Ginitortho_Dinitortho_Gattn32_Dattn32_Gshared_hier_ema
Using dataset root location data/ILSVRC128.hdf5
Loading data/ILSVRC128.hdf5 into memory...
Using multiepoch sampler from start_itr 200...
Parallelizing Inception module...
Beginning training at epoch 1...
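
From the traceback, len() of the DataLoader's index sampler must be returning a negative value. My understanding (an assumption from my reading of MultiEpochSampler in utils.py, not the verbatim source) is that the sampler's length is computed roughly as dataset_size * num_epochs - start_itr * effective_batch_size, which can go negative when resuming. A minimal sketch with my settings, using a hypothetical dataset size:

# Sketch of the length arithmetic I suspect is failing; the formula and
# the dataset size are assumptions, not the verbatim utils.py source.
num_epochs = 100          # from the config above
start_itr = 200           # "Using multiepoch sampler from start_itr 200..."
batch_size = 110
num_D_steps = 1
num_D_accumulations = 237

# If I read train.py correctly, the loader is built with an effective
# batch of batch_size * num_D_steps * num_D_accumulations:
effective_batch = batch_size * num_D_steps * num_D_accumulations  # 26070

dataset_size = 50000      # hypothetical size of a custom dataset

remaining = dataset_size * num_epochs - start_itr * effective_batch
print(remaining)          # 5000000 - 5214000 = -214000, so __len__() < 0

If that reading is right, resuming at start_itr 200 with this effective batch size would already exceed 100 epochs for any dataset smaller than about 52,140 images.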

Any idea why this error occurs? Any help would be highly appreciated!

szulm commented 2 years ago

Hello, how did you solve it?