ajbrock / BigGAN-PyTorch

The author's officially unofficial PyTorch BigGAN implementation.
MIT License

IndexError: tuple index out of range #70

Open JanineCHEN opened 4 years ago

JanineCHEN commented 4 years ago

Hey, I am a student trying to reproduce the training process using my own dataset. I got the following error right after the first epoch of training finishes:

Traceback (most recent call last):
  File "train.py", line 227, in <module>
    main()
  File "train.py", line 224, in main
    run(config)
  File "train.py", line 184, in run
    metrics = train(x, y)
  File "/home/projects/BIGGAN/train_fns.py", line 41, in train
    x[counter], y[counter], train_G=False, 
IndexError: tuple index out of range

I launch training with sh scripts/launch_BigGAN_bs256x8.sh on my own dataset, which was converted to HDF5 format without any errors. The content of the launch_BigGAN_bs256x8.sh I used:

#!/bin/bash
python train.py \
--dataset I128_hdf5 --parallel --shuffle  --num_workers 8 --batch_size 128 --load_in_mem  \
--num_G_accumulations 16 --num_D_accumulations 16 \
--num_D_steps 1 --G_lr 1e-4 --D_lr 4e-4 --D_B2 0.999 --G_B2 0.999 \
--G_attn 64 --D_attn 64 \
--G_nl inplace_relu --D_nl inplace_relu \
--SN_eps 1e-6 --BN_eps 1e-5 --adam_eps 1e-6 \
--G_ortho 0.0 \
--G_shared \
--G_init ortho --D_init ortho \
--hier --dim_z 120 --shared_dim 128 \
--G_eval_mode \
--which_best FID \
--G_ch 32 --D_ch 32 \
--ema --use_ema --ema_start 20000 \
--test_every 200 --save_every 100 --num_best_copies 5 --num_save_copies 2 --seed 0 \
--use_multiepoch_sampler

I am not sure whether this has something to do with the size of my dataset or the number of classes. If so, how could I adjust the parameters? Any other ideas about why this issue arises and how to tackle it would be very much appreciated. Thanks a bunch in advance.
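For context, the line that fails (train_fns.py, x[counter], y[counter]) indexes chunks produced by torch.split over the fetched batch. A minimal sketch of the failure mode, with illustrative sizes rather than the actual dataset dimensions:

```python
import torch

# Sketch of the failure mode (illustrative sizes): the training loop splits
# each fetched batch into chunks of `batch_size` and indexes chunk `counter`
# on every accumulation step.
batch_size = 128
num_accumulations = 16

# A residual last batch smaller than batch_size * num_accumulations
# yields fewer chunks than the accumulation loop expects:
residual = torch.randn(300, 3, 64, 64)  # hypothetical leftover samples
chunks = torch.split(residual, batch_size)
print(len(chunks))  # 3 chunks (128 + 128 + 44), not 16

try:
    chunks[num_accumulations - 1]  # counter runs up to 15
except IndexError as err:
    print("IndexError:", err)  # tuple index out of range
```

So any fetch that delivers fewer than batch_size * num_accumulations samples will trigger the IndexError once the accumulation counter runs past the available chunks.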

Baran-phys commented 3 years ago

What was the solution?

JanineCHEN commented 3 years ago

> What was the solution?

Hi, it was the residual (last, smaller) batch that caused the problem. You can either set drop_last=True when constructing the DataLoader, or increase the number of epochs so the last batch is never used.
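A minimal sketch of the drop_last fix on a toy dataset (sizes are illustrative, not from the repo):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset whose length (300) is not a multiple of the batch size (128);
# drop_last=True discards the 44-sample residual batch entirely.
data = TensorDataset(torch.randn(300, 3), torch.zeros(300, dtype=torch.long))
loader = DataLoader(data, batch_size=128, shuffle=True, drop_last=True)

print(len(loader))  # 2 -> only the full batches remain
print(all(xb.shape[0] == 128 for xb, _ in loader))  # True
```

With drop_last=True every batch the training loop sees has exactly batch_size samples, so torch.split always yields the expected number of chunks.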

Baran-phys commented 3 years ago

Well, neither of those solved the error on my side. I get this error whenever I set num_G_accumulations or num_D_accumulations higher than 2.
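A quick way to check whether a run will hit a residual batch (the numbers below are illustrative; in this repo the loader fetches batch_size * num_D_steps * num_D_accumulations samples per step, so larger accumulation counts make a short last fetch more likely):

```python
# Illustrative numbers: with gradient accumulation, each fetch must supply
# batch_size * num_D_steps * num_D_accumulations samples.
dataset_size = 50_000
batch_size, num_D_steps, num_D_accumulations = 128, 1, 16

full_batch = batch_size * num_D_steps * num_D_accumulations
print(full_batch)                 # 2048
print(dataset_size % full_batch)  # 848 leftover samples -> last fetch is short
```

If the remainder is nonzero, either drop the residual batch or pick sizes so that dataset_size is a multiple of the full fetch size.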

datduong commented 3 years ago

I use drop_last and it works. I am using 4 GPUs and a batch size of 52.