ajbrock / BigGAN-PyTorch

The author's officially unofficial PyTorch BigGAN implementation.
MIT License

Training with ImageNet 64x64 #86

Open gulperii opened 3 years ago

gulperii commented 3 years ago

Hello,

I am using ImageNet 64x64 and running the code with the following command:

python train.py --dataset I64_hdf5 --shuffle --batch_size 128 --num_G_accumulations 1 --num_D_accumulations 1 --num_D_steps 1 --G_lr 1e-4 --D_lr 4e-4 --D_B2 0.999 --G_B2 0.999 --G_attn 32 --D_attn 32 --G_nl relu --D_nl relu --SN_eps 1e-8 --BN_eps 1e-5 --adam_eps 1e-8 --G_ortho 0.0 --G_init xavier --D_init xavier --G_eval_mode --G_ch 32 --D_ch 32 --ema --use_ema --ema_start 2000 --test_every 5000 --save_every 1000 --num_best_copies 5 --num_save_copies 2 --seed 0 --which_best FID --num_epochs 1000 --num_workers 8 --parallel

and getting this error:

  File "train.py", line 229, in <module>
    main()
  File "train.py", line 226, in main
    run(config)
  File "train.py", line 184, in run
    metrics = train(x, y)
  File "/BigGAN-PyTorch/train_fns.py", line 42, in train
    split_D=config['split_D'])
  File "/miniconda3/envs/biggan2-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/miniconda3/envs/biggan2-env/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 140, in forward
    return self.module(*inputs, **kwargs)
  File "/miniconda3/envs/biggan2-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/BigGAN-PyTorch/BigGAN.py", line 443, in forward
    D_out = self.D(D_input, D_class)
  File "/miniconda3/envs/biggan2-env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/BigGAN-PyTorch/BigGAN.py", line 403, in forward
    out = out + torch.sum(self.embed(y) * h, 1, keepdim=True)
RuntimeError: CUDA error: device-side assert triggered

I prepared the data with the commands from the repository's prepare_data script:

python make_hdf5.py --dataset I64 --batch_size 256 --data_root data
python calculate_inception_moments.py --dataset I64_hdf5 --data_root data

The interesting thing is that when I create a "mini dataset" by randomly selecting 500 images per label from the original ImageNet dataset, the code runs fine. What could be the problem? How can I solve this issue?

a28293971 commented 1 year ago

With a "CUDA error: device-side assert triggered" error, it is best to move the model to the CPU and rerun, so that you see the detailed error message.
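
As a sketch of what that check often turns up: the traceback ends at self.embed(y) in the discriminator, and a device-side assert at an embedding lookup is commonly caused by a class index that falls outside the embedding table. Assuming that is what happens here (nothing in this thread confirms it), the labels can be validated without a GPU at all. The helper name find_bad_labels, the sample labels, and n_classes=1000 below are illustrative assumptions, not part of the repository.

```python
from collections import Counter


def find_bad_labels(labels, n_classes):
    """Count label values that fall outside the valid range [0, n_classes)."""
    bad = Counter()
    for y in labels:
        y = int(y)
        if y < 0 or y >= n_classes:
            bad[y] += 1
    return bad


# Hypothetical labels for illustration. In practice these would be the
# class labels read from the prepared dataset file (an assumption about
# where your labels live; adapt the loading step to your own setup).
labels = [0, 5, 999, 1000, 1000, -1]

# n_classes=1000 is assumed here for ImageNet; it must match the number
# of classes the model's embedding layer was built with.
bad = find_bad_labels(labels, n_classes=1000)
print(dict(bad))
```

If this reports any values for the labels in your prepared dataset, the label range and the number of classes the model was built with disagree, and that mismatch is worth fixing before going back to the GPU.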