ajbrock / BigGAN-PyTorch

The author's officially unofficial PyTorch BigGAN implementation.
MIT License

Trouble training it from scratch #92

Open ParthaEth opened 2 years ago

ParthaEth commented 2 years ago

I used this command:

python train.py --dataset I128_hdf5 --parallel --shuffle --num_workers 8 --batch_size 256 --load_in_mem --num_G_accumulations 8 --num_D_accumulations 8 --num_D_steps 1 --G_lr 1e-4 --D_lr 4e-4 --D_B2 0.999 --G_B2 0.999 --G_attn 64 --D_attn 64 --G_nl inplace_relu --D_nl inplace_relu --SN_eps 1e-6 --BN_eps 1e-5 --adam_eps 1e-6 --G_ortho 0.0 --G_shared --G_init ortho --D_init ortho --hier --dim_z 120 --shared_dim 128 --G_eval_mode --G_ch 96 --D_ch 96 --ema --use_ema --ema_start 20000 --test_every 2000 --save_every 1000 --num_best_copies 5 --num_save_copies 2 --seed 0 --use_multiepoch_sampler --data_root '../../datasets/imagenet_2012/ILSVRC/Data/CLS-LOC'

The following is my terminal dump. Clearly, the FID is not converging:

{"itr": 2000, "IS_mean": 1.0304653644561768, "IS_std": 0.0014261072501540184, "FID": 394.4552307128906, "_stamp": 1643744125.9338467}
{"itr": 4000, "IS_mean": 1.1188709735870361, "IS_std": 0.001007376005873084, "FID": 441.09441643261874, "_stamp": 1643771589.7811866}
{"itr": 6000, "IS_mean": 1.1406240463256836, "IS_std": 0.002446718281134963, "FID": 336.57989501953125, "_stamp": 1643799135.0760667}
{"itr": 8000, "IS_mean": 1.0835497379302979, "IS_std": 0.002129267668351531, "FID": 319.775146484375, "_stamp": 1643826470.6842303}
{"itr": 10000, "IS_mean": 1.1007851362228394, "IS_std": 0.0018776928773149848, "FID": 378.22900390625, "_stamp": 1643853959.790462}
{"itr": 12000, "IS_mean": 1.2996935844421387, "IS_std": 0.004443394020199776, "FID": 346.31591796875, "_stamp": 1643881555.2079446}
{"itr": 14000, "IS_mean": 1.234755039215088, "IS_std": 0.0016474281437695026, "FID": 302.0063781738281, "_stamp": 1643908870.9288065}
{"itr": 16000, "IS_mean": 1.0296392440795898, "IS_std": 0.0007893913425505161, "FID": 345.8172302246094, "_stamp": 1643936364.6364923}
{"itr": 18000, "IS_mean": 1.1954978704452515, "IS_std": 0.0018338472582399845, "FID": 342.95867919921875, "_stamp": 1643964062.272396}

Any pointers on how to fix this? I ran this on an 8x V100 (32 GB) machine.
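
For context on the batch-size flags in the command, here is a small sketch of the arithmetic they imply. It rests on two assumptions: that each accumulation flag means gradients are accumulated over that many batches before an optimizer step, and that `--parallel` splits each batch evenly across the eight GPUs mentioned above. The helper `flag_value` is hypothetical and not part of the repository.

```python
import shlex

# Subset of the flags from the command that affect batch size.
FLAGS = (
    "--parallel --batch_size 256 "
    "--num_G_accumulations 8 --num_D_accumulations 8 --num_D_steps 1"
)
NUM_GPUS = 8  # taken from the hardware described in the post


def flag_value(tokens, name):
    """Return the integer that follows flag `name` in the token list."""
    return int(tokens[tokens.index(name) + 1])


tokens = shlex.split(FLAGS)
batch_size = flag_value(tokens, "--batch_size")
g_accum = flag_value(tokens, "--num_G_accumulations")
d_accum = flag_value(tokens, "--num_D_accumulations")

# Assumption: accumulation multiplies the number of samples per optimizer step.
effective_G_batch = batch_size * g_accum
effective_D_batch = batch_size * d_accum

# Assumption: data-parallel training splits each batch evenly across GPUs.
per_gpu_batch = batch_size // NUM_GPUS

print(f"effective G batch: {effective_G_batch}")
print(f"effective D batch: {effective_D_batch}")
print(f"samples per GPU per forward pass: {per_gpu_batch}")
```

If those assumptions hold, each optimizer step covers 2048 samples while each GPU processes 32 samples per forward pass, which would help explain why a single iteration is slow on this hardware.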