facebookresearch / pytorch_GAN_zoo

A mix of GAN implementations including progressive growing
BSD 3-Clause "New" or "Revised" License

A bug in save_iter (`-s`) flag #135

Open jyu-theartofml opened 2 years ago

I am training a PGAN model on my dataset. At scale 0 (4x4) everything worked as expected, but at scale 1 (8x8) training behaves oddly: no loss values are displayed, only the 'Changing alpha' messages, and when the -s value is 100 or greater it looks stuck and never saves a checkpoint. It just runs for a very long time without doing anything besides printing the alpha.

However, if I set -s to a value below 100 (e.g. 50), then it does save checkpoints, but at odd intervals: 50, 150, 250, 350, etc., rather than every 50 iterations.
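One way both symptoms could arise (this is a hypothetical sketch, not the repo's actual code): if the iteration counter seen by the save check resumes at scale 1 offset by 50 and advances in strides of 100 (so it only ever takes the values 50, 150, 250, ...), then a modulo test against 100 never fires, while a modulo test against 50 fires at exactly the checkpoints observed:

```python
# Hypothetical illustration of the observed checkpoint pattern.
# Assume the save check is a plain modulo test, but the counter it
# sees only ever takes values offset by 50 with a stride of 100.
counters = range(50, 1000, 100)  # 50, 150, 250, ...

saved_with_s100 = [i for i in counters if i % 100 == 0]
saved_with_s50 = [i for i in counters if i % 50 == 0]

print(saved_with_s100)  # [] -> no checkpoint ever saved, looks "stuck"
print(saved_with_s50)   # [50, 150, 250, ...] -> the pattern reported above
```

Under that assumption, -s 100 never saves at all (hence the apparent hang between checkpoints), while -s 50 saves at 50, 150, 250, and so on.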

Here's the command and log output for reference:

```
(pytorch_p36) ubuntu@ip-172-31-45-191:~/dcgan/pytorch_GAN_zoo$ python train.py PGAN -c config_celebaHQ.json  -n portraits -d checkpoint -s 100
Setting up a new session...
Running PGAN
size 10
9851 images found
9851 images detected
Model found at path checkpoint/portraits/portraits_s0_i48000.pt, pursuing the training
Average network found !
size (4, 4)
9851 images found
size (8, 8)
9851 images found
Changing alpha to 1.000
/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/functional.py:2941: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
/home/ubuntu/dcgan/pytorch_GAN_zoo/models/base_GAN.py:278: UserWarning: This overload of add_ is deprecated:
    add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
    add_(Tensor other, *, Number alpha) (Triggered internally at  /pytorch/torch/csrc/utils/python_arg_parser.cpp:766.)
  avg_p.mul_(0.999).add_(0.001, p.data)
Changing alpha to 0.998
Changing alpha to 0.997
Changing alpha to 0.995
Changing alpha to 0.993
[... alpha keeps decreasing in steps of roughly 0.002 ...]
Changing alpha to 0.923
Changing alpha to 0.922
Changing alpha to 0.920
```
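As a side note (unrelated to the hang), the `UserWarning` about `add_` in the log is easy to silence. The line `avg_p.mul_(0.999).add_(0.001, p.data)` is an exponential moving average of the parameters; newer PyTorch wants `alpha` passed as a keyword argument, as the warning itself suggests. A minimal sketch of the equivalent call, with stand-in tensors:

```python
import torch

# Stand-in tensors for the averaged and current parameters (illustrative only).
avg_p = torch.ones(3)
p = torch.full((3,), 2.0)

# Deprecated form from the log:  avg_p.mul_(0.999).add_(0.001, p)
# Equivalent with the keyword-only alpha argument:
avg_p.mul_(0.999).add_(p, alpha=0.001)
# avg_p is now 0.999 * 1.0 + 0.001 * 2.0 = 1.001 in every entry
```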
jyu-theartofml commented 2 years ago

Update: I switched back to the old master branch and this issue is resolved.
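For context on what the 'Changing alpha' messages mean: during progressive growing, a newly added resolution block is faded in by blending its output with the upsampled output of the previous stage. In this log alpha anneals from 1.000 toward 0, which suggests it weights the outgoing low-resolution branch. A sketch under that assumption (the function and argument names are mine, not the repo's), using `F.interpolate` as the replacement for the deprecated `F.upsample` seen in the warning above:

```python
import torch
import torch.nn.functional as F

def fade_in(old_out, new_out, alpha):
    """Blend the previous stage's output with the new block's output.

    `alpha` is assumed to weight the old (low-resolution) branch,
    matching a log where it anneals from 1.000 down toward 0.
    """
    # Upsample the old output to the new resolution before blending.
    up = F.interpolate(old_out, scale_factor=2, mode='nearest')
    return alpha * up + (1.0 - alpha) * new_out

old = torch.zeros(1, 3, 4, 4)   # scale 0 output (4x4)
new = torch.ones(1, 3, 8, 8)    # scale 1 output (8x8)
out = fade_in(old, new, alpha=0.920)  # early in the 8x8 fade-in
```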