elvisyjlin / AttGAN-PyTorch

AttGAN PyTorch Arbitrary Facial Attribute Editing: Only Change What You Want
MIT License

How do you train on CelebA-HQ? #5

Closed thinkerthinker closed 5 years ago

thinkerthinker commented 5 years ago

When I train AttGAN on CelebA-HQ, the losses look very strange. For example: 39%|▍| 684/1750 [26:59<39:02, 2.20s/it, d_loss=3.62e+10, epoch=0, g_loss=5.11e+5, iter=684]. Visualizing the curves, df_gp, df_loss, g_loss, and gf_loss are all very high and still increasing. Did you re-adjust the coefficient of each loss, or what tricks did you use?

elvisyjlin commented 5 years ago

Here is my training history on CelebA-HQ 256x256 (256_shortcut1_inject1_none_hq):

[Screenshots: training loss curves, 2019-03-07]

For the first several hundred steps, d_loss was very small (negative) while g_loss started from a very large value (positive). They converged at about 5k steps. I trained this model, the one you can download from Google Drive, with the default settings in the readme:

CUDA_VISIBLE_DEVICES=0 python3 train.py --data CelebA-HQ --img_size 256 --shortcut_layers 1 --inject_layers 1 --experiment_name 256_shortcut1_inject1_none_hq --gpu

What is your training setting? What does your training history look like?

thinkerthinker commented 5 years ago

Thank you. I found that the loss divergence was caused by using multiple GPUs. If I use only one GPU, the training is correct. I think I might be having problems with DataParallel.
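
For context, multi-GPU training in PyTorch typically means wrapping the networks in nn.DataParallel, roughly as in the minimal sketch below (the module here is a placeholder, not the actual AttGAN networks):

```python
import torch
import torch.nn as nn

# Placeholder module; stands in for the AttGAN generator/discriminator.
model = nn.Linear(128, 128)

# Replicate the module across all visible GPUs when more than one is available.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()
```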

elvisyjlin commented 5 years ago

Thank you, too. You reminded me of the multi-GPU problem, which I also encountered recently.

Briefly speaking, the autograd graph needed to compute the gradient penalty is accidentally freed when the function gradient_penalty() returns. This bug has existed since PyTorch 1.0.0 was released. See this issue.

You can try multi-GPU training with the code I just fixed.
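
For reference, the core of a WGAN-GP-style gradient penalty looks roughly like the sketch below. Passing create_graph=True to autograd.grad is what keeps the graph alive so the penalty term can itself be backpropagated; the function name and signature here are illustrative, not this repository's exact code.

```python
import torch
import torch.autograd as autograd

def gradient_penalty(critic, real, fake):
    """Illustrative WGAN-GP penalty; names and signature are hypothetical."""
    # Random interpolation between real and fake samples.
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = critic(interp)
    # create_graph=True keeps the autograd graph so the penalty
    # can be differentiated again during the discriminator update.
    grads = autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]
    grads = grads.view(grads.size(0), -1)
    return ((grads.norm(2, dim=1) - 1) ** 2).mean()
```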

thinkerthinker commented 5 years ago

Your answer has helped me a lot, thank you again for your reply.

elvisyjlin commented 5 years ago

You're welcome ^^