indigopyj closed this issue 3 years ago.
Hi,
There is definitely something wrong if the loss becomes NaN after the first update. Make sure that you use a batch size that is a multiple of the number of GPUs.
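For reference, here is a minimal sketch of the kind of sanity checks I mean; `batch_size`, `num_gpus`, and `loss` are placeholder names, not identifiers from this repo:

```python
import torch

num_gpus = torch.cuda.device_count()
batch_size = 12  # placeholder value: pick something divisible by num_gpus

# Multi-GPU training splits each batch across the GPUs, so a batch size
# that is not a multiple of the GPU count can leave one GPU with an
# empty or undersized shard.
assert batch_size % num_gpus == 0, (
    f"batch_size={batch_size} is not a multiple of num_gpus={num_gpus}"
)

def check_loss(loss: torch.Tensor) -> None:
    # Call this inside the training loop to stop as soon as the loss degenerates.
    if not torch.isfinite(loss):
        raise RuntimeError(f"Loss became {loss.item()}; aborting this run")
```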
If the problem still appears, it may be worth checking your kaolin build or your PyTorch setup. Which versions are you currently using (PyTorch and CUDA)?
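If it helps, a quick way to report your environment (assuming your kaolin build exposes `__version__`, which recent builds do):

```python
import torch
import kaolin

# Print the details that matter for reproducing the NaN issue.
print("PyTorch:", torch.__version__)
print("CUDA (compiled against):", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
print("kaolin:", kaolin.__version__)
```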
I used CUDA 10.1 and PyTorch 1.7, but the problem was resolved when I downgraded PyTorch from 1.7 to 1.6! Thanks anyway!
I just followed your training steps (I didn't change the code at all) and was training the GAN on the CUB dataset, but I got this result during epoch 0.
I am confused because I changed nothing and just followed your instructions. The only thing I changed was the number of GPUs, from 4 to 3.