eps696 / stargan2

StarGAN2 for practice

NaN on RTX 3090 GPU training #6

Open pjh-nbut opened 1 month ago

pjh-nbut commented 1 month ago

I have been trying to train the StarGAN v2 model on an RTX 3090 GPU, but the loss values consistently turn to NaN and the generated images consist entirely of noise. This is puzzling, since the same code and data work flawlessly on a Tesla P100 GPU.

Could you please shed some light on what might be causing these discrepancies between the two GPUs? Any insights or suggestions you could provide would be immensely appreciated, as I am quite perplexed by this behavior.

eps696 commented 1 month ago

Alas, I'm not an expert in GPU details, so I can't guess offhand what the difference might be, and this is too little info to go on. Is it on Colab or local? Which Python/torch versions are used? Did you try other data on the 3090?
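
To answer the questions above in one go, a small script along these lines could be run on both machines and the output pasted here. This is only a sketch, not part of this repo: `collect_env_report` is a hypothetical helper name, and the Colab check is a rough heuristic (it only looks at whether the `google.colab` module is already loaded).

```python
import platform
import sys


def collect_env_report():
    """Gather Python / torch / GPU details into a plain dict (hypothetical helper)."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        # Heuristic only: Colab kernels normally have google.colab loaded.
        "in_colab": "google.colab" in sys.modules,
    }
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment; report that and stop.
        report["torch"] = None
        return report

    report["torch"] = torch.__version__
    # CUDA version the torch binary was built with (None for CPU-only builds).
    report["cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)
        # (major, minor) tuple describing the GPU's compute capability.
        report["compute_capability"] = torch.cuda.get_device_capability(0)
        report["cudnn"] = torch.backends.cudnn.version()
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

Comparing the `cuda_build` and `compute_capability` lines from the 3090 and the P100 side by side should show whether the two setups actually run the same torch build. PyTorch also ships `python -m torch.utils.collect_env`, which prints a more detailed report.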