cswry / OSEDiff

[NeurIPS2024] One-Step Effective Diffusion Network for Real-World Image Super-Resolution
Apache License 2.0

VSD loss NaN #44

Closed: zzzzzero closed this issue 4 weeks ago

zzzzzero commented 1 month ago

Thank you for open-sourcing the training code. When I used the training code, I noticed that when using VSD loss with lambda_vsd set to 1, the loss becomes NaN after about 2000 steps. However, when I don't use this part of the loss and set lambda_vsd to 0, the loss and results are normal. I wanted to ask if you encountered this issue during the training process.
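When a combined loss goes NaN, it can help to confirm which term fails first. The following is a minimal sketch of such a check, not code from the OSEDiff repository; the term names (`l2`, `lpips`, `vsd`) and the helper `nonfinite_terms` are hypothetical, and each value is assumed to be a plain float such as the result of `loss.item()`.

```python
import math


def nonfinite_terms(loss_terms):
    """Return the names of loss terms whose value is NaN or +/-inf.

    `loss_terms` maps a (hypothetical) term name to a Python float.
    """
    return [name for name, value in loss_terms.items() if not math.isfinite(value)]


if __name__ == "__main__":
    # Illustrative values only, not measurements from a real run.
    step_losses = {"l2": 0.031, "lpips": 0.212, "vsd": float("nan")}
    bad = nonfinite_terms(step_losses)
    if bad:
        print("non-finite loss terms:", ", ".join(bad))
```

Calling this once per step before `backward()` would show whether the VSD term is the first to become non-finite, or whether another term diverges earlier.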

Kiteretsu77 commented 1 month ago

I face the same issue.

cswry commented 4 weeks ago

Hello, I ran the OSEDiff code from GitHub directly, and even after 10,000 iterations, I didn't encounter any NaN issues with the loss.

It might be caused by differences in environment versions. Could you check the versions of the relevant packages you have installed?
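One way to collect the versions for comparison is sketched below, using only the standard library. The package list is an assumption about what a diffusion training setup typically depends on, not the repository's actual requirements; adjust it to match the project's requirements file.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package list; edit to match the project's requirements.
    candidates = ["torch", "diffusers", "transformers", "accelerate", "peft"]
    for name, version in installed_versions(candidates).items():
        print(f"{name}: {version if version is not None else 'not installed'}")
```

Pasting this output into the thread makes it easier to spot a mismatched package between two environments.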

Kiteretsu77 commented 4 weeks ago

I switched the model back to the originally specified SD2.1-Base (I had been using SD2.1, not the Base variant) and made some small modifications, such as lowering the learning rate. Training works well now. Thank you very much!

zzzzzero commented 4 weeks ago

> I switched the model back to the originally specified SD2.1-Base (I had been using SD2.1, not the Base variant) and made some small modifications, such as lowering the learning rate. Training works well now. Thank you very much!

I made the same mistake as you, using sd21 instead of sd21-base. I suspect that sd21 does not generate 512x512 images very well, which caused the issues with VSD loss. I'm now using sd21-base, and the loss and results seem normal. Thank you for your timely responses!
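To check which variant a local checkpoint actually is, the sketch below reads two config files from a diffusers-style model directory. It assumes the usual layout (`scheduler/scheduler_config.json` and `unet/config.json`). It also assumes, to my understanding of the published configs, that SD2.1-Base uses `prediction_type` "epsilon" with a UNet `sample_size` of 64 (512x512 images), while SD2.1 uses "v_prediction" with 96 (768x768). The helper names are hypothetical, and whether this difference is what caused the NaN here has not been confirmed in this thread.

```python
import json
import sys
from pathlib import Path


def describe_checkpoint(model_dir):
    """Read prediction type and UNet sample size from a diffusers-style directory."""
    root = Path(model_dir)
    scheduler = json.loads((root / "scheduler" / "scheduler_config.json").read_text())
    unet = json.loads((root / "unet" / "config.json").read_text())
    return {
        # diffusers schedulers default to "epsilon" when the key is absent
        "prediction_type": scheduler.get("prediction_type", "epsilon"),
        "sample_size": unet.get("sample_size"),
    }


def looks_like_base(info):
    """Assumed signature of SD2.1-Base: epsilon prediction, 64x64 latents."""
    return info["prediction_type"] == "epsilon" and info["sample_size"] == 64


if __name__ == "__main__" and len(sys.argv) > 1:
    info = describe_checkpoint(sys.argv[1])
    print(info, "-> looks like a base checkpoint:", looks_like_base(info))
```

Run it with the path to the downloaded model folder as the only argument. A "v_prediction" result would indicate the non-base checkpoint is being loaded.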