Fixed the incorrect use of scalers in relation to optim steps
Description
Simply remove both direct optimizer calls:
optim_g.step()
optim_d.step()
at lines 433 and 434 (see the sketch below for the kind of block they sit in).
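For context, here is a hedged sketch of that kind of block. This is not the actual train.py: the modules, losses, and data are toy stand-ins, and only the optimizer names mirror the real file. It shows that both optimizers are already stepped through the GradScaler before the removed calls fire.

```python
import torch
from torch import nn, optim
from torch.cuda.amp import GradScaler, autocast

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

net_g = nn.Linear(8, 8).to(device)   # stand-in generator
net_d = nn.Linear(8, 1).to(device)   # stand-in discriminator
optim_g = optim.AdamW(net_g.parameters(), lr=1e-4)
optim_d = optim.AdamW(net_d.parameters(), lr=1e-4)
scaler = GradScaler(enabled=use_amp)

x = torch.randn(4, 8, device=device)

# Discriminator update, driven by the scaler.
with autocast(enabled=use_amp):
    fake = net_g(x)
    loss_disc = net_d(fake.detach()).mean()
optim_d.zero_grad()
scaler.scale(loss_disc).backward()
scaler.step(optim_d)   # AMP-aware step for the discriminator

# Generator update, also driven by the scaler.
with autocast(enabled=use_amp):
    loss_gen = -net_d(fake).mean()
optim_g.zero_grad()
scaler.scale(loss_gen).backward()
scaler.step(optim_g)   # AMP-aware step for the generator
scaler.update()

# Removed by this PR (formerly lines 433-434): the direct calls below would
# apply the optimizer update a second time outside the scaler, and would
# still run even when the scaler had skipped the step because it found
# inf/NaN gradients -- one way NaNs end up baked into the weights.
# optim_g.step()
# optim_d.step()
```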
Motivation and Context
RVC and Applio use AMP, and therefore GradScaler and autocast. Hence:
optim.step() should not be called directly while a GradScaler is managing the updates. Doing so bypasses the necessary gradient scaling and unscaling steps, which can lead to suboptimal training and potential instability.
In other words:
Calling the optimizers directly here is simply incorrect: use scaler.step(optim) when AMP/autocast/GradScaler is in use, and plain optim.step() only when there are no plans for mixed precision training (see the sketch below).
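As a minimal, self-contained sketch of that rule (toy model and data, not repository code), the AMP path hands the step to the scaler while the full-precision path keeps the plain call:

```python
import torch
from torch import nn, optim
from torch.cuda.amp import GradScaler, autocast

use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"

model = nn.Linear(16, 1).to(device)
optimizer = optim.AdamW(model.parameters(), lr=1e-3)
scaler = GradScaler(enabled=use_amp)

x = torch.randn(32, 16, device=device)
y = torch.randn(32, 1, device=device)

for _ in range(5):
    optimizer.zero_grad(set_to_none=True)
    with autocast(enabled=use_amp):
        loss = nn.functional.mse_loss(model(x), y)
    if use_amp:
        # Mixed precision: scale the loss so small fp16 gradients do not
        # underflow, then let the scaler unscale them and skip the step
        # entirely if any gradient is inf/NaN.
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    else:
        # Full precision: the plain step is correct here.
        loss.backward()
        optimizer.step()
```

Calling scaler.step(optimizer) and then optimizer.step() in the same iteration mixes these two paths, which is exactly what the removed lines did.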
How has this been tested?
By debugging for NaNs and Infs and inspecting the gradient values/params of the Generator and Discriminator during training.
Types of changes
[bugfix]
Checklist:
[x] My code follows the code style of this project.
[ ] My change requires a change to the documentation.
[ ] I have updated the documentation accordingly.
[ ] I have added tests to cover my changes.
[x] All new and existing tests passed. (Not entirely sure what is meant by this, but I assume it refers to verifying that the change works.)