The NaNs in the gradients, which occur only in 32-bit mode, can be prevented by changing the order of the calculations.
A likely explanation is that rounding errors are smaller when the computation on 64-bit variables is done first and the 32-bit variables are applied afterwards.
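
As a rough illustration of why ordering can matter in mixed precision, here is a toy NumPy sketch. It is not the code changed in this PR: the function names `early_cast` and `late_cast` and the input values are hypothetical, chosen only so that the 64-bit inputs are too small to be represented in 32 bits.

```python
import numpy as np


def early_cast(a, b, x):
    """Cast the 64-bit inputs to 32 bits first, then compute (a / b) * x."""
    with np.errstate(all="ignore"):  # silence underflow / invalid warnings
        a32 = np.float32(a)  # 1e-50 is below the float32 range, becomes 0.0
        b32 = np.float32(b)  # same here
        return (a32 / b32) * x  # 0.0 / 0.0 is NaN


def late_cast(a, b, x):
    """Compute a / b in 64 bits first, then apply the 32-bit variable."""
    ratio = a / b  # well defined in float64
    return np.float32(ratio) * x


# Hypothetical inputs: two tiny 64-bit values and one 32-bit value.
a = np.float64(1e-50)
b = np.float64(1e-50)
x = np.float32(2.0)

print(np.isnan(early_cast(a, b, x)))  # the early cast loses both inputs
print(late_cast(a, b, x))             # the ratio survives in 64-bit
```

Both functions evaluate the same mathematical expression, but only the second keeps the intermediate ratio in a range that 32-bit floats can represent.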
Fixes #473