NVlabs / NVAE

The Official PyTorch Implementation of "NVAE: A Deep Hierarchical Variational Autoencoder" (NeurIPS 2020 spotlight paper)
https://arxiv.org/abs/2007.03898

NaN values in gradients #29

Open liuem607 opened 3 years ago

liuem607 commented 3 years ago

Hi, in my experiment I used the Moving-MNIST dataset, but I ran into some problems during training that I couldn't find an answer to:

I tried to play with a small network by using only num_latent_scale=1 and num_groups_per_scale=1. I then noticed that no gradients were generated for some parameters, including prior.ftr0, and an error stopped the training.

If I increase num_groups_per_scale from 1 to 2 or more, I still get NaN in some of the gradients in the first iteration, but they go away afterwards and training continues without errors.

I'm wondering if you could provide a hint or clue as to why this behavior happens? Thank you in advance!

arash-vahdat commented 3 years ago

Hi, getting no gradients for num_latent_scale=1 and num_groups_per_scale=1 is strange. By no gradients, do you mean that the gradients were zero or None? If they were zero, do you see any change after some time of training?
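A quick way to tell the two cases apart is to inspect every parameter's .grad right after loss.backward() and before optimizer.step(). The helper below is only an illustrative sketch, not part of this repo (inspect_gradients and model are placeholder names):

```python
import torch

def inspect_gradients(model: torch.nn.Module) -> None:
    """Report, per parameter, whether the gradient is missing (None) or all zeros."""
    for name, param in model.named_parameters():
        if param.grad is None:
            print(f"{name}: grad is None (parameter never reached by backward)")
        elif torch.count_nonzero(param.grad) == 0:
            print(f"{name}: grad is all zeros")
        else:
            print(f"{name}: grad norm = {param.grad.norm().item():.4g}")
```

Calling it once after the backward pass should show which of the two cases prior.ftr0 falls into.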

Getting NaN in the gradients is natural, especially at the beginning of training. We are using mixed precision, which means that most operations are cast to FP16. Because of the lower precision, we may get NaN easily, and it is the job of autocast and grad_scalar to drop these gradient updates and scale the loss so that we don't get NaN.
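For context, this is the standard PyTorch AMP pattern (a generic sketch, not the repo's exact training loop; loader, model_loss and optimizer are placeholders). GradScaler.step() skips the optimizer update whenever the unscaled gradients contain inf/NaN, and GradScaler.update() then lowers the loss scale:

```python
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()  # the repo's grad_scalar plays this role

for x in loader:                       # placeholder data loader
    optimizer.zero_grad()
    with autocast():                   # forward pass runs mostly in FP16
        loss = model_loss(x)           # placeholder for the NVAE loss computation
    scaler.scale(loss).backward()      # backward pass on the scaled loss
    scaler.step(optimizer)             # silently skipped if grads contain inf/NaN
    scaler.update()                    # loss scale shrinks after a skipped step
```

So a NaN gradient in the first few iterations just triggers a skipped step and a smaller loss scale; it does not corrupt the parameters.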

You can disable mixed precision by supplying enabled=False to autocast() at this line: https://github.com/NVlabs/NVAE/blob/38eb9977aa6859c6ee037af370071f104c592695/train.py#L163
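The change is just the enabled flag on the context manager (the forward and loss calls below are illustrative, not copied from train.py):

```python
from torch.cuda.amp import autocast

with autocast(enabled=False):    # keep all wrapped ops in FP32
    output = model(x)            # illustrative forward pass
    loss = compute_loss(output)  # hypothetical loss helper
```

Training will run somewhat slower and use more memory in FP32, but it removes FP16 overflow as a source of NaN gradients.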