NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

bmm for style loss #402

Open buoyancy99 opened 5 years ago

buoyancy99 commented 5 years ago

In many applications we need bmm for gram matrix calculation, as in neural style transfer. However, it seems the gram matrix computation with opt_level O1 always produces NaN.

See the issue here https://github.com/pytorch/pytorch/issues/3651

I encountered the same problem with apex. It seems the input should not be cast to FP16 in this case, because the input matrices can contain large values. A good solution would be to scale them down before the bmm and multiply the result back afterwards; this solves the problem.
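The overflow is easy to reproduce without apex: the half-precision format tops out at 65504, so squaring even moderately large activations (exactly what a gram matrix does) overflows to inf, and further arithmetic on inf yields NaN:

```python
import torch

# The largest finite FP16 value is 65504, so 300**2 = 90000 overflows.
x = torch.tensor([300.0], dtype=torch.float16)

print(x * x)          # overflows to inf
print(x * x - x * x)  # arithmetic on inf produces NaN
```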

This approach should be implemented internally, since in most cases users don't know whether they need to scale manually.
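The scale-down / multiply-back idea can be sketched in plain PyTorch. `gram_matrix` and its `scale` argument are illustrative names, not part of apex; the scale is derived from the largest input magnitude so the intermediate products stay within the FP16 range, then squared out of the result:

```python
import torch

def gram_matrix(features, scale=None):
    """Gram matrix for style loss with pre-bmm scaling (illustrative sketch).

    Features are divided by `scale` before the bmm so the intermediate
    products fit in FP16, then the result is multiplied back by scale**2.
    """
    b, c, h, w = features.shape
    flat = features.view(b, c, h * w)
    if scale is None:
        # Derive a scale from the largest magnitude in the input.
        scale = flat.abs().max().clamp(min=1.0)
    flat = flat / scale
    gram = torch.bmm(flat, flat.transpose(1, 2))
    # Undo the scaling and apply the usual style-loss normalization.
    return gram * scale * scale / (c * h * w)
```

Mathematically the scaling cancels exactly; numerically it keeps every intermediate value of the bmm finite in half precision.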

ptrblck commented 5 years ago

Hi @buoyancy99,

if you are expecting large values outside the valid range of FP16, you might decorate the operation with this guard:

with amp.disable_casts():
    # my critical FP32 operation

Would that work for you?