fac2003 / perceiver-multi-modality-pytorch

Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch
MIT License
37 stars 4 forks source link

Attention cast floats #8

Closed fcampagnexandr closed 3 years ago

fcampagnexandr commented 3 years ago

Cast k andf values to float, cast back output to type of input, avoid instability as attention weights grow during training, per https://twitter.com/tsuname/status/1430653484827697155?s=20