NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License
8.42k stars, 1.4k forks

[PT2] Normalisation: use manual impl when compiling #1854

Closed: alexdremov closed this 3 weeks ago

alexdremov commented 3 weeks ago

Using the custom kernel under torch.compile triggers recompilations because of the superfluous guards that torch generates for custom kernels. Executing the manual implementation when compiling should work better.

alexdremov commented 3 weeks ago

@crcrpar merging?