Open 22quinn opened 7 months ago
This pull request was exported from Phabricator. Differential Revision: D54176180
Summary:
`log1p(x)` is more precise than `log(1+x)` when `x` is close to 0. We use the CUDA `log1pf` implementation for fp32. For other precision types, the input is first converted to float, then `log1pf` is computed, and finally the output is converted back to the original precision.

CUDA log1pf function for float and double: https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__SINGLE.html
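
For reviewers, here is a minimal host-side sketch of the two points in the summary. It is plain C++ and uses `std::log1p` as a stand-in for the CUDA `log1pf` device function. It is not the kernel code from this diff. The names `log_naive`, `log_precise`, and `log1p_via_float` are hypothetical and exist only for illustration.

```cpp
#include <cmath>

// Naive form: for tiny x, 1.0f + x rounds to exactly 1.0f in fp32,
// so the information in x is lost before the logarithm is taken.
float log_naive(float x) {
    float sum = 1.0f + x;  // rounded to fp32 here
    return std::log(sum);
}

// log1p form: evaluates log(1 + x) without forming 1 + x explicitly,
// so small inputs keep their precision.
float log_precise(float x) {
    return std::log1p(x);
}

// Conversion pattern described in the summary (illustrative only):
// widen the input to float, compute log1p in fp32, narrow back to T.
// T stands in for any non-fp32 element type.
template <typename T>
T log1p_via_float(T x) {
    float widened = static_cast<float>(x);
    float result = std::log1p(widened);
    return static_cast<T>(result);
}
```

With `x = 1e-10f`, `log_naive` collapses to 0 because `1.0f + x` is not representable as anything other than `1.0f`, whereas `log_precise` returns a value close to `x` itself. The template shows only the convert, compute, convert-back shape; how the actual kernel handles each dtype is defined by the diff.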
Differential Revision: D54176180