facebookincubator / AITemplate

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code, specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
Apache License 2.0

Add log1p elementwise op #993

Open 22quinn opened 7 months ago

22quinn commented 7 months ago

Summary: log1p(x) is more precise than log(1 + x) when x is close to 0. We use the CUDA log1pf implementation for fp32. For other precision types, the input is first converted to float, log1pf is computed, and the output is converted back to the original precision.

CUDA log1pf documentation (single-precision math API): https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__SINGLE.html
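The convert-to-float path described above can be illustrated with a small device-side sketch. This is a minimal illustration under stated assumptions, not the code generated by AITemplate's backend; the helper and kernel names (`log1p_op`, `log1p_kernel`) are hypothetical.

```cuda
#include <cuda_fp16.h>

// fp32 path: call CUDA's log1pf directly, which is accurate near x == 0.
__device__ __forceinline__ float log1p_op(const float x) {
  return log1pf(x);
}

// fp16 path: widen to float, compute log1pf, then narrow back to half.
__device__ __forceinline__ half log1p_op(const half x) {
  return __float2half_rn(log1pf(__half2float(x)));
}

// Elementwise kernel applying log1p_op to each element (fp16 shown here).
__global__ void log1p_kernel(const half* in, half* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = log1p_op(in[i]);
  }
}
```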

Differential Revision: D54176180

facebook-github-bot commented 7 months ago

This pull request was exported from Phabricator. Differential Revision: D54176180
