Status: Open. jeff-cohere opened this issue 4 years ago.
We can probably write a template utility for a generic pow<N> that needs only about log2(N) recursive steps (i.e., exponentiation by squaring). It should be fairly straightforward.
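One way such a utility could look is sketched below. It assumes C++17 (for if constexpr) and a non-negative exponent. The name int_pow is hypothetical. It recurses on N/2 and squares the result, so the recursion depth is about log2(N), and everything is resolved at compile time.

```cpp
// Hypothetical utility: int_pow<N>(x) computes x^N for a compile-time
// exponent N >= 0 by exponentiation by squaring (recursion depth ~log2(N)).
// Requires C++17 for `if constexpr`.
template <int N, typename T>
constexpr T int_pow(const T x) {
  static_assert(N >= 0, "int_pow: only non-negative exponents are supported");
  if constexpr (N == 0) {
    return T(1);
  } else if constexpr (N == 1) {
    return x;
  } else {
    // Compute x^(N/2) once, square it, and multiply by x again if N is odd.
    const T half = int_pow<N / 2>(x);
    if constexpr (N % 2 == 0) {
      return half * half;
    } else {
      return half * half * x;
    }
  }
}
```

For example, int_pow<3>(2.0) yields 8.0 and int_pow<10>(2.0) yields 1024.0. Note that this grouping makes int_pow<4>(x) evaluate (x * x) * (x * x) rather than ((x * x) * x) * x. The two can round differently, so for bit-for-bit comparisons the Fortran side would have to use the same grouping.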
Do you have any HOMME code or other prior art we can use? Or do you have a new implementation in mind?
I have an implementation of a runtime version; it should be straightforward to convert it to a templated one (or even to provide both).
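The runtime implementation mentioned here is not shown in the thread. As a stand-in, the following is a minimal sketch of a runtime integer power using iterative exponentiation by squaring. The name int_pow_rt and the unsigned exponent type are assumptions, and the constexpr loop requires C++14.

```cpp
// Hypothetical runtime counterpart: int_pow_rt(x, n) computes x^n for a
// runtime exponent n >= 0 by iterative exponentiation by squaring
// (about log2(n) loop iterations). constexpr with a loop requires C++14.
template <typename T>
constexpr T int_pow_rt(T x, unsigned int n) {
  T result = T(1);
  while (n > 0) {
    if (n & 1u) {
      result *= x;  // fold in the current power of x when this bit of n is set
    }
    x *= x;         // x, x^2, x^4, x^8, ...
    n >>= 1u;
  }
  return result;
}
```

For n = 6 this loop computes x^2 * x^4, whereas a recursion that squares x^(N/2) would compute x^3 * x^3. These two products can round differently. If both a runtime and a templated version are provided, they may need to share one multiplication order to agree bit for bit.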
Btw, the bfb_pow_impl function in that file is, IMHO, a better solution for bit-for-bit (BFB) pow than bridging F90 to CUDA. One might argue that it is expensive, but its cost might be a wash compared with the CUDA kernel launch overhead (I never checked, though).
Is your feature request related to a problem? Please describe.
In order to make bit-for-bit testing easier, it would be nice to have specialized implementations of the pow function for low integer exponents. In particular, see this conversation.

Describe the solution you'd like
C++ template specializations for pow<2>, pow<3>, and pow<4>. Perhaps we should include some Fortran support for these and other functions in EKAT as well.
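The requested specializations could look roughly like the sketch below. It assumes C++14 and double-precision arguments. The namespace bfb_sketch is hypothetical; it is chosen so that pow<N> does not clash with std::pow or the C library's pow. Falling back to std::pow in the primary template is also an assumption.

The point of the specializations is that each one spells out its multiplication order. std::pow is not required to be correctly rounded, so its results can vary between math libraries and devices. A fixed sequence of IEEE multiplications rounds the same way on any conforming platform, provided the compiler is not allowed to reassociate floating-point operations (for example under fast-math flags).

```cpp
#include <cmath>

namespace bfb_sketch {  // hypothetical namespace, not part of EKAT

// Primary template: exponents without a hand-written specialization
// fall back to std::pow (not constexpr, since std::pow is not).
template <int N>
inline double pow(const double x) {
  return std::pow(x, N);
}

// Explicit specializations for the low exponents requested in this issue.
// An explicit specialization may be constexpr even if the primary is not.
template <>
constexpr double pow<2>(const double x) {
  return x * x;
}

template <>
constexpr double pow<3>(const double x) {
  return x * x * x;  // evaluated left to right: (x * x) * x
}

template <>
constexpr double pow<4>(const double x) {
  const double x2 = x * x;
  return x2 * x2;    // (x * x) * (x * x)
}

}  // namespace bfb_sketch
```

Usage: bfb_sketch::pow<3>(1.5) yields 3.375. bfb_sketch::pow<5>(x) still goes through std::pow until a specialization for 5 is added.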