E3SM-Project / EKAT

Tools and libraries for writing Kokkos-enabled HPC C++ in E3SM ecosystem

Implement specialized templates for pow<2>, pow<3>, pow<4>. #44

Open jeff-cohere opened 4 years ago

jeff-cohere commented 4 years ago

Is your feature request related to a problem? Please describe.

To make bit-for-bit testing easier, it would be nice to have specialized implementations of the pow function for low integer exponents. In particular, see this conversation.

Describe the solution you'd like

C++ template specializations for pow<2>, pow<3>, and pow<4>. Perhaps we should include some Fortran support for these and other functions in EKAT as well.

bartgol commented 4 years ago

We can probably write a template utility for the generic pow<N> that uses repeated squaring, so the recursion depth grows like log2(N). It should be fairly straightforward.
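To make the idea concrete, here is a minimal sketch of a compile-time pow<N> built on repeated squaring. It is not EKAT code: the names `pow_n` and `check_pow_n` are hypothetical, C++17 is assumed for `if constexpr`, and the Kokkos device annotation (for example `KOKKOS_INLINE_FUNCTION`) that a real implementation would likely need is omitted so the snippet stands alone.

```cpp
#include <cassert>

// Hypothetical helper (not part of EKAT): computes base^N for a
// compile-time, non-negative integer N using repeated squaring.
// Only multiplications are used, in an order fixed at compile time.
template <int N, typename T>
constexpr T pow_n(const T base) {
  static_assert(N >= 0, "pow_n: exponent must be non-negative");
  if constexpr (N == 0) {
    return T(1);
  } else if constexpr (N == 1) {
    return base;
  } else if constexpr (N % 2 == 0) {
    // Even exponent: square the half power.
    const T half = pow_n<N / 2>(base);
    return half * half;
  } else {
    // Odd exponent: peel off one factor, then recurse on the even part.
    return base * pow_n<N - 1>(base);
  }
}

// Usage sketch: for N = 2, 3, 4 the result is the same sequence of
// multiplications as the hand-written products (x must not be NaN).
inline void check_pow_n(const double x) {
  assert(pow_n<2>(x) == x * x);
  assert(pow_n<3>(x) == x * (x * x));
  assert(pow_n<4>(x) == (x * x) * (x * x));
}
```

Because the exponent is a template parameter, no separate specializations for 2, 3, and 4 are needed in this sketch; each instantiation expands to a fixed chain of multiplications, which is the property that matters for bit-for-bit comparisons.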

jeff-cohere commented 4 years ago

Do you have any HOMME code or other prior art we can use? Or do you have a new implementation in mind?

bartgol commented 4 years ago

I have an implementation of a runtime version; it should be straightforward to convert it to a templated one (or even to provide both).
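For comparison, a runtime-exponent variant could look roughly like the sketch below. This is not the implementation referred to above; `pow_int` is a hypothetical name and the code is only one common way to write iterative exponentiation by squaring (C++14 or later is assumed for the `constexpr` loop).

```cpp
// Hypothetical helper (not part of EKAT): computes base^exp for a
// non-negative integer exponent known only at run time.
template <typename T>
constexpr T pow_int(T base, unsigned int exp) {
  T result = T(1);
  while (exp > 0) {
    if (exp & 1u) {
      result *= base;  // Fold in the current power when its bit is set.
    }
    exp >>= 1;
    if (exp > 0) {
      base *= base;    // Square for the next bit, skipping the final unused square.
    }
  }
  return result;
}
```

Note that an iterative loop and a recursive template need not multiply in the same order for every exponent (N = 7 is one case where a recursive squaring scheme groups the factors differently), so if both variants were provided, bit-for-bit agreement between them would have to be checked rather than assumed.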

Btw, the bfb_pow_impl function in that file is, imho, a better solution for bfb pow than bridging F90 to Cuda. One might argue that it is expensive, but it might be a wash with the Cuda kernel launch (I never checked though).