Open njansson opened 1 year ago
The generic interface prevented inlining of the small leaf routines in math.f90, and thus gave quite a large performance hit.
Remove from v0.5.0 (https://github.com/ExtremeFLOW/neko/projects/21).
@njansson As a small comment, we have mostly used cpu and not host as far as naming goes so far, like in the bcknd folders.
@njansson I wonder if we can also use assumed shape arrays in math, so that one does not have to specify the size in routines like rzero. I guess the size is always the whole array. I just had a very nasty bug where I passed a value that was too high, and the array was part of a type, so it ran over into other attributes in memory and zeroed them out :P. Was pretty damn hard to find.
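To make the assumed-shape suggestion concrete, here is a minimal sketch. The module name, the routine name rzero_as, and the dp kind are hypothetical stand-ins, not the existing routines in math. The extent is taken from the array itself, so a caller cannot pass a length that is too large.

```fortran
! Hypothetical sketch: names here are illustrative, not the project's actual API.
module math_assumed_shape
  implicit none
  private
  public :: dp, rzero_as

  ! Assumption: double precision stands in for the project's real kind.
  integer, parameter :: dp = kind(1.0d0)

contains

  !> Zero a whole array. The extent comes from the array descriptor,
  !! so no separate length argument is needed.
  subroutine rzero_as(a)
    real(kind=dp), dimension(:), intent(inout) :: a
    integer :: i

    do i = 1, size(a)
       a(i) = 0.0_dp
    end do
  end subroutine rzero_as

end module math_assumed_shape
```

Zeroing only part of an array would then be written with an array section, for example call rzero_as(a(1:n)). Note that assumed-shape dummies need an explicit interface, which placing the routine in a module provides.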
@njansson How about a poor man's generic interface, which is just
    subroutine rzero(x, n)
      if (NEKO_BCKND_DEVICE .eq. 1) then
         ! Look up the device pointer that belongs to the host array
         x_d = device_get_ptr(x)
         call device_rzero(x_d, n)
      else
         call cpu_rzero(x, n)
      end if
    end subroutine rzero
This should not affect performance I suppose since this is exactly what we do all over the code? It would still clean up a lot of if-statements!
Since device_** calls are quite expensive due to their overhead relative to compute time, the additional pointer lookup adds a bit too much, I would say. Furthermore, for CPU backends the if statement will introduce the same issue with inlining as the generic interfaces :( This can of course be solved by writing out the loops where we would like inlining (which also allows for fusing). Then we add the generic interface without too much performance loss across the backends, and also add OpenMP for CPUs.
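As an illustration of "writing out the loops", the sketch below contrasts two small leaf-routine calls with a single fused loop. All names and the dp kind are hypothetical stand-ins, and the OpenMP directive is only one possible way to thread the loop (it is treated as a plain comment when OpenMP is not enabled).

```fortran
! Hypothetical sketch: routine names and the kind parameter are illustrative.
module fused_loops
  implicit none
  private
  public :: dp, update_calls, update_fused

  ! Assumption: double precision stands in for the project's real kind.
  integer, parameter :: dp = kind(1.0d0)

contains

  ! Stand-in leaf routine: a = a + c * b
  subroutine add_scaled(a, b, c, n)
    integer, intent(in) :: n
    real(kind=dp), intent(inout) :: a(n)
    real(kind=dp), intent(in) :: b(n), c
    integer :: i
    do i = 1, n
       a(i) = a(i) + c * b(i)
    end do
  end subroutine add_scaled

  ! Stand-in leaf routine: a = a * m
  subroutine mult_inplace(a, m, n)
    integer, intent(in) :: n
    real(kind=dp), intent(inout) :: a(n)
    real(kind=dp), intent(in) :: m(n)
    integer :: i
    do i = 1, n
       a(i) = a(i) * m(i)
    end do
  end subroutine mult_inplace

  ! Two calls: two passes over a, and speed depends on the calls being inlined.
  subroutine update_calls(a, b, m, c, n)
    integer, intent(in) :: n
    real(kind=dp), intent(inout) :: a(n)
    real(kind=dp), intent(in) :: b(n), m(n), c
    call add_scaled(a, b, c, n)
    call mult_inplace(a, m, n)
  end subroutine update_calls

  ! Loops written out and fused: one pass over a, no calls inside the loop.
  subroutine update_fused(a, b, m, c, n)
    integer, intent(in) :: n
    real(kind=dp), intent(inout) :: a(n)
    real(kind=dp), intent(in) :: b(n), m(n), c
    integer :: i
    !$omp parallel do
    do i = 1, n
       a(i) = (a(i) + c * b(i)) * m(i)
    end do
    !$omp end parallel do
  end subroutine update_fused

end module fused_loops
```

Both variants compute the same update; the fused one reads and writes a once per element instead of twice, which is the kind of gain the written-out loops are meant to allow.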
Maybe a bit clearer: I would suggest we take the approach of feature/omp for the main part of our current math usage. Add the generic interface, which would then only be used in "less" performance-critical parts of the code. This would also be a great improvement for the user when writing user files.
Add a generic math interface for all kinds of backends.