ExtremeFLOW / neko

/ᐠ. 。.ᐟ\ᵐᵉᵒʷˎˊ˗
https://neko.cfd/

Generic math interface #706

Open njansson opened 1 year ago

njansson commented 1 year ago

Add a generic math interface for all kinds of backends.

njansson commented 1 year ago

The generic interface prevented inlining of the small leaf routines in math.f90 and thus caused quite a large performance hit. Removed from v0.5.0 (https://github.com/ExtremeFLOW/neko/projects/21).

timofeymukha commented 1 year ago

@njansson As a small comment: so far we have mostly used cpu rather than host in naming, as in the bcknd folders.

timofeymukha commented 1 year ago

@njansson I wonder if we can also use assumed-shape arrays in math, so that one does not have to specify the size in routines like rzero. I guess the size is always that of the whole array. I just had a very nasty bug where I passed a size that was too high, and the array was a component of a derived type. So the routine ran past the end of the array into the type's other attributes in memory and zeroed them out :P. It was pretty damn hard to find.
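To make the suggestion concrete, here is a minimal sketch of what an assumed-shape variant could look like. The names math_sketch, rzero_as and field_t are hypothetical and used only for illustration, and rp is assumed to be double precision; this is not neko's actual math.f90.

```fortran
module math_sketch
  implicit none
  ! Assumption: double precision as the working precision
  integer, parameter :: rp = kind(1.0d0)
contains
  !> Hypothetical assumed-shape variant: the extent comes from the
  !! actual argument, so the caller cannot pass a size that is too large.
  subroutine rzero_as(a)
    real(kind=rp), dimension(:), intent(inout) :: a
    a = 0.0_rp
  end subroutine rzero_as
end module math_sketch

program assumed_shape_demo
  use math_sketch
  implicit none

  ! Two arrays stored next to each other inside a derived type,
  ! mimicking the situation described in the comment above.
  type :: field_t
     real(kind=rp) :: x(4)
     real(kind=rp) :: y(4)
  end type field_t

  type(field_t) :: f

  f%x = 1.0_rp
  f%y = 2.0_rp

  ! No size argument: only the four entries of f%x are zeroed
  call rzero_as(f%x)

  if (any(f%x /= 0.0_rp)) error stop "x was not zeroed"
  if (any(f%y /= 2.0_rp)) error stop "y was overwritten"
  print *, "ok"
end program assumed_shape_demo
```

One design caveat: a rank-1 assumed-shape dummy like this only accepts rank-1 actual arguments, whereas an explicit-size dummy (x, n) also accepts multi-dimensional arrays through sequence association. Covering those would need either one specific routine per rank or a reshape/pointer remap at the call site.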

timofeymukha commented 2 months ago

@njansson How about a poor man's generic interface, which is just

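subroutine rzero(x, n)
  integer, intent(in) :: n
  real(kind=rp), dimension(n), intent(inout) :: x
  type(c_ptr) :: x_d

  if (NEKO_BCKND_DEVICE .eq. 1) then
     ! Look up the device pointer associated with the host array
     x_d = device_get_ptr(x)
     call device_rzero(x_d, n)
  else
     call cpu_rzero(x, n)
  end if
end subroutine rzero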

This should not affect performance I suppose since this is exactly what we do all over the code? It would still clean up a lot of if-statements!

njansson commented 2 months ago

Since device_* calls are already quite expensive (their overhead is large relative to the compute time), the additional pointer lookup adds a bit too much, I would say. Furthermore, for CPU backends the if statement will introduce the same inlining issue as the generic interfaces :( This can of course be solved by writing out the loops where we would like inlining (which also allows for fusing). Then we can add the generic interface without too much performance loss across the backends, and also add OpenMP for CPUs.
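As a rough illustration of "writing out the loops", the sketch below compares two chained leaf-routine calls with a single written-out, fused loop carrying an OpenMP directive. The routines my_add2s2 and my_col2 are local, hypothetical stand-ins defined inside the example, not neko's own routines, and rp is assumed to be double precision.

```fortran
program fused_loop_sketch
  implicit none
  ! Assumption: double precision as the working precision
  integer, parameter :: rp = kind(1.0d0)
  integer, parameter :: n = 1000
  real(kind=rp), parameter :: c = 2.0_rp
  real(kind=rp) :: x(n), y(n), z(n), ref(n)
  integer :: i

  x = 1.0_rp
  y = 3.0_rp
  z = 0.5_rp

  ! Unfused reference: two leaf calls, i.e. two passes over memory
  ref = y
  call my_add2s2(ref, x, c, n)   ! ref = ref + c * x
  call my_col2(ref, z, n)        ! ref = ref * z

  ! Written-out and fused: one pass over memory, nothing to inline,
  ! and a natural place for an OpenMP directive (the directive is
  ! treated as a plain comment when OpenMP is not enabled).
  !$omp parallel do
  do i = 1, n
     y(i) = (y(i) + c * x(i)) * z(i)
  end do
  !$omp end parallel do

  if (maxval(abs(y - ref)) > 1.0e-12_rp) error stop "fused loop differs"
  print *, "ok"

contains

  !> Hypothetical leaf routine: a = a + c1 * b
  subroutine my_add2s2(a, b, c1, m)
    integer, intent(in) :: m
    real(kind=rp), dimension(m), intent(inout) :: a
    real(kind=rp), dimension(m), intent(in) :: b
    real(kind=rp), intent(in) :: c1
    integer :: j
    do j = 1, m
       a(j) = a(j) + c1 * b(j)
    end do
  end subroutine my_add2s2

  !> Hypothetical leaf routine: a = a * b (element-wise)
  subroutine my_col2(a, b, m)
    integer, intent(in) :: m
    real(kind=rp), dimension(m), intent(inout) :: a
    real(kind=rp), dimension(m), intent(in) :: b
    integer :: j
    do j = 1, m
       a(j) = a(j) * b(j)
    end do
  end subroutine my_col2

end program fused_loop_sketch
```

Whether the fused form is actually faster depends on the compiler and the surrounding code; the sketch only shows the shape of the transformation, not a measured result.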

njansson commented 2 months ago

To be a bit clearer: I would suggest we take the approach of feature/omp for the main part of our current math usage. Then add the generic interface, which would only be used in the less performance-critical parts of the code. This would also be a great improvement for users writing user files.