ExtremeFLOW / neko

/ᐠ. 。.ᐟ\ᵐᵉᵒʷˎˊ˗
https://neko.cfd/

Use intrinsic math and add field math operations #1346

Open timfelle opened 3 days ago

timfelle commented 3 days ago

Transition from the manual looped math operations to using intrinsic operators. The interface has not changed.

Additionally, TGV was moved to using field_math as a fifth option, and field_math was extended with a few missing operators.
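As a rough illustration of the change described above, the following sketch contrasts a hand-written loop with the equivalent intrinsic array expression. The routine names `add2_loop` and `add2_intrinsic`, the `rp` kind, and the argument list are assumptions modelled on the `add2` routine mentioned later in this thread, not code taken from the PR.

```fortran
module add2_sketch
  implicit none
  integer, parameter :: rp = kind(1.0d0)  ! assumed working precision
contains

  !> Manual loop version: a = a + b, element by element
  subroutine add2_loop(a, b, n)
    integer, intent(in) :: n
    real(kind=rp), dimension(n), intent(inout) :: a
    real(kind=rp), dimension(n), intent(in) :: b
    integer :: i
    do i = 1, n
       a(i) = a(i) + b(i)
    end do
  end subroutine add2_loop

  !> Same interface, body written with the intrinsic array operator
  subroutine add2_intrinsic(a, b, n)
    integer, intent(in) :: n
    real(kind=rp), dimension(n), intent(inout) :: a
    real(kind=rp), dimension(n), intent(in) :: b
    a = a + b
  end subroutine add2_intrinsic

end module add2_sketch

program add2_demo
  use add2_sketch
  implicit none
  real(kind=rp) :: x(4), y(4), z(4)

  x = 1.0_rp
  y = 1.0_rp
  z = [1.0_rp, 2.0_rp, 3.0_rp, 4.0_rp]

  call add2_loop(x, z, 4)
  call add2_intrinsic(y, z, 4)

  ! Both versions should produce identical results
  if (any(x /= y)) error stop 'loop and array forms disagree'
  print *, 'both forms agree'
end program add2_demo
```

Because the array length is still passed explicitly and the dummy arguments keep their explicit shape, callers are unaffected by which body is used.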

timofeymukha commented 3 days ago

I think it might be best to avoid sum, minval etc., and only stick to the arithmetic operators. I remember reading scary stories about intrinsic functions on the Fortran Discourse forum.

TGV looks much better with the field math.

timofeymukha commented 3 days ago

I dug up the thread: https://fortran-lang.discourse.group/t/automatic-arrays-and-intrinsic-array-operations-to-use-or-not-to-use/4070?page=1

We can maybe check whether the discussion is relevant.

MartinKarp commented 2 days ago

This is really clean and I like it. As Timofey said, though, are we sure all of these functions work this way with all compilers for the CPU math? Since we pass the length of the array into the functions, I think it should be fine. But if I remember correctly, when we started with Neko, Niclas was uncertain about doing operations like this from a performance and reliability perspective, and we wanted to be sure that no extra temporary array or similar was allocated.

njansson commented 2 days ago

Agreed, it looks really clean! Temporaries should be fine here (simple vectors), but we should check some compiler listings.

njansson commented 2 days ago

Another question is whether we would like to support OpenMP on the CPU in the future. If so, intrinsics will not work.

timfelle commented 2 days ago

Well, this gives us an opportunity to verify a range of these things. So far I would say two things are important here:

  • Computational efficiency. Does this degrade efficiency or help it? My intuition is that this will improve performance for the individual functions, but we need to verify that chaining them does not kill the gains, so that we can at least discourage chaining in the code if it does.
  • Memory efficiency. I think most of these should be fine with regard to memory. I tried to make the intent of the variables much clearer, which should help the compiler make the right decisions. However, I think some of these should probably be pure functions, but that would move the interface away from the device one.

njansson commented 2 days ago

Agreed on the memory. Temporaries are often an issue when combining multiple operations, e.g. from overloaded operators in derived types.

I'd still like to add the OpenMP consideration to the list above.
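The concern about temporaries can be sketched as follows. The type `vec_t` and its `operator(+)` are hypothetical and not part of Neko. The point is only that each overloaded `+` returns a fresh function result, so a chained expression materialises intermediate arrays that a single fused loop would avoid.

```fortran
module vec_sketch
  implicit none
  integer, parameter :: rp = kind(1.0d0)  ! assumed working precision

  !> Hypothetical wrapper type, for illustration only
  type :: vec_t
     real(kind=rp), allocatable :: x(:)
  end type vec_t

  interface operator(+)
     module procedure vec_add
  end interface operator(+)

contains

  !> Every call returns a newly allocated result object
  function vec_add(a, b) result(c)
    type(vec_t), intent(in) :: a, b
    type(vec_t) :: c
    c%x = a%x + b%x  ! c%x is allocated on assignment
  end function vec_add

end module vec_sketch

program chain_demo
  use vec_sketch
  implicit none
  integer, parameter :: n = 8
  type(vec_t) :: a, b, c, d
  integer :: i

  a%x = [(real(i, rp), i = 1, n)]
  b%x = a%x
  c%x = a%x

  ! (a + b) is built as a temporary vec_t before c is added to it
  d = a + b + c

  if (any(d%x /= 3.0_rp * a%x)) error stop 'unexpected result'
  print *, 'chained sum ok'
end program chain_demo
```

Whether a compiler can elide such temporaries varies, which is why checking compiler listings, as suggested earlier in the thread, is the practical test.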

timfelle commented 2 days ago

Well, OpenMP is more of a separate thing, isn't it? Wouldn't we add another backend for that, like we do with devices and sx, and then enable it at compile time?

njansson commented 2 days ago

No, not really. Yes, we talked about writing proper loops in the important parts, but if math routines like add2 don't support OpenMP, user code will be a bottleneck.
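For reference, an explicit loop can be threaded directly with an OpenMP worksharing directive, which is the kind of support under discussion. The routine name `add2_omp` and its signature are assumptions for illustration, not Neko's implementation. When OpenMP is not enabled at compile time, the directives are treated as comments and the loop runs serially.

```fortran
module add2_omp_sketch
  implicit none
  integer, parameter :: rp = kind(1.0d0)  ! assumed working precision
contains

  !> a = a + b with the loop iterations shared among OpenMP threads
  subroutine add2_omp(a, b, n)
    integer, intent(in) :: n
    real(kind=rp), dimension(n), intent(inout) :: a
    real(kind=rp), dimension(n), intent(in) :: b
    integer :: i

    !$omp parallel do
    do i = 1, n
       a(i) = a(i) + b(i)
    end do
    !$omp end parallel do
  end subroutine add2_omp

end module add2_omp_sketch

program add2_omp_demo
  use add2_omp_sketch
  implicit none
  integer, parameter :: n = 1000
  real(kind=rp) :: a(n), b(n)

  a = 1.0_rp
  b = 2.0_rp
  call add2_omp(a, b, n)

  ! Every element should now be exactly 3
  if (any(a /= 3.0_rp)) error stop 'unexpected result'
  print *, 'add2_omp ok'
end program add2_omp_demo
```

An array expression such as `a = a + b` cannot take a `parallel do` directive. OpenMP does provide a `workshare` construct for array syntax, but how well it is parallelised depends on the compiler.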