Open timfelle opened 3 days ago
I think it might be best to avoid `sum`, `minval`, etc., and only stick to the arithmetic operators. I remember reading scary stories about intrinsic functions on the Fortran Discourse forum.
TGV looks much better with the field math.
I dug up the thread: https://fortran-lang.discourse.group/t/automatic-arrays-and-intrinsic-array-operations-to-use-or-not-to-use/4070?page=1
We can maybe check whether the discussion is relevant.
This is really clean and I like it. As Timofey said, though, are we sure all functions work this way with all compilers for the CPU math? I think it should be fine since we pass the length of the array into the functions. But if I remember correctly, when we started with Neko, Niclas was uncertain about doing operations like this from a performance and reliability perspective, and about whether we could be sure that no extra temporary array or similar was allocated.
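To make the question concrete, here is a minimal sketch of the two styles being compared. The module and routine names (`add2_sketch`, `add2_loop`, `add2_array`) and the `rp` kind are hypothetical and only for illustration; they are not taken from the PR. Both routines use explicit-shape dummy arrays with the length `n` passed in, which is the case referred to above. Whether a compiler introduces a temporary for the whole-array form is compiler-dependent, so this is something to check rather than assume.

```fortran
module add2_sketch
  implicit none
  ! Assumed working precision for this sketch (double precision).
  integer, parameter :: rp = kind(1.0d0)
contains

  !> Manual loop version: a(i) = a(i) + b(i)
  subroutine add2_loop(a, b, n)
    integer, intent(in) :: n
    real(kind=rp), dimension(n), intent(inout) :: a
    real(kind=rp), dimension(n), intent(in) :: b
    integer :: i

    do i = 1, n
       a(i) = a(i) + b(i)
    end do
  end subroutine add2_loop

  !> Whole-array version using only the arithmetic operator.
  !! The extents are known from n, so no array descriptor is needed.
  subroutine add2_array(a, b, n)
    integer, intent(in) :: n
    real(kind=rp), dimension(n), intent(inout) :: a
    real(kind=rp), dimension(n), intent(in) :: b

    a = a + b
  end subroutine add2_array

end module add2_sketch
```

With gfortran, compiling with `-Warray-temporaries` reports the places where the compiler generates an array temporary, which is one way to check a file like this without reading the full listing.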
Agree, looks really clean! Temporaries should be fine here (simple vectors), but we should check some compiler listings.
Another question is whether we would like to support OpenMP on the CPU in the future. If so, the intrinsics will not work.
Well, this gives us an opportunity to verify a range of these things. So far I would say two things are important here:
- Computational efficiency. Does this degrade the efficiency or help it? My intuition is that this will improve performance for the individual functions, but we need to verify that chaining them does not kill the gains, so that we can at least discourage chaining in code if it does.
- Memory efficiency. I think most of these should be fine with regard to memory. I tried to make the intent of the variables much clearer, which should help the compiler make the right decisions. However, I think some of these should probably be pure functions, but that would remove the interface from the device one.
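As an illustration of the chaining concern, here is a small sketch comparing a chained form with a fused one for the update `a = a + c1*b`. All names (`chain_sketch`, `axpy_chained`, `axpy_fused`, `rp`) are hypothetical. The chained version makes the intermediate array explicit to show what two separate operations cost: an extra scratch array and a second pass over memory. Whether a given compiler produces something similar when separate routines are chained is exactly what would need to be measured.

```fortran
module chain_sketch
  implicit none
  ! Assumed working precision for this sketch (double precision).
  integer, parameter :: rp = kind(1.0d0)
contains

  !> Chained form: scale first, then add. Two passes over memory
  !! and an explicit scratch array (an automatic array of size n).
  subroutine axpy_chained(a, b, c1, n)
    integer, intent(in) :: n
    real(kind=rp), dimension(n), intent(inout) :: a
    real(kind=rp), dimension(n), intent(in) :: b
    real(kind=rp), intent(in) :: c1
    real(kind=rp), dimension(n) :: tmp

    tmp = c1 * b   ! pass 1: write the intermediate result
    a = a + tmp    ! pass 2: read it back and accumulate
  end subroutine axpy_chained

  !> Fused form: one expression, which the compiler can evaluate
  !! element by element in a single pass without a scratch array.
  subroutine axpy_fused(a, b, c1, n)
    integer, intent(in) :: n
    real(kind=rp), dimension(n), intent(inout) :: a
    real(kind=rp), dimension(n), intent(in) :: b
    real(kind=rp), intent(in) :: c1

    a = a + c1 * b
  end subroutine axpy_fused

end module chain_sketch
```

Both routines give the same result; the difference is only in memory traffic and scratch storage, which is why a dedicated fused routine is usually preferable to chaining two simpler ones in hot code.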
Agree on the memory. The temporaries are often an issue if one is combining multiple operations, e.g. from overloading operators in derived types.
I'd still like to add the OpenMP consideration to the above list.
Well, OpenMP is more of a separate thing, isn't it? Wouldn't we do another backend for that, like we do with devices and sx, and then enable those at compile time?
No, not really. Yes, we talked about writing the proper loops in the important parts, but if math like add2 doesn't support OpenMP, user code will be a bottleneck.
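For reference, a minimal sketch of what an OpenMP-enabled `add2`-style routine could look like with an explicit loop. The names `add2_omp_sketch`, `add2_omp` and `rp` are hypothetical, and this is not how the PR or Neko implements it. The `parallel do` directive applies to a `do` loop, which is why the loop form is used here. OpenMP also defines a `workshare` construct for array syntax, but how well it is parallelized depends on the compiler.

```fortran
module add2_omp_sketch
  implicit none
  ! Assumed working precision for this sketch (double precision).
  integer, parameter :: rp = kind(1.0d0)
contains

  !> a(i) = a(i) + b(i), with loop iterations shared among threads.
  !! Without OpenMP enabled at compile time the directives are
  !! plain comments and the routine runs serially.
  subroutine add2_omp(a, b, n)
    integer, intent(in) :: n
    real(kind=rp), dimension(n), intent(inout) :: a
    real(kind=rp), dimension(n), intent(in) :: b
    integer :: i

    !$omp parallel do
    do i = 1, n
       a(i) = a(i) + b(i)
    end do
    !$omp end parallel do
  end subroutine add2_omp

end module add2_omp_sketch
```

Each iteration touches only its own element, so there are no data races, and the loop index is private to each thread by default.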
Transition from the manually looped math operations to using intrinsic operators. The interface has not changed.
Additionally, TGV was moved to using field_math as a 5th option, and field_math was extended with a few missing operators.