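f6f96894b0ad114222f80467ab89b3562adcfb95 in #3 uses `std::copy()`, which is helpful for two reasons:

- It assumes the source and destination don't overlap (strictly, that the destination doesn't start inside the source range), so we don't pay a performance penalty for possible aliasing.
- For trivially copyable types such as `double`, it usually resolves to a vectorized (e.g., AVX) `memcpy()`/`memmove()` routine.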
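A minimal sketch of the two approaches, assuming plain `double` buffers; the function names `copy_block` and `copy_loop` are hypothetical and not taken from #3. How `std::copy()` is lowered depends on the standard library (libstdc++, for example, dispatches trivially copyable types to `memmove`).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy via std::copy. For a trivially copyable type like
// double, common standard libraries lower this to a memmove()/memcpy() call,
// which is typically a vectorized routine.
// Precondition: the destination must not start inside the source range.
void copy_block(const std::vector<double>& src, std::vector<double>& dst) {
    dst.resize(src.size());
    std::copy(src.begin(), src.end(), dst.begin());
}

// Hypothetical hand-written equivalent for comparison. Through raw pointers
// the compiler cannot assume s and d don't alias, so it may insert runtime
// overlap checks (or stay scalar) before vectorizing.
void copy_loop(const double* s, double* d, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        d[i] = s[i];
    }
}
```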
We could probably improve performance across the board by using aligned memory everywhere and then annotating our loops (in C++ or Fortran) accordingly, e.g., with the `aligned` clause of OpenMP's `simd` directive, which tells the compiler that the given pointers are aligned.
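A minimal C++17 sketch of what this could look like, assuming a 64-byte alignment and hypothetical helper names (`alloc_aligned`, `axpy`); nothing here reflects code that is currently in the project. The Fortran analogue would be `!$omp simd aligned(x, y : 64)` on the loop.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Assumed alignment: one 64-byte cache line, also enough for AVX-512 vectors.
constexpr std::size_t kAlign = 64;

// Hypothetical helper: allocate n doubles on a kAlign-byte boundary.
// std::aligned_alloc expects the size to be a multiple of the alignment,
// so round it up. Release the memory with std::free().
double* alloc_aligned(std::size_t n) {
    std::size_t bytes = n * sizeof(double);
    bytes = (bytes + kAlign - 1) / kAlign * kAlign;
    if (bytes == 0) bytes = kAlign;
    void* p = std::aligned_alloc(kAlign, bytes);
    if (p == nullptr) throw std::bad_alloc();
    return static_cast<double*>(p);
}

// Hypothetical kernel: y = a*x + y. The aligned clause promises the compiler
// that x and y start on 64-byte boundaries (must match kAlign), so it can use
// aligned vector loads/stores. Without -fopenmp or -fopenmp-simd the pragma is
// ignored and the loop is still correct.
void axpy(std::size_t n, double a, const double* x, double* y) {
#pragma omp simd aligned(x, y : 64)
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Note that the `aligned` clause is a promise, not a check: passing a misaligned pointer is undefined behavior, which is why the allocation side has to be made consistent everywhere first.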