Open blegouix opened 1 year ago
We suspect it only affects LayoutStride chunks.
To determine whether this is a DDC issue, I suggest we compare with the equivalent Kokkos code. If we see the same behavior there, we should close the issue.
We no longer use this kind of combination of deep copies and [] anywhere in our code, so performance has recovered. Maybe I can close the issue? I think users just have to avoid it too.
I don't see a good reason to avoid ddc::deepcopy/Kokkos::deep_copy. We have only shown that it was suboptimal in some particular cases. I still think we need to understand why: what the layout of the arrays was, their rank, and their sizes.
There seems to be a performance issue with deep copies and/or ChunkSpan[] under OpenMP, or with the way they interact with each other.
The following branch is dramatically faster (roughly 1000x) than Gysela/main when compiled with Kokkos_ENABLE_OPENMP=ON: https://gitlab.maisondelasimulation.fr/gysela-developpers/voicexx/-/compare/main...debug_deepcopy_bracket
In fact, Gysela/main is extremely slow with OpenMP, even though all CPU threads are at 100% usage.
Note: for this demo I use the DDC branch from https://github.com/Maison-de-la-Simulation/ddc/pull/181 for easier use of ChunkSpan(). The problematic lines are:
ddc::deepcopy(contiguous_slice, allfdistribu[ic][isp][iv]);
ddc::deepcopy(contiguous_slice, allfdistribu[ic][isp][ix]);
ddc::deepcopy(f_vxvy_slice, allfdistribu[isp][ix][iy]);
ddc::deepcopy(vals1, vals[i]);
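
To make the layout question concrete, here is a minimal sketch in plain C++ (no DDC or Kokkos) of what slicing a row-major rank-3 array does to the memory layout of the resulting 1D slice. All names here (Array3D, Slice1D, slice_v, slice_x, copy_to_contiguous) are hypothetical and chosen for illustration only. They are not the DDC or Kokkos API, and the (nsp, nx, nv) shape is an assumption. Fixing the leading indices leaves a contiguous slice (stride 1), while fixing the trailing index leaves a strided slice, which is the LayoutStride-like case. The sketch shows only the access pattern and does not by itself explain the slowdown reported above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a rank-3, row-major (layout-right) array
// of shape (nsp, nx, nv). Not a DDC Chunk or a Kokkos View.
struct Array3D {
    std::size_t nsp, nx, nv;
    std::vector<double> data;

    Array3D(std::size_t nsp_, std::size_t nx_, std::size_t nv_)
        : nsp(nsp_), nx(nx_), nv(nv_), data(nsp_ * nx_ * nv_) {}
};

// A 1D slice described by a base pointer, a length and a stride (in elements).
struct Slice1D {
    const double* ptr;
    std::size_t size;
    std::size_t stride;
};

// Fix (isp, ix): the remaining v dimension is contiguous (stride 1).
Slice1D slice_v(const Array3D& a, std::size_t isp, std::size_t ix) {
    assert(isp < a.nsp && ix < a.nx);
    return {a.data.data() + (isp * a.nx + ix) * a.nv, a.nv, 1};
}

// Fix (isp, iv): the remaining x dimension is strided by nv elements.
Slice1D slice_x(const Array3D& a, std::size_t isp, std::size_t iv) {
    assert(isp < a.nsp && iv < a.nv);
    return {a.data.data() + isp * a.nx * a.nv + iv, a.nx, a.nv};
}

// Copy a (possibly strided) slice into a freshly allocated contiguous buffer.
std::vector<double> copy_to_contiguous(const Slice1D& s) {
    std::vector<double> out(s.size);
    for (std::size_t i = 0; i < s.size; ++i) {
        out[i] = s.ptr[i * s.stride];
    }
    return out;
}
```

Calling copy_to_contiguous(slice_x(a, isp, iv)) reads one element every nv doubles, whereas copy_to_contiguous(slice_v(a, isp, ix)) reads a single contiguous block. Reporting the rank, extents, and resulting stride of each slice in the problematic lines would show which of the two cases each deep copy falls into.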