CExA-project / ddc

DDC is a discrete domain computation library.
https://ddc.mdls.fr

Performance issue with OpenMP deepcopies and/or ChunkSpan[] #193

Open blegouix opened 1 year ago

blegouix commented 1 year ago

There seems to be a performance issue with deepcopies and/or ChunkSpan[] under OpenMP, or with the way they interact with each other.

The following branch is dramatically faster (on the order of 1000x) than Gysela/main when compiled with Kokkos_ENABLE_OPENMP=ON:

https://gitlab.maisondelasimulation.fr/gysela-developpers/voicexx/-/compare/main...debug_deepcopy_bracket

In fact, Gysela/main is extremely slow with OpenMP, even though all CPU threads are at 100% usage.

Note: for this demo I use the DDC branch from https://github.com/Maison-de-la-Simulation/ddc/pull/181, which makes ChunkSpan() easier to use. The problematic lines are:

blegouix commented 10 months ago

We suspect that it only affects LayoutStride chunks.

tpadioleau commented 8 months ago

To determine whether this is a DDC issue, I suggest we compare with the equivalent Kokkos code. If we see the same behavior there, then we should close the issue.

blegouix commented 7 months ago

We no longer use this kind of combination of deep copies and [] anywhere in our codes, so performance is back. Maybe I can close the issue? I think users just have to avoid it too.

tpadioleau commented 7 months ago

I don't see a good reason to avoid ddc::deepcopy/Kokkos::deep_copy. We have only shown that it was suboptimal in one particular case. I still think we need to understand why: what the layout of the arrays was, their rank, and their sizes.
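
As a starting point for that investigation, here is a minimal sketch in plain C++ (no DDC, Kokkos or OpenMP) that times an element-by-element copy as a function of the read stride. It assumes the slowdown is related to strided access, which is only the suspicion raised above and not a confirmed cause. The helpers strided_copy and time_strided_copy are hypothetical names introduced for illustration; they are not part of any library, and the sketch is serial, so it can only give a baseline for the layout effect, not reproduce the OpenMP behavior.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of DDC or Kokkos): copies n elements into dst,
// reading src with a fixed stride. stride == 1 is the contiguous case.
inline void strided_copy(
        std::vector<double> const& src,
        std::vector<double>& dst,
        std::size_t n,
        std::size_t stride)
{
    assert(stride >= 1);
    assert(dst.size() >= n);
    assert(n == 0 || (n - 1) * stride < src.size());
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i * stride];
    }
}

// Hypothetical helper: average wall-clock seconds per copy over `repeats` runs.
inline double time_strided_copy(std::size_t n, std::size_t stride, int repeats)
{
    assert(repeats > 0);
    std::vector<double> src(n * stride, 1.0);
    std::vector<double> dst(n, 0.0);
    volatile double sink = 0.0; // keeps the compiler from discarding the copies
    auto const start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) {
        strided_copy(src, dst, n, stride);
        if (n > 0) {
            sink = sink + dst[n - 1];
        }
    }
    auto const stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count() / repeats;
}
```

Comparing time_strided_copy(n, 1, repeats) against time_strided_copy(n, s, repeats) for a few strides s and sizes n gives a serial reference for how much the layout alone costs. The same loop could then be rewritten with Kokkos views and Kokkos::deep_copy to check whether the gap seen in Gysela also shows up without DDC.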