eth-cscs / stackinator

https://eth-cscs.github.io/stackinator/
BSD 3-Clause "New" or "Revised" License

link to xpmem #169

Closed simonpintarelli closed 4 months ago

simonpintarelli commented 4 months ago

Use patchelf --add-needed to add libxpmem.so as a dependency of the MPI libraries.
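As a rough illustration of the idea (not the code in this PR), the sketch below builds the patchelf command lines one might run against an MPI shared library. The helper names and the library path are hypothetical; only the `--add-needed` and `--print-needed` flags come from patchelf itself.

```python
from typing import List


def patchelf_add_needed(library: str, target: str) -> List[str]:
    """Build the argv that adds `library` as a DT_NEEDED entry of `target`."""
    return ["patchelf", "--add-needed", library, target]


def patchelf_print_needed(target: str) -> List[str]:
    """Build the argv that lists the DT_NEEDED entries of `target`."""
    return ["patchelf", "--print-needed", target]


# Hypothetical location of an MPI library; adjust to the real install prefix.
MPI_LIBRARY = "/path/to/mpi/lib/libmpi.so"

commands = [
    patchelf_add_needed("libxpmem.so", MPI_LIBRARY),
    patchelf_print_needed(MPI_LIBRARY),  # check that libxpmem.so now appears
]

# Print the commands instead of running them, so the sketch has no side effects.
for argv in commands:
    print(" ".join(argv))
```

Passing each list to `subprocess.run(argv, check=True)` would execute the commands on a system where patchelf is installed.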

RMeli commented 4 months ago

FYI, @msimberg noticed performance degradation for DLA-Future on LUMI compared to Alps and narrowed down the issue to LUMI using XPMEM for intra-node communication.

The workaround is to either avoid linking with xpmem (in which case MPI will automatically use the fallback CMA mode) or explicitly request CMA with MPICH_SMP_SINGLE_COPY_MODE=CMA.

He might have some additional details.
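A minimal sketch of the second workaround above, assuming the application is launched from a Python driver script: it copies the current environment and sets MPICH_SMP_SINGLE_COPY_MODE for the child process only. The helper name and the launch command in the comment are hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_copy_mode(mode: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with MPICH_SMP_SINGLE_COPY_MODE set to `mode`."""
    env = dict(os.environ if base is None else base)
    env["MPICH_SMP_SINGLE_COPY_MODE"] = mode
    return env


# Request the CMA mode named in the comment above.
cma_env = env_with_copy_mode("CMA")
print(cma_env["MPICH_SMP_SINGLE_COPY_MODE"])

# Hypothetical launch; replace the command with the real job launcher and binary:
#   subprocess.run(["srun", "./my_app"], env=cma_env, check=True)
```

Because the variable is set on a copy, the calling process's own environment is left unchanged, which makes it easy to compare runs with and without the override.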

msimberg commented 4 months ago

Would it make sense to make this a variant? I'd be 100% ok even with having it on by default, but it'd give us the option to disable it already in the spack recipe if we find that it causes a degradation also on clariden. I think this PR is also important to get people to test what xpmem does to their application performance.

That is to say: no objections at all to merging this.

HPE and LUMI engineers are aware that we see a degradation with DLA-Future, and have offered to look into the problem.

simonpintarelli commented 4 months ago

@msimberg thanks for the input. I didn't know about the LUMI case. CPE also links xpmem; we missed this previously. Since the copy mode can be selected at runtime using MPICH_SMP_SINGLE_COPY_MODE, I don't think it's necessary to make it a variant.

On clariden (I tested eiger), it's the reverse: performance for osu_bw is much better if xpmem is linked (and used by default).