charmplusplus / charm

The Charm++ parallel programming system. Visit https://charmplusplus.org/ for more information.
Apache License 2.0

feature: support CXI in OFI for Slingshot-11 #3791

Closed ericjbohm closed 2 months ago

ericjbohm commented 3 months ago

This supports the CXI interface for Cassini (AKA Slingshot-11) within the OFI machine layer. No application-level changes are required, but a variety of command-line options are provided to configure the memory pool and the selection of CXI interfaces.

Note: this does require the use of the memory pool in order to efficiently support the FI_MR_ENDPOINT mode of memory registration, as the time cost of registering individual messages would otherwise be entirely too high.
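To illustrate why a pool helps here, the following is a minimal sketch, not the code in this PR: it registers one large slab up front and then carves message buffers out of it, so the registration cost is paid once rather than per message. The names `register_region`, `slab_pool`, `pool_init`, `pool_alloc`, and `pool_destroy` are hypothetical, and `register_region` is only a counting stand-in for a real registration call such as libfabric's `fi_mr_reg`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a provider registration call; it only counts
 * how many times registration is requested. */
static int registration_count = 0;

static void *register_region(void *base, size_t len) {
    (void)len;
    registration_count++;
    return base; /* assumption: the base address doubles as a fake handle */
}

/* Hypothetical pool: one registered slab plus a bump-allocator offset. */
typedef struct {
    char  *base; /* start of the registered slab */
    size_t size; /* total slab size in bytes */
    size_t used; /* bytes already handed out */
    void  *mr;   /* registration handle covering the whole slab */
} slab_pool;

/* Allocate the slab and register it exactly once. Returns 0 on success. */
static int pool_init(slab_pool *p, size_t size) {
    assert(p != NULL && size > 0);
    p->base = (char *)malloc(size);
    if (p->base == NULL) return -1;
    p->size = size;
    p->used = 0;
    p->mr = register_region(p->base, size);
    return 0;
}

/* Hand out a 16-byte-aligned buffer from the already-registered slab.
 * No registration happens here. Returns NULL when the slab is exhausted. */
static void *pool_alloc(slab_pool *p, size_t len) {
    size_t aligned = (len + 15u) & ~(size_t)15u;
    if (aligned < len || aligned > p->size - p->used) return NULL;
    void *buf = p->base + p->used;
    p->used += aligned;
    return buf;
}

/* Release the slab; a real implementation would also deregister it. */
static void pool_destroy(slab_pool *p) {
    free(p->base);
    p->base = NULL;
    p->size = 0;
    p->used = 0;
    p->mr = NULL;
}
```

With this sketch, a hundred calls to `pool_alloc` after a single `pool_init` leave `registration_count` at 1. A production pool would presumably also need buffer reuse and growth, which this bump allocator omits.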

Charmrun has been configured to wrap srun and currently assumes PMI2 with Cray extensions for launching.

The build system has been set up to autodetect CXI and enable support for it accordingly. For compatibility purposes, it also supports the use of cxi on the build line, but that should not be necessary on most HPE systems with proper LMOD environments.

ericjbohm commented 2 months ago

The last machine that I had access to with regular OFI was turned off last year. I could try it over TCP/IP somewhere, but I'm not sure how useful the resulting information would be.

On Fri, Apr 12, 2024 at 1:50 PM Sam White @.***> wrote:

@.**** commented on this pull request.

Any documentation updates needed?

Has non-CXI OFI been tested and benchmarked for performance with the changes to use the LRTS mempool and the ofi request cache?

In src/arch/ofi/conv-common.h https://github.com/UIUC-PPL/charm/pull/3791#discussion_r1563029131:

 /*
  * Use Simple client-side implementation of PMI.
  * Valid only for CMK_USE_PMI.
  * Optional in an SLURM environment.
  * See src/arch/util/proc_management/simple_pmi/
  */
-#define CMK_USE_SIMPLEPMI 1
+#define CMK_USE_SIMPLEPMI 0

Does this get set somewhere else now in the default OFI case?
