ECP-VeloC / AXL

Asynchronous Transfer Library
MIT License

AXL BBAPI assumes a specific hostname format #100

Closed by adammoody 3 years ago

adammoody commented 3 years ago

The BBAPI requires a contribId argument in the call to BB_InitLibrary.

https://github.com/IBM/CAST/blob/12d9fdf120e82952cea7bc9575deb6522c491118/bb/include/bbapi.h#L56

extern int BB_InitLibrary(uint32_t contribId, const char* clientVersion);

This argument is meant to take the global MPI rank of the calling process when using collective transfer operations. Within AXL, we have no information about the MPI rank, so we instead pass a "node id", which is computed by parsing the hostname.

https://github.com/ECP-VeloC/AXL/blob/e5309310d05cccef2e833e358d427285c9ab7beb/src/axl_async_bbapi.c#L172

The parsing logic expects hostname strings to be in the format <name><nodenumber>. However, different HPC sites use different hostname formats, so this approach is not portable.
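To illustrate the portability problem, here is a minimal sketch of trailing-number parsing. It is not the code at the link above: `node_id_from_hostname` is a hypothetical helper, and returning -1 when the name has no trailing digits is an assumption made for this example.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of AXL): return the decimal number formed by
 * the trailing digits of hostname, or -1 if the name does not end in a digit. */
static long node_id_from_hostname(const char *hostname)
{
    size_t len = strlen(hostname);
    size_t start = len;

    /* Walk backwards over the run of digits at the end of the string. */
    while (start > 0 && isdigit((unsigned char)hostname[start - 1])) {
        start--;
    }

    /* No trailing digits: the <name><nodenumber> assumption does not hold. */
    if (start == len) {
        return -1;
    }

    return strtol(hostname + start, NULL, 10);
}
```

With this sketch, a name like `lassen123` yields 123, but `login` or `node7.cluster.example` yields no id at all, and two hosts named `rack3-node12` and `rack4-node12` would both map to 12.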

adammoody commented 3 years ago

Now that we have AXL_Config, another option is to define a new AXL_KEY_CONFIG_RANK and use that value, if set, when using BBAPI. That only helps MPI users, but it would give users like SCR a solution that is portable across HPC sites.

tonyhutter commented 3 years ago

Alternatively, you could getenv() for MPI rank vars and use their value if available:

Intel - $PMI_RANK
OpenMPI - $OMPI_COMM_WORLD_RANK
MVAPICH2 - $MV2_COMM_WORLD_RANK
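A rough sketch of that lookup, assuming the three variables listed above and trying them in that order. `rank_from_env` is a hypothetical name, and returning -1 when no usable value is found is an assumption; the caller would then fall back to some other id.

```c
#include <errno.h>
#include <stdlib.h>

/* Environment variables to probe, in the order listed above. */
static const char *rank_env_vars[] = {
    "PMI_RANK",
    "OMPI_COMM_WORLD_RANK",
    "MV2_COMM_WORLD_RANK",
};

/* Hypothetical helper (not part of AXL): return the first rank found in the
 * environment, or -1 if none of the variables holds a non-negative integer. */
static long rank_from_env(void)
{
    size_t count = sizeof(rank_env_vars) / sizeof(rank_env_vars[0]);

    for (size_t i = 0; i < count; i++) {
        const char *val = getenv(rank_env_vars[i]);
        if (val == NULL || *val == '\0') {
            continue; /* unset or empty: try the next variable */
        }

        char *end = NULL;
        errno = 0;
        long rank = strtol(val, &end, 10);

        /* Accept only a complete, in-range, non-negative number. */
        if (errno == 0 && *end == '\0' && rank >= 0) {
            return rank;
        }
    }

    return -1;
}
```

The result would still need a range check before being cast to the `uint32_t` that BB_InitLibrary expects.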