Open tks2004 opened 7 months ago
Applications request GPU direct capability from Libfabric by adding the FI_HMEM flag when calling fi_getinfo, as the plugin does here: https://github.com/aws/aws-ofi-nccl/blob/e704fd9dbd0620905a1b900d3d280f9e50daee10/src/nccl_ofi_net.c#L374
Before Libfabric 1.18, the Libfabric EFA provider also required an environment variable, FI_EFA_USE_DEVICE_RDMA=1, to enable GPU direct. For Libfabric 1.18+ and aws-ofi-nccl 1.7.0+, this is no longer required. See also: https://github.com/aws/aws-ofi-nccl/blob/master/doc/efa-env-var.md, which is mostly relevant to the EFA provider.
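If you launch jobs from a script and want to handle both old and new stacks, the version rule above could be encoded roughly like this. This is only a sketch: `gpudirect_env` and `parse_version` are hypothetical helper names, not part of Libfabric or aws-ofi-nccl, and it assumes you already know the installed versions as plain dotted strings such as "1.17.0".

```python
import os


def parse_version(text):
    """Turn '1.18.0' into (1, 18, 0). Assumes a purely numeric dotted version."""
    return tuple(int(part) for part in text.split("."))


def gpudirect_env(libfabric_version, plugin_version, base_env=None):
    """Return a copy of the environment for launching a job on EFA.

    Hypothetical helper: sets FI_EFA_USE_DEVICE_RDMA=1 only when the stack
    is older than Libfabric 1.18 / aws-ofi-nccl 1.7.0, following the rule
    stated in this thread.
    """
    env = dict(os.environ if base_env is None else base_env)
    new_enough = (
        parse_version(libfabric_version) >= (1, 18)
        and parse_version(plugin_version) >= (1, 7, 0)
    )
    if not new_enough:
        # Older stacks needed this variable in addition to FI_HMEM.
        env["FI_EFA_USE_DEVICE_RDMA"] = "1"
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting the job. Requesting FI_HMEM is still done by the plugin itself in its fi_getinfo call; the helper only covers the extra environment variable.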
If we need to enable GPU direct, is there any FI environment variable that must be set to use that feature?