aws / aws-k8s-tester

AWS Kubernetes tester, kubetest2 deployer implementation
Apache License 2.0
163 stars, 82 forks

Config MPI4 for EFA #497

Closed: Issacwww closed this 1 week ago

Issacwww commented 1 week ago

Issue #, if available:

Description of changes: MPI5 runs into the issue below on AL2 and AL23:

[multi-node-nccl-test-launcher:00001] PMIX ERROR: PMIX_ERR_BAD_PARAM in file gds_utils.c at line 308
[multi-node-nccl-test-launcher:00001] PMIX ERROR: PMIX_ERR_BAD_PARAM in file gds_hash.c at line 496
[multi-node-nccl-test-launcher:00001] PRTE ERROR: Bad parameter in file base/odls_base_default_fns.c at line 768
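The errors above follow a regular `[host:pid] LAYER ERROR: message in file F at line N` shape, so a test harness could flag them automatically. Below is a minimal sketch, assuming the launcher log is available as plain text. The names `ERROR_RE`, `find_runtime_errors`, and `SAMPLE` are hypothetical and not part of aws-k8s-tester.

```python
import re

# Matches lines shaped like the PMIX/PRTE errors quoted in this PR:
# "[host:pid] PMIX ERROR: <message> in file <file> at line <n>"
# (hypothetical helper; the pattern is inferred only from the three lines above)
ERROR_RE = re.compile(
    r"^\[(?P<host>[^:\]]+):(?P<pid>\d+)\] "
    r"(?P<layer>PMIX|PRTE) ERROR: (?P<message>.+?) "
    r"in file (?P<file>\S+) at line (?P<line>\d+)$"
)


def find_runtime_errors(log_text):
    """Return one dict per PMIX/PRTE error line found in log_text."""
    errors = []
    for raw in log_text.splitlines():
        match = ERROR_RE.match(raw.strip())
        if match:
            errors.append(match.groupdict())
    return errors


# The three log lines from the description, used as sample input.
SAMPLE = """\
[multi-node-nccl-test-launcher:00001] PMIX ERROR: PMIX_ERR_BAD_PARAM in file gds_utils.c at line 308
[multi-node-nccl-test-launcher:00001] PMIX ERROR: PMIX_ERR_BAD_PARAM in file gds_hash.c at line 496
[multi-node-nccl-test-launcher:00001] PRTE ERROR: Bad parameter in file base/odls_base_default_fns.c at line 768
"""

if __name__ == "__main__":
    for err in find_runtime_errors(SAMPLE):
        print(err["layer"], err["message"], err["file"], err["line"])
```

A non-empty result from `find_runtime_errors` would suggest the launcher hit the failure described here. The pattern is deliberately narrow and may miss other error formats.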

MPI4 is recommended: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start-nccl.html

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.