aws-samples / awsome-distributed-training

Collection of best practices, reference architectures, model training examples and utilities to train large models on AWS.
MIT No Attribution
177 stars 74 forks source link

NCCL Test Script for AMI #329

Closed sean-smith closed 4 months ago

sean-smith commented 4 months ago

For example, to run on AWS ParallelCluster:

sbatch nccl-tests-ami.sbatch /opt/nccl-tests/build/all_reduce_perf /opt/nccl/build/lib

This passes in the path to the binary as argument 1 and any additional LD_LIBRARY_PATH paths to argument 2.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.