Open ChaiBapchya opened 4 years ago
@ChuyangDeng @laurenyu
Discussed with @ChaiBapchya offline; https://github.com/aws/sagemaker-training-toolkit/blob/master/src/sagemaker_training/mpi.py#L43 might be what he needs.
However, it's difficult to find an example of how args is used.
The documentation for this is missing. Can someone help with adding it? Maybe whoever is on-call?
Describe the feature you'd like
Pass arguments to the training script while using Horovod via MPI for distributed training.
Current Situation
Only ProcessRunner supports passing hyperparameters: https://github.com/aws/sagemaker-training-toolkit/blob/c357433d6fdbc43a896b25bd126c46f689ddb73c/src/sagemaker_training/process.py#L105-L109
MPIRunner doesn't support it.
Update: MPIRunner does support it: https://github.com/aws/sagemaker-training-toolkit/blob/c357433d6fdbc43a896b25bd126c46f689ddb73c/src/sagemaker_training/mpi.py#L41-L45
How would this feature be used? Please describe.
Example API would be
Where entry-point script is
hvd_resnet_mx.sh
Describe alternatives you've considered
Right now, one has to use ProcessRunner instead of MPIRunner to pass a bash script for training.
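To make the request concrete, here is a minimal sketch of what passing hyperparameters through to an MPI-launched shell entry point could look like. It is not the toolkit's implementation: `to_cmd_args` and `build_mpi_command` are hypothetical helpers written for this illustration, the hyperparameter names are made up, and the only real pieces assumed are `mpirun -np` and the `hvd_resnet_mx.sh` script name from this issue.

```python
import shlex


def to_cmd_args(hyperparameters):
    """Flatten a dict into ['--key', 'value', ...], sorted by key.

    Hypothetical helper for illustration only; it is not the
    sagemaker-training-toolkit API.
    """
    args = []
    for key in sorted(hyperparameters):
        args.extend([f"--{key}", str(hyperparameters[key])])
    return args


def build_mpi_command(entry_point, hyperparameters, num_processes):
    """Assemble an illustrative mpirun command for a shell entry point.

    The hyperparameters are appended after the script name so that the
    script receives them as positional command-line arguments.
    """
    return (
        ["mpirun", "-np", str(num_processes)]
        + ["/bin/sh", entry_point]
        + to_cmd_args(hyperparameters)
    )


if __name__ == "__main__":
    # Hyperparameter names below are assumptions, not taken from the issue.
    command = build_mpi_command(
        "hvd_resnet_mx.sh", {"epochs": 2, "batch-size": 64}, num_processes=4
    )
    # Print the command as a single shell-quoted string for inspection.
    print(shlex.join(command))
```

Inside `hvd_resnet_mx.sh`, the arguments would then be available as `"$@"` and could be forwarded to the actual training command.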