What did you find confusing? Please describe.
I was searching for documentation on distributed training with my own Docker containers. The current documentation explains how to create or extend containers so that they can use distributed training, including a guide to installing the required modules, but it does not explain how to configure the Estimator class or any other launch parameters to start distributed training, as it does for the PyTorch and TensorFlow classes.
Describe how documentation can be improved
Add text that describes how to launch distributed training after creating or extending the Docker image. Add it to these sections:
https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-use-api.html#data-parallel-use-python-skd-api (the anchor in this link contains a typo that should also be fixed: "skd" instead of "sdk")
https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-use-api.html#data-parallel-bring-your-own-container
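To illustrate the kind of example I am hoping to see in those sections, here is a rough sketch. It assumes the generic Estimator accepts the same `distribution` dictionary that is documented for the PyTorch and TensorFlow estimators, which I could not confirm from the current docs. The helper name `build_estimator_kwargs`, the image URI, the role ARN, the S3 path, and the instance settings are all placeholders of mine, not values from the documentation.

```python
def build_estimator_kwargs(image_uri, role, instance_count=2,
                           instance_type="ml.p3.16xlarge"):
    """Collect keyword arguments for a training job that uses a custom container.

    Hypothetical helper for illustration only; it is not part of the SageMaker SDK.
    """
    return {
        "image_uri": image_uri,          # custom or extended Docker image in ECR
        "role": role,                    # SageMaker execution role
        "instance_count": instance_count,
        "instance_type": instance_type,  # example value, adjust as needed
        # ASSUMPTION: the same dictionary that the PyTorch and TensorFlow
        # estimators document for enabling the data parallel library.
        "distribution": {"smdistributed": {"dataparallel": {"enabled": True}}},
    }


kwargs = build_estimator_kwargs(
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>",  # placeholder
    role="<execution-role-arn>",                                        # placeholder
)

# Launching the job needs the SageMaker Python SDK and AWS credentials,
# so this part is left commented out:
# from sagemaker.estimator import Estimator
# estimator = Estimator(**kwargs)
# estimator.fit("s3://<bucket>/<prefix>")  # placeholder training data location
```

If the generic Estimator needs different parameters than this to start distributed training in a custom container, that difference is exactly what the documentation should spell out.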