ChaiBapchya closed this 4 years ago
@ChuyangDeng FYI
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
@ChuyangDeng please help with the review. Thanks.
An integ test would require the SageMaker Python SDK change, for which we have an open PR: https://github.com/aws/sagemaker-python-sdk/pull/1581
Can the integ test be added later? Or is there a way to add an integration test without relying on the Python SDK?
Your Python SDK change simply sends over a hyperparameter, so you should be able to add an integ test here regardless of the Python SDK change. Here's an example that uses parameter servers: https://github.com/aws/sagemaker-mxnet-training-toolkit/blob/master/test/integration/sagemaker/test_training.py#L30
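A minimal sketch of the reviewer's point, assuming the MPI switch reaches the container as ordinary hyperparameters: an integ test could set those keys directly in the estimator's `hyperparameters` dict instead of waiting for the SDK to translate a distribution setting. The key names below (`sagemaker_mpi_enabled`, `sagemaker_mpi_num_of_processes_per_host`, `sagemaker_mpi_custom_mpi_options`) mirror the ones the SageMaker Python SDK sends for TensorFlow MPI jobs; whether this toolkit reads exactly these names is an assumption, and `mpi_hyperparameters` is a hypothetical helper, not part of either repository.

```python
def mpi_hyperparameters(processes_per_host=1, custom_mpi_options=None, base=None):
    """Hypothetical helper: build a hyperparameters dict with MPI enabled.

    The key names are assumed to match what the SDK would send when MPI
    distribution is requested; verify them against the toolkit before use.
    """
    if processes_per_host < 1:
        raise ValueError("processes_per_host must be >= 1")
    # Copy so the caller's base dict is never mutated.
    hyperparameters = dict(base or {})
    hyperparameters["sagemaker_mpi_enabled"] = True
    hyperparameters["sagemaker_mpi_num_of_processes_per_host"] = processes_per_host
    # Only forward extra mpirun options when some were actually given.
    if custom_mpi_options:
        hyperparameters["sagemaker_mpi_custom_mpi_options"] = custom_mpi_options
    return hyperparameters
```

The resulting dict would be passed as `hyperparameters=` to the MXNet estimator in the integ test, the same way the parameter-server test linked above passes its own settings.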
LGTM. Skipping the generic Horovod test, following the practice of tensorflow-training-toolkit: https://github.com/aws/sagemaker-tensorflow-training-toolkit/blob/master/test/integration/local/test_horovod.py#L26
Created an issue to track it: https://github.com/aws/sagemaker-mxnet-training-toolkit/issues/180
Description of changes: Adds MPI support for distributed training.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.