awslabs / mlmax

Example templates for the delivery of custom ML solutions to production so you can get started quickly without having to make too many design choices.
https://mlmax.readthedocs.io/en/latest/
Apache License 2.0

Training and inference run scripts: support STS endpoints #57

Closed verdimrc closed 3 years ago

verdimrc commented 3 years ago

🐛 Bug report

Describe the bug

Without public internet access, but with an STS VPC endpoint set up, both inference_pipeline_run.py and training_pipeline_run.py fail with an HTTP timeout.

This can be worked around by hardcoding the regional endpoint as follows:

inference_pipeline_run.py:136:    sts = boto3.client("sts", endpoint_url="https://sts.ap-southeast-2.amazonaws.com")
training_pipeline_run.py:132:    sts = boto3.client("sts", endpoint_url="https://sts.ap-southeast-2.amazonaws.com")

As a proper fix, I propose adding a new configurable parameter to define the VPC endpoint for STS.
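A minimal sketch of what this could look like, exposing the endpoint as an optional argument rather than hardcoding it. The flag name --sts-endpoint-url is an assumption for illustration, not an existing option in the run scripts:

import argparse
import boto3

# Hypothetical sketch: make the STS endpoint configurable instead of hardcoded.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--sts-endpoint-url",
    default=None,
    help="Optional STS endpoint, e.g. https://sts.ap-southeast-2.amazonaws.com",
)
args, _ = parser.parse_known_args()

# When endpoint_url is None, boto3 falls back to its default endpoint resolution,
# so behaviour is unchanged for users with public internet access.
sts = boto3.client("sts", endpoint_url=args.sts_endpoint_url)
account_id = sts.get_caller_identity()["Account"]

With this, the hardcoded endpoint_url in inference_pipeline_run.py and training_pipeline_run.py would no longer be needed.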

To reproduce

Run the {training,inference}_pipeline_run.py scripts from an EC2 instance in a private VPC (no public internet access) that has an STS VPC endpoint.
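For reference, the STS interface endpoint for the private VPC can be created along these lines; the VPC, subnet, and security group IDs below are placeholders, and the exact subnet/routing setup depends on your environment:

import boto3

# Create an STS interface endpoint in the (hypothetical) private VPC.
ec2 = boto3.client("ec2", region_name="ap-southeast-2")
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",            # placeholder VPC id
    ServiceName="com.amazonaws.ap-southeast-2.sts",
    SubnetIds=["subnet-0123456789abcdef0"],    # placeholder private subnet
    SecurityGroupIds=["sg-0123456789abcdef0"], # placeholder security group
    PrivateDnsEnabled=True,
)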

Expected behavior

Training or inference should complete.

System information

github-actions[bot] commented 3 years ago

This issue is stale. If left untouched, it will be automatically closed in 7 days.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open for 60 days with no activity. Please update or respond to this comment if you're still interested in working on this.