awslabs / mlmax

Example templates for the delivery of custom ML solutions to production so you can get started quickly without having to make too many design choices.
https://mlmax.readthedocs.io/en/latest/
Apache License 2.0

Training and inference run scripts: support STS endpoints #57

Closed verdimrc closed 3 years ago

verdimrc commented 3 years ago

🐛 Bug report

Describe the bug

Without public internet access, but with an STS VPC endpoint set up, both inference_pipeline_run.py and training_pipeline_run.py fail with an HTTP timeout.

This can be worked around by hardcoding the regional endpoint as follows:

inference_pipeline_run.py:136:    sts = boto3.client("sts", endpoint_url="https://sts.ap-southeast-2.amazonaws.com")
training_pipeline_run.py:132:    sts = boto3.client("sts", endpoint_url="https://sts.ap-southeast-2.amazonaws.com")

As a proper fix, I propose adding a new configurable parameter to define the VPC endpoint for STS.
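A minimal sketch of what this could look like, exposing the endpoint as an optional argument rather than hardcoding it. The flag name --sts-endpoint-url is an assumption for illustration, not an existing option in the run scripts:

import argparse
import boto3

# Hypothetical sketch: make the STS endpoint configurable instead of hardcoded.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--sts-endpoint-url",
    default=None,
    help="Optional STS endpoint, e.g. https://sts.ap-southeast-2.amazonaws.com",
)
args, _ = parser.parse_known_args()

# When endpoint_url is None, boto3 falls back to its default endpoint resolution,
# so behaviour is unchanged for users with public internet access.
sts = boto3.client("sts", endpoint_url=args.sts_endpoint_url)
account_id = sts.get_caller_identity()["Account"]

With this, the hardcoded endpoint_url in inference_pipeline_run.py and training_pipeline_run.py would no longer be needed.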

To reproduce

Run the {training,inference}_pipeline_run.py scripts from an EC2 instance in a private VPC (no public internet access) that has an STS VPC endpoint.
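For reference, the STS interface endpoint for the private VPC can be created along these lines; the VPC, subnet, and security group IDs below are placeholders, and the exact subnet/routing setup depends on your environment:

import boto3

# Create an STS interface endpoint in the (hypothetical) private VPC.
ec2 = boto3.client("ec2", region_name="ap-southeast-2")
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",            # placeholder VPC id
    ServiceName="com.amazonaws.ap-southeast-2.sts",
    SubnetIds=["subnet-0123456789abcdef0"],    # placeholder private subnet
    SecurityGroupIds=["sg-0123456789abcdef0"], # placeholder security group
    PrivateDnsEnabled=True,
)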

Expected behavior

Training or inference should complete.

System information

github-actions[bot] commented 3 years ago

This issue is stale. If left untouched, it will be automatically closed in 7 days.

github-actions[bot] commented 3 years ago

This issue is stale because it has been open for 60 days with no activity. Please update or respond to this comment if you're still interested in working on this.