aws-samples / sagemaker-run-notebook

Tools to run Jupyter notebooks as jobs in Amazon SageMaker - ad hoc, on a schedule, or in response to events
Apache License 2.0

Supporting Scheduled Sagemaker Notebooks in VPC Mode #34

Open TVAO opened 2 years ago

TVAO commented 2 years ago

The current CloudFormation template does not support running SageMaker Processing jobs within a VPC.

To solve this, the scheduling Lambda function should be passed the VpcConfig from the SageMaker Studio domain, if one is present.

Possibly, it should be passed to the InvokeNotebookLambda, which would then include it in the boto3 SDK call to SageMaker create_processing_job. That ensures the SageMaker processing instance has access to VPC resources.
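The lookup described above could be sketched roughly as follows. This is only an illustration, not code from the repository: the helper names `network_config_from_domain` and `lookup_network_config` are hypothetical, and it assumes the domain's subnets are reported under `SubnetIds` and its security groups under `DefaultUserSettings.SecurityGroups` in the boto3 `describe_domain` response. Security groups set on individual user profiles are not considered here.

```python
from typing import Any, Dict, Optional


def network_config_from_domain(domain: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Build a NetworkConfig dict from a describe_domain-style response.

    Hypothetical helper. Returns None when the domain exposes no subnets
    or no security groups, since VpcConfig needs at least one of each.
    """
    subnets = domain.get("SubnetIds") or []
    # Assumption: the domain-level security groups live in DefaultUserSettings.
    default_settings = domain.get("DefaultUserSettings") or {}
    security_groups = default_settings.get("SecurityGroups") or []
    if not subnets or not security_groups:
        return None
    return {
        "EnableInterContainerTrafficEncryption": False,
        "EnableNetworkIsolation": False,
        "VpcConfig": {
            "SecurityGroupIds": list(security_groups),
            "Subnets": list(subnets),
        },
    }


def lookup_network_config(domain_id: str) -> Optional[Dict[str, Any]]:
    """Fetch the Studio domain and derive a NetworkConfig from it."""
    import boto3  # imported lazily so the pure helper above works without AWS access

    client = boto3.client("sagemaker")
    return network_config_from_domain(client.describe_domain(DomainId=domain_id))
```

Returning None when nothing usable is found lets the caller leave NetworkConfig out entirely, which keeps the current non-VPC behaviour unchanged.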

Also, I believe the policy might need certain EC2 permissions, although the ones I have listed might be more than are strictly necessary.

`Policies:

The VpcConfig should be passed into the boto3 create_processing_job request in the invoked Lambda function:

    "NetworkConfig": {
        "EnableInterContainerTrafficEncryption": boolean,
        "EnableNetworkIsolation": boolean,
        "VpcConfig": {
            "SecurityGroupIds": [ "string" ],
            "Subnets": [ "string" ]
        }
    }

In the code of the Lambda template:

    B['NetworkConfig'] = {
        'EnableInterContainerTrafficEncryption': False,
        'EnableNetworkIsolation': False,
        'VpcConfig': {
            'SecurityGroupIds': ['sg-<xxxx>'],
            'Subnets': ['subnet-<xxxx>', 'subnet-<xxxx>'],
        },
    }
    V = boto3.client('sagemaker')
    H = V.create_processing_job(**B)
    W = H['ProcessingJobArn']
    X = re.sub('^.*/', '', W)
    return X
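For readability, the same step can be written out with descriptive names. This is a sketch under assumptions, not the repository's actual Lambda code: `start_processing_job` is a hypothetical function, and the client is passed in as a parameter so the logic can be exercised without AWS credentials. In a real Lambda it would be `boto3.client('sagemaker')`.

```python
import re
from typing import Any, Dict, Optional


def start_processing_job(
    client: Any,
    job_args: Dict[str, Any],
    network_config: Optional[Dict[str, Any]] = None,
) -> str:
    """Start a processing job and return its name.

    Hypothetical helper: `client` is expected to behave like
    boto3.client('sagemaker'); `job_args` holds the other
    create_processing_job arguments.
    """
    args = dict(job_args)  # copy so the caller's dict is not mutated
    if network_config is not None:
        args["NetworkConfig"] = network_config
    response = client.create_processing_job(**args)
    arn = response["ProcessingJobArn"]
    # The job name is the part of the ARN after the last slash.
    return re.sub("^.*/", "", arn)
```

Making `network_config` optional means jobs without a VPC configuration are submitted exactly as before.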

As of now, scheduled SageMaker notebooks, which run as SageMaker Processing jobs on processing instances, do not have access to private resources in the VPC that was attached to SageMaker Studio when its domain was created.

lasdem commented 2 years ago

Thank you! I managed to get it working when running the job from the command line by passing the network configuration via --extra:

    --extra '{ "NetworkConfig": { "EnableInterContainerTrafficEncryption": false, "EnableNetworkIsolation": false, "VpcConfig": { "SecurityGroupIds": [ "sg-xxx" ], "Subnets": [ "subnet-xxx" ] } } }'
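Hand-writing that JSON in a shell is error-prone, so the argument can also be generated. The following is a small sketch, with `build_extra_argument` as a hypothetical helper name; it only builds the `--extra` string shown above and does not invoke the CLI itself.

```python
import json
import shlex
from typing import List


def build_extra_argument(security_group_ids: List[str], subnets: List[str]) -> str:
    """Return a shell-safe `--extra '<json>'` string with a NetworkConfig."""
    extra = {
        "NetworkConfig": {
            "EnableInterContainerTrafficEncryption": False,
            "EnableNetworkIsolation": False,
            "VpcConfig": {
                "SecurityGroupIds": security_group_ids,
                "Subnets": subnets,
            },
        }
    }
    # json.dumps emits lowercase true/false; shlex.quote protects the JSON from the shell.
    return "--extra " + shlex.quote(json.dumps(extra))


if __name__ == "__main__":
    print(build_extra_argument(["sg-xxx"], ["subnet-xxx"]))
```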

I would really like it if the UI passed the VPC config from the SageMaker domain automatically.