aws-samples / sagemaker-studio-sparkmagic-lib

MIT No Attribution

Provide instructions in README to override EMR endpoint in VPC endpoints #5

Closed jaipreet-s closed 3 years ago

jaipreet-s commented 3 years ago

There is a bug in botocore (boto/botocore#2376) where the default EMR endpoint resolves to us-west-2.elasticmapreduce.amazonaws.com. The DNS names of the EMR VPC endpoints do not match this default URL, so you have to force the correct URL via the endpoint_url option when you create the client.

This commit provides a temporary workaround that lets customers patch the botocore endpoint configuration for use with EMR VPC endpoints. As a proper fix, we should update the EMR clients in this library to specify endpoint_url when creating the EMR client.
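A minimal sketch of what the proper fix could look like, assuming the VPC endpoint's private DNS name uses the service-first form elasticmapreduce.<region>.amazonaws.com in the standard AWS partition. The helper names emr_endpoint_url and make_emr_client are hypothetical and are not part of this library; only boto3.client and its endpoint_url parameter are existing API.

```python
def emr_endpoint_url(region: str) -> str:
    """Build the service-first EMR URL for a region.

    Assumption: the EMR VPC endpoint's private DNS name has the form
    elasticmapreduce.<region>.amazonaws.com (standard partition only).
    """
    return f"https://elasticmapreduce.{region}.amazonaws.com"


def make_emr_client(region: str):
    """Create an EMR client pinned to the explicit endpoint URL.

    Hypothetical helper: passing endpoint_url bypasses botocore's
    default endpoint resolution for this client.
    """
    # Imported lazily so the URL helper can be used without boto3 installed.
    import boto3

    return boto3.client(
        "emr",
        region_name=region,
        endpoint_url=emr_endpoint_url(region),
    )
```

With credentials available, make_emr_client("us-west-2").list_clusters() would then go to the explicitly configured URL instead of the default one.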

Testing Done

Tested in SageMaker Studio connected to a private subnet with an EMR VPC endpoint. Ran this snippet in the PySpark kernel, then validated that I was able to call emr.list_clusters().