aws / amazon-sagemaker-feedback

Amazon SageMaker Public Feedback Dashboard
Creative Commons Attribution Share Alike 4.0 International

Troubleshooting permissions while connecting SageMaker to EMR Cluster #129

Closed: aiqc closed this issue 1 month ago.

aiqc commented 1 month ago

Question

How can I authorize my SageMaker Studio notebook to connect to my EMR Cluster?

Other Details

https://stackoverflow.com/questions/78962340/sagemaker-emr-cluster-select-emr-runtime-role-for-cluster

aiqc commented 1 month ago

In the domain configuration, I found where the SageMaker user's EMR Assumable Role and EMR Execution Role attributes can be defined.

However, it is not clear which ARN values I should use, and I can't get the Spark context working in either kernel (Glue PySpark or SparkMagic PySpark).

Screenshot from 2024-09-08 15-40-15

Screenshot from 2024-09-08 15-37-32
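Both attributes take IAM role ARNs, which have the fixed form arn:aws:iam::<account-id>:role/<role-name>. A minimal sketch, assuming the account ID and role names are hypothetical placeholders, and assuming the two lists belong in an EmrSettings block (AssumableRoleArns, ExecutionRoleArns) under JupyterLabAppSettings in the domain's DefaultUserSettings, as I read the SageMaker API reference. Verify that placement before using it.

```python
# Sketch: build IAM role ARNs for SageMaker's EMR role settings.
# ASSUMPTIONS: the account ID and role names are hypothetical, and the
# JupyterLabAppSettings -> EmrSettings placement should be checked against
# the SageMaker API reference before calling update_domain.


def iam_role_arn(account_id: str, role_name: str, partition: str = "aws") -> str:
    """Return an IAM role ARN: arn:<partition>:iam::<account-id>:role/<role-name>."""
    if not (len(account_id) == 12 and account_id.isdigit()):
        raise ValueError(f"expected a 12-digit AWS account ID, got {account_id!r}")
    # IAM is a global service, so the region field of the ARN is left empty.
    return f"arn:{partition}:iam::{account_id}:role/{role_name}"


def emr_user_settings(assumable_role_arns: list[str], execution_role_arns: list[str]) -> dict:
    """Assemble a DefaultUserSettings fragment carrying the two EMR role lists."""
    return {
        "JupyterLabAppSettings": {
            "EmrSettings": {
                # Roles the SageMaker execution role can assume for EMR operations.
                "AssumableRoleArns": list(assumable_role_arns),
                # Runtime roles used by EMR / EMR Serverless workloads to reach other AWS resources.
                "ExecutionRoleArns": list(execution_role_arns),
            }
        }
    }


if __name__ == "__main__":
    account = "123456789012"  # hypothetical account ID
    settings = emr_user_settings(
        assumable_role_arns=[iam_role_arn(account, "MyEmrAssumableRole")],  # hypothetical
        execution_role_arns=[iam_role_arn(account, "MyEmrRuntimeRole")],    # hypothetical
    )
    print(settings)
```

With boto3 installed and credentials configured, the fragment could then be passed as boto3.client("sagemaker").update_domain(DomainId=..., DefaultUserSettings=settings), with your own domain ID filled in.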

aiqc commented 1 month ago

I added Glue (glue.amazonaws.com) as a service principal in the example runtime-role trust policy from the EMR Serverless documentation, and now the SparkMagic PySpark connection in the notebook works as expected: https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/getting-started.html#gs-runtime-role

(emr-serverless.amazonaws.com is the only principal in the documentation's example; glue.amazonaws.com is the entry I added.)

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EMRServerlessTrustPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "emr-serverless.amazonaws.com",
                    "glue.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
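The trust policy above can also be applied programmatically. A minimal sketch, assuming boto3 is installed and credentials are configured: it builds the same document and passes it to IAM's update_assume_role_policy, which replaces the role's existing trust policy. The role name in the usage line is hypothetical, and this sketch does not establish that the glue.amazonaws.com principal is what fixed the connection.

```python
import json

# Sketch: build the trust policy from this thread and apply it with boto3's
# IAM update_assume_role_policy. Role names used with it are hypothetical.


def build_trust_policy(service_principals: list[str]) -> dict:
    """Trust policy letting the listed AWS service principals call sts:AssumeRole."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "EMRServerlessTrustPolicy",
                "Effect": "Allow",
                "Principal": {"Service": list(service_principals)},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def apply_trust_policy(iam_client, role_name: str, policy: dict) -> None:
    """Replace the role's trust policy; iam_client is expected to be boto3.client("iam")."""
    iam_client.update_assume_role_policy(
        RoleName=role_name,
        PolicyDocument=json.dumps(policy),  # the API takes the policy as a JSON string
    )


if __name__ == "__main__":
    policy = build_trust_policy([
        "emr-serverless.amazonaws.com",  # the only principal in the documented example
        "glue.amazonaws.com",            # the principal added in this thread
    ])
    print(json.dumps(policy, indent=4))
```

Usage, with a hypothetical role name: apply_trust_policy(boto3.client("iam"), "MyEmrRuntimeRole", policy).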

Screenshot from 2024-09-08 16-44-29

I'm not sure whether that change is what fixed it. This was supposed to be a fun thing to explore on Friday morning, but now it's Sunday night.

aiqc commented 1 month ago

Closing this because I no longer need help, but it is definitely a pain point.