
Amazon SageMaker Studio in a private VPC with NAT Gateway and Network Firewall

This solution demonstrates the setup and deployment of Amazon SageMaker Studio into a private VPC and implementation of multi-layer security controls, such as data encryption, network traffic monitoring and restriction, usage of VPC endpoints, subnets and security groups, and IAM resource policies. This source code repository is for the Securing Amazon SageMaker Studio internet traffic using AWS Network Firewall post on the AWS Machine Learning Blog.

The use case is a real-life environment security setup, which generally requires the following security-related features to be in place:

All these specific requirements are covered in the solution.

Jump to the deployment instructions

SageMaker security

You can apply the same security and compliance approaches and best practices (authentication, authorization, VPC, network isolation, control, and monitoring), using a consistent set of AWS security features, to Amazon SageMaker workloads and to Amazon SageMaker Studio specifically.

Network isolation

Common approaches for network isolation can also be applied to SageMaker workloads:

For example, you can enable network isolation controls when you create a SageMaker processing job: container network isolation
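
As an illustration, the following sketch (not part of this solution's stack; the job name, image URI, instance settings, and execution role ARN are placeholders) enables container network isolation for a processing job via the AWS CLI:

# Sketch: create a processing job with container network isolation enabled.
# All names and ARNs below are placeholders.
aws sagemaker create-processing-job \
    --processing-job-name my-isolated-processing-job \
    --role-arn arn:aws:iam::<account-id>:role/<sagemaker-execution-role> \
    --app-specification ImageUri=<ecr-image-uri> \
    --processing-resources 'ClusterConfig={InstanceCount=1,InstanceType=ml.m5.xlarge,VolumeSizeInGB=30}' \
    --network-config 'EnableNetworkIsolation=true'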

Access to resources in VPC

To avoid making your data and model containers accessible over the internet, we recommend that you create a private VPC and configure it to control access to your AWS resources. Using a VPC helps to protect your training containers and data because you can configure your VPC so that it is not connected to the internet and represents a completely isolated network environment. Using a VPC also allows you to monitor all network traffic in and out of your ML containers by using VPC flow logs.

You specify your private VPC configuration when you create a SageMaker workload (a notebook instance, processing or training job, model) by selecting a VPC and specifying subnets and security groups. When you specify the subnets and security groups, SageMaker creates elastic network interfaces (ENIs) that are associated with your security groups in one of the subnets. The network interfaces allow your model containers to connect to resources in your VPC.
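
For example, a minimal sketch of attaching a training job to your private VPC via the AWS CLI (the subnet, security group, role, image, and bucket values are placeholders):

# Sketch: run a training job inside your private VPC.
aws sagemaker create-training-job \
    --training-job-name my-private-vpc-training-job \
    --role-arn arn:aws:iam::<account-id>:role/<sagemaker-execution-role> \
    --algorithm-specification TrainingImage=<ecr-image-uri>,TrainingInputMode=File \
    --output-data-config S3OutputPath=s3://<s3-bucket-name>/output/ \
    --resource-config InstanceCount=1,InstanceType=ml.m5.xlarge,VolumeSizeInGB=50 \
    --stopping-condition MaxRuntimeInSeconds=3600 \
    --vpc-config '{"SecurityGroupIds": ["<security-group-id>"], "Subnets": ["<subnet-id>"]}'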

Please refer to [10] for more details on specific deployment use cases.

After you configure the SageMaker workloads or SageMaker Studio to be hosted in your private VPC, you can apply all common VPC-based security controls (subnets, NACLs, security groups, VPC endpoints, NAT Gateway, and Network Firewall).

Deploy SageMaker Studio to VPC

You can choose to restrict which traffic can access the internet by launching Studio in a Virtual Private Cloud (VPC) of your choosing. This gives you fine-grained control over the network access and internet connectivity of your SageMaker Studio notebooks. You can disable direct internet access to add an additional layer of security. You can use AWS Network Firewall to implement further controls (stateless or stateful traffic filtering and applying your custom network firewall policies) on SageMaker workloads.

The following network settings are available when you create a new SageMaker Studio domain:

SageMaker Studio VPC settings

Amazon SageMaker Studio runs in an environment managed by AWS. When launching a new Studio domain, the parameter AppNetworkAccessType defines the external connectivity for that domain.

Direct internet access with AppNetworkAccessType=DirectInternetOnly:

SageMaker Studio default network config

No direct internet access with AppNetworkAccessType=VpcOnly:

SageMaker Studio VpcOnly network config

❗ You won't be able to run a Studio notebook in VpcOnly mode unless your VPC has an interface endpoint to the SageMaker API and runtime, or a NAT gateway, and your security groups allow outbound connections.

Data encryption at rest and transit

Models and data are stored in the SageMaker Studio home directories on Amazon EFS or on SageMaker notebook EBS volumes. You can apply the standard practices and patterns for encrypting data with AWS KMS keys. This solution creates an AWS KMS CMK for EBS volume encryption that you can use to encrypt notebook instance data.
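
For example, a sketch of launching a notebook instance whose EBS volume is encrypted with a customer managed key (the instance name, role ARN, and key ID are placeholders):

aws sagemaker create-notebook-instance \
    --notebook-instance-name my-encrypted-notebook \
    --instance-type ml.t3.medium \
    --role-arn arn:aws:iam::<account-id>:role/<sagemaker-execution-role> \
    --kms-key-id <kms-key-id>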

All communication with VPC endpoints to the public AWS services (SageMaker API, SageMaker notebooks, etc.) is restricted to the HTTPS protocol. You can control the set of network protocols allowed to and from your protected workload subnets with the help of VPC security groups or NACLs.
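
For example, a sketch of restricting a workload security group to HTTPS-only outbound traffic (the group ID and destination CIDR are placeholders; the first command removes the default allow-all egress rule):

aws ec2 revoke-security-group-egress \
    --group-id <security-group-id> \
    --ip-permissions '[{"IpProtocol": "-1", "IpRanges": [{"CidrIp": "0.0.0.0/0"}]}]'

aws ec2 authorize-security-group-egress \
    --group-id <security-group-id> \
    --ip-permissions '[{"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "<vpc-cidr>"}]}]'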

See the Amazon SageMaker developer guide for more information on protecting data in transit.

Amazon S3 access control

Developing ML models requires access to sensitive data stored in specific S3 buckets. You might want to implement controls to guarantee that:

We implement this requirement by using an S3 VPC Endpoint in your private VPC and configuring VPC Endpoint and S3 bucket policies.

First, start with the S3 bucket policy attached to the specific S3 bucket:

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Principal": "*",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<s3-bucket-name>/*",
                "arn:aws:s3:::<s3-bucket-name>"
            ],
            "Condition": {
                "StringNotEquals": {
                    "aws:sourceVpce": "<s3-vpc-endpoint-id>"
                }
            }
        }
    ]
}

The bucket policy explicitly denies all access to the bucket that does not come from the designated VPC endpoint.
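
For reference, a sketch of attaching this policy to the bucket (assuming the policy above is saved as bucket-policy.json):

aws s3api put-bucket-policy \
    --bucket <s3-bucket-name> \
    --policy file://bucket-policy.json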

Second, attach the following permission policy to the S3 VPC Endpoint:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<s3-bucket-name>",
                "arn:aws:s3:::<s3-bucket-name>/*"
            ]
        }
    ]
}

This policy allows access to the designated S3 buckets only.

This combination of S3 bucket policy and VPC endpoint policy, together with Amazon SageMaker Studio VPC connectivity, establishes that SageMaker Studio can only access the referenced S3 bucket, and this S3 bucket can only be accessed from the VPC endpoint.

❗ You will not be able to access these S3 buckets from the AWS console or the AWS CLI.

All network traffic between Amazon SageMaker Studio and S3 is routed via the designated S3 VPC endpoint over the AWS private network and never traverses the public internet.

You may consider enabling access to other S3 buckets via the S3 VPC endpoint policy, for example to shared public SageMaker buckets, to enable additional functionality in Amazon SageMaker Studio, such as JumpStart. If you want to have access to JumpStart, you must add the following statement to the S3 VPC endpoint policy:

    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "*",
      "Condition": {
        "StringEqualsIgnoreCase": {
          "s3:ExistingObjectTag/SageMaker": "true"
        }
      }
    }

Secure configuration of SageMaker notebook instances

Amazon SageMaker notebook instances can be launched with or without your Virtual Private Cloud (VPC) attached. When launched with your VPC attached, the notebook can either be configured with or without direct internet access:

create notebook instance network settings

You have three options for network configuration:

Direct internet access means that the Amazon SageMaker service provides a network interface that allows the notebook to talk to the internet through a VPC managed by the service.

For more information, see [11].

Without VPC

All traffic goes through the elastic network interface (ENI) attached to the managed EC2 instance, which runs in an Amazon SageMaker managed VPC.

Notebook instance without VPC

All traffic goes via the ENI within an Amazon SageMaker managed VPC.

Private attached VPC with direct internet access

Two ENIs are attached to the managed EC2 instance:

Notebook instance with 2x ENI

Private attached VPC without direct internet access

One ENI is attached to the managed EC2 instance. For internet access, the traffic must be routed via a NAT gateway or a virtual private gateway:

Notebook instance with 1x ENI

Please consult [11] for more information on ENI configuration and routing options.

Limit internet ingress and egress

When you configure SageMaker Studio or a SageMaker workload to use your private VPC without the direct internet access option, the routing of inbound and outbound internet traffic is fully controlled by your VPC networking setup.

If you want to provide internet access through your VPC, add an internet gateway or a NAT gateway (if you want to block inbound connections) and the proper routing entries. The internet traffic then flows through your VPC, and you can implement other security controls such as inline inspection with a firewall or an internet proxy.

You can use the AWS Network Firewall to implement URL, IP address, and domain-based stateful and stateless inbound and outbound traffic filtering.

This solution demonstrates the use of AWS Network Firewall for stateful domain name filtering as a sample use case.
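
As a sketch of how such a domain list rule group could be created via the AWS CLI (the rule group name and capacity are illustrative; the solution's CloudFormation template creates its own rule group):

aws network-firewall create-rule-group \
    --rule-group-name domain-allow-list \
    --type STATEFUL \
    --capacity 100 \
    --rule-group '{
        "RulesSource": {
            "RulesSourceList": {
                "Targets": [".kaggle.com", ".amazonaws.com"],
                "TargetTypes": ["TLS_SNI", "HTTP_HOST"],
                "GeneratedRulesType": "ALLOWLIST"
            }
        }
    }'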

Enforce secure deployment of SageMaker resources

There are three approaches for deploying Amazon SageMaker resources securely:

IAM condition keys approach

IAM condition keys can be used to improve security by preventing resources from being created without security controls.

"Condition": {
   "StringEquals": {
      "sagemaker:RootAccess": "Disabled"
   }
}

Amazon SageMaker service-specific condition keys

Security-specific examples of the condition keys:

Example: Enforce usage of network isolation mode

To enforce usage of secure resource configuration, you can attach the following policy to the SageMaker execution role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "sagemaker:Create*"
            ],
            "Resource": "*",
            "Condition": {
                "StringNotEqualsIfExists": {
                    "sagemaker:NetworkIsolation": "true"
                }
            }
        }
    ]
}

The policy denies creation of any component (processing or training job, endpoint, transform job) if the sagemaker:NetworkIsolation parameter is not set to true. It applies only to the components that have this parameter. Similarly, you can add validation of any other SageMaker service-specific condition keys.
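
For instance, a sketch of a policy that denies creation of training jobs, processing jobs, and models that are not attached to a VPC, using the sagemaker:VpcSubnets condition key (adapt the action list to your needs):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "sagemaker:CreateTrainingJob",
                "sagemaker:CreateProcessingJob",
                "sagemaker:CreateModel"
            ],
            "Resource": "*",
            "Condition": {
                "Null": {
                    "sagemaker:VpcSubnets": "true"
                }
            }
        }
    ]
}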

AWS Service Catalog approach

This approach uses pre-defined AWS CloudFormation templates to provision the requested resources.

The following Amazon SageMaker resource types are supported by AWS CloudFormation. All other Amazon SageMaker resources need to be created using the custom resource approach.

CloudWatch Events approach

Amazon CloudWatch and CloudWatch Events can be used to implement responsive controls to improve security. You can monitor events from the SageMaker service via a CloudWatch Events rule and trigger a Lambda function to inspect whether a SageMaker resource implements all the necessary security controls.
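
A minimal sketch of such a rule via the AWS CLI (the rule name and Lambda function ARN are placeholders; the Lambda function also needs a resource policy that allows events.amazonaws.com to invoke it):

# Match all events emitted by the SageMaker service
aws events put-rule \
    --name sagemaker-security-inspection \
    --event-pattern '{"source": ["aws.sagemaker"]}'

# Route the matched events to an inspection Lambda function
aws events put-targets \
    --rule sagemaker-security-inspection \
    --targets 'Id=1,Arn=arn:aws:lambda:<region>:<account-id>:function:<inspection-function>'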

Demo setup overview

The solution implements the following setup to demonstrate the deployment of SageMaker Studio into a private VPC and the use of a NAT gateway and AWS Network Firewall for internet traffic control.

Amazon SageMaker Studio infrastructure overview

❗ The solution uses only one Availability Zone (AZ) and is not highly available. We do not recommend using the single-AZ setup for any production deployment. An HA solution can be implemented by duplicating the single-AZ setup (subnets, NAT gateway, Network Firewall endpoints) to additional AZs.
❗ The CloudFormation template sets up the Network Firewall routes automatically. However, the current implementation works only with a single-AZ deployment:

VpcEndpointId: !Select ["1", !Split [":", !Select ["0", !GetAtt NetworkFirewall.EndpointIds]]]

For a multi-AZ setup you need to implement a CloudFormation custom resource (e.g. a Lambda function) to set up the Network Firewall endpoints in each subnet properly.

VPC resources

The solution deploys the following resources:

S3 resources

The solution deploys two Amazon S3 buckets:

Both buckets have a bucket policy attached. The bucket policy explicitly denies all access to the bucket that does not come from the designated VPC endpoint. The Amazon S3 VPC endpoint also has a policy attached to it. This policy allows access to the two S3 buckets (models and data) only.

As discussed above, this combination of S3 bucket policy and VPC endpoint policy ensures that SageMaker Studio can only access the referenced S3 buckets, and these S3 buckets can only be accessed from the VPC endpoint.

IAM resources

Two AWS KMS customer keys are deployed by the solution:

The solution also creates and deploys an IAM execution role for SageMaker notebooks and SageMaker Studio with pre-configured IAM policies.

SageMaker resources

The solution creates:

Deployment

Prerequisites

❗ For the CloudFormation template deployment you must use an S3 bucket in the same region as your deployment region. If you need to deploy the solution in multiple regions, you need to create a bucket per region and specify the corresponding bucket name in the make deploy call as shown below.

❗ The solution will successfully deploy AWS Network Firewall only in the regions where the Network Firewall is available. In all other regions you will get a CloudFormation validation exception.

❗ If you already have a SageMaker domain deployed in the current region, the deployment will fail because there is a limit of one SageMaker domain per region per AWS account.

CloudFormation stack parameters

There are no required parameters; all parameters have default values. You may want to change the DomainName or *CIDR parameters to avoid naming conflicts with existing resources and CIDR allocations.

You can change the stack parameters in the Makefile or pass them as variable assignments as part of the make call.

❗ Please make sure that default or your custom CIDRs do not conflict with any existing VPC in the account and the region where you are deploying the CloudFormation stack.

Deployment steps

To deploy the stack into the current account and region please complete the following steps.

Clone the GitHub repository

git clone https://github.com/aws-samples/amazon-sagemaker-studio-vpc-networkfirewall.git
cd amazon-sagemaker-studio-vpc-networkfirewall

Create an S3 bucket

You need an S3 bucket for the CloudFormation deployment. If you don't have an S3 bucket in the current region, you can create one via the AWS CLI:

aws s3 mb s3://<your s3 bucket name> --region <your deployment region>

Deploy CloudFormation stack

make deploy CFN_ARTEFACT_S3_BUCKET=<your s3 bucket name>

You can specify non-default values for stack parameters in the make deploy command. See Makefile for parameter names.

The bucket must be in the same region where you are deploying. Specify just the bucket name, not an S3 URL or a bucket ARN.

The stack will deploy all needed resources, such as the VPC, network devices, route tables, security groups, S3 buckets, IAM policies and roles, and VPC endpoints, and will also create a new SageMaker Studio domain and a new user profile.

The deployment takes about 25 minutes to complete. After deployment completes, you can see the full list of stack output values by running the following command in a terminal:

aws cloudformation describe-stacks \
    --stack-name sagemaker-studio-demo \
    --output table \
    --query "Stacks[0].Outputs[*].[OutputKey, OutputValue]"

You can now launch the Amazon SageMaker Studio from the SageMaker console or generate a pre-signed URL with the following CLI commands:

export DOMAIN_ID=$(aws sagemaker list-domains --output text --query 'Domains[0].DomainId')
export USER_PROFILE_NAME=$(aws sagemaker list-user-profiles --domain-id=$DOMAIN_ID --output text --query 'UserProfiles[0].UserProfileName')

aws sagemaker create-presigned-domain-url \
    --domain-id $DOMAIN_ID \
    --user-profile-name $USER_PROFILE_NAME \
    --output text \
    --query 'AuthorizedUrl'

Paste the output of the create-presigned-domain-url command into a browser, and you will be redirected to Studio.

Demo

Start Amazon SageMaker Studio from the pre-signed URL or via the Amazon SageMaker console.

Infrastructure walk-through

Take a look at the following components and services created by the deployment:

S3 access

SageMaker Studio has access to the designated S3 buckets (-models and -data) and to these S3 buckets only. The access to S3 buckets is controlled by a combination of the S3 VPC endpoint policy and the S3 bucket policy.

❗ Note that you are not able to use SageMaker JumpStart or any other SageMaker Studio functionality that requires access to other Amazon S3 buckets. To enable access to other S3 buckets you have to change the S3 VPC endpoint policy.

Now we are going to change the S3 VPC endpoint policy to allow access to additional S3 resources.
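
For example, an attempt to list any bucket other than the models and data buckets from the Studio system terminal fails (the public sagemaker-sample-files bucket is used here only as an illustration; it is not part of the stack):

aws s3 ls s3://sagemaker-sample-files/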

The access is denied because the S3 VPC endpoint policy doesn't allow access to any S3 buckets except for models and data as configured in the endpoint policy: S3 access denied

Now add the following statement to the S3 VPC endpoint policy:

    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "*",
      "Condition": {
        "StringEqualsIgnoreCase": {
          "s3:ExistingObjectTag/SageMaker": "true"
        }
      }
    }

Command line:

# Note: modify-vpc-endpoint replaces the whole endpoint policy, so keep the
# existing statements for the models and data buckets in the document as well.
cat <<EoF >s3-vpce-policy.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": "*",
            "Condition": {
                "StringEqualsIgnoreCase": {
                    "s3:ExistingObjectTag/SageMaker": "true"
                }
            }
        }
    ]
}
EoF
VPCE_ID=# S3 VPC endpoint ID from the stack output

aws ec2 modify-vpc-endpoint \
    --vpc-endpoint-id $VPCE_ID \
    --policy-document file://s3-vpce-policy.json

We have now seen that you can control access to S3 buckets via a combination of the S3 bucket policy and the S3 VPC endpoint policy.

Controlling internet access

The AWS Machine Learning blog post Securing Amazon SageMaker Studio internet traffic using AWS Network Firewall shows how the internet ingress and egress for SageMaker Studio can be controlled with AWS Network Firewall.

Clean up

This operation deletes the whole stack together with the SageMaker Studio domain and user profile.

The stack deletion will fail if there are any objects in the -models and -data S3 buckets. Before starting the stack deletion, you must delete all objects in these S3 buckets.

As there is no access to these S3 buckets outside of the SageMaker VPC, you must run the following commands in the Studio terminal:

aws s3 rm s3://<project_name>-<account_id>-<region>-data --recursive --quiet
aws s3 rm s3://<project_name>-<account_id>-<region>-models --recursive --quiet

Follow these steps to delete the solution stack:

  1. Exit all SageMaker Studio instances.
  2. Check if a KernelGateway app is running (in the SageMaker Studio control panel in the AWS console). If yes, delete the KernelGateway app and wait until the deletion process finishes.
  3. If you enabled the logging configuration for Network Firewall, remove it from the firewall details (AWS console).
  4. If you changed the stateful rule group in the firewall policy, delete all added domain names, leaving only the original domains: .kaggle.com, .amazonaws.com.
  5. Delete the stack via the AWS CloudFormation console or via the command line:
    make delete

Delete left-over resources

The deployment of Amazon SageMaker Studio creates a new EFS file system in your account. When you delete the data science environment stack, the SageMaker Studio domain, user profile, and apps are also deleted. However, the EFS file system is not deleted and is kept as is in your account (the EFS file system contains the home directories for SageMaker Studio users and may contain your data). Additional resources are created by SageMaker Studio and retained upon deletion together with the EFS file system:

❗ To delete the EFS file system and EFS-related resources created in your AWS account by the deployment of this solution, do the following steps after deletion of the CloudFormation stack.

This is a destructive action. All data on the EFS file system (the SageMaker home directories) will be deleted. You may want to back up the EFS file system before deletion.

From AWS console:
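
Alternatively, a sketch of the cleanup via the AWS CLI (the file system ID is a placeholder; all mount targets must be deleted before the file system itself can be removed):

# List EFS file systems; check the tags to identify the one created by SageMaker Studio
aws efs describe-file-systems --query 'FileSystems[*].[FileSystemId,Name]' --output table

FS_ID=<efs-file-system-id>

# Delete all mount targets first
for MT_ID in $(aws efs describe-mount-targets --file-system-id $FS_ID \
        --query 'MountTargets[*].MountTargetId' --output text); do
    aws efs delete-mount-target --mount-target-id $MT_ID
done

# Wait until the mount targets are deleted, then remove the file system
aws efs delete-file-system --file-system-id $FS_ID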

Stack deletion troubleshooting

Sometimes the stack might fail to delete. If the stack deletion fails, check the events in the AWS CloudFormation console.

If the deletion of the SageMaker domain fails, check if there are any running applications (e.g. KernelGateway) for the user profile as described in Delete Amazon SageMaker Studio Domain. Try to delete the applications and re-run the make delete command or delete the stack from AWS CloudFormation console.

If the deletion of the Network Firewall fails, check that you removed the logging configuration and that the stateful rule group is in its original state.

Resources

[1]. SageMaker Security in the Developer Guide
[2]. SageMaker Infrastructure Security
[3]. Initial version of the CloudFormation templates for deployment of VPC, Subnets, and S3 buckets is taken from this GitHub repository
[4]. Blog post for the repository: Securing Amazon SageMaker Studio connectivity using a private VPC
[5]. Secure deployment of Amazon SageMaker resources
[6]. Security-focused workshop Amazon SageMaker Workshop: Building Secure Environments
[7]. Amazon SageMaker Identity-Based Policy Examples
[8]. Deployment models for AWS Network Firewall
[9]. VPC Ingress Routing – Simplifying Integration of Third-Party Appliances
[10]. Host SageMaker workloads in a private VPC
[11]. Understanding Amazon SageMaker notebook instance networking configurations and advanced routing options
[12]. Create Amazon SageMaker Studio using AWS CloudFormation
[13]. Building secure machine learning environments with Amazon SageMaker
[14]. Secure Data Science Reference Architecture GitHub

Appendix

AWS CLI commands to set up and launch Amazon SageMaker Studio

The following commands show how you can create a Studio domain and a user profile from the command line. This is for reference only, as the stack creates the domain and the user profile automatically.

Create an Amazon SageMaker Studio domain inside a VPC

Please replace the variables with the corresponding values from the sagemaker-studio-vpc CloudFormation stack output:

REGION=$AWS_DEFAULT_REGION
SM_DOMAIN_NAME=# SageMaker domain name
VPC_ID=
SAGEMAKER_STUDIO_SUBNET_IDS=
SAGEMAKER_SECURITY_GROUP=
EXECUTION_ROLE_ARN=

aws sagemaker create-domain \
    --region $REGION \
    --domain-name $SM_DOMAIN_NAME \
    --vpc-id $VPC_ID \
    --subnet-ids $SAGEMAKER_STUDIO_SUBNET_IDS \
    --app-network-access-type VpcOnly \
    --auth-mode IAM \
    --default-user-settings "ExecutionRole=${EXECUTION_ROLE_ARN},SecurityGroups=${SAGEMAKER_SECURITY_GROUP}"

Note the domain ID from the DomainArn returned by the create-domain call:

"DomainArn": "arn:aws:sagemaker:eu-west-1:ACCOUNT_ID:domain/d-ktlfey9wdfub"

Create a user profile

DOMAIN_ID=
USER_PROFILE_NAME=

aws sagemaker create-user-profile \
    --region $REGION \
    --domain-id $DOMAIN_ID \
    --user-profile-name $USER_PROFILE_NAME

Create pre-signed URL to access Amazon SageMaker Studio

aws sagemaker create-presigned-domain-url \
    --region $REGION \
    --domain-id $DOMAIN_ID \
    --user-profile-name $USER_PROFILE_NAME

Use the generated pre-signed URL to connect to Amazon SageMaker Studio

License

This project is licensed under the MIT-0 License. See the LICENSE file.