[feature-request] Support installing dependencies in requirements.txt from CodeArtifact for both training/inference SageMaker Containers

humanzz commented 1 year ago

Checklist

[ ] I've prepended issue tag with type of change: [feature]
[ ] (If applicable) I've documented below the DLC image/dockerfile this relates to
[ ] (If applicable) I've documented the tests I've run on the DLC image
[ ] I'm using an existing DLC image listed here: https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html
[ ] I've built my own container based off DLC (and I've attached the code used to build my own image)

Concise Description:

For training/inference containers that support installing additional dependencies via requirements.txt, allow passing the parameters needed to install those dependencies from a CodeArtifact repository instead of the public PyPI index.

Is your feature request related to a problem? Please describe.

When security policies require training jobs/endpoints to run in an internet-isolated VPC, requirements.txt cannot be used to install additional dependencies on training/inference containers, because the public PyPI index is unreachable. Being able to use CodeArtifact instead of the public PyPI index would let users of requirements.txt follow the security best practice of isolating their training/inference runtimes from the internet.

Describe the solution you'd like

Use the ability to set environment variables when creating a training job or a model to indicate which CodeArtifact repository to use (domain, domain owner, repository):
https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html
https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModel.html
https://docs.aws.amazon.com/codeartifact/latest/ug/python-configure-pip.html
If the environment variables are set, the container configures pip to use CodeArtifact before installing the dependencies in requirements.txt. Otherwise, it uses the public PyPI index as usual.
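A minimal sketch of that fallback logic, under stated assumptions: the variable names `CA_DOMAIN`, `CA_DOMAIN_OWNER` and `CA_REPOSITORY` are hypothetical placeholders for the three parameters listed above, the region is assumed to be available in `AWS_REGION`, and the authorization token is assumed to have been fetched already (for example via CodeArtifact's GetAuthorizationToken). The index URL shape follows the CodeArtifact pip configuration guide linked above. This is an illustration, not the toolkits' implementation.

```python
from typing import List, Mapping, Optional

# Hypothetical variable names, used only for illustration.
ENV_DOMAIN = "CA_DOMAIN"
ENV_DOMAIN_OWNER = "CA_DOMAIN_OWNER"
ENV_REPOSITORY = "CA_REPOSITORY"
ENV_REGION = "AWS_REGION"  # assumption: the region is exposed under this name


def codeartifact_index_url(env: Mapping[str, str], auth_token: str) -> Optional[str]:
    """Return a pip index URL for CodeArtifact, or None to keep the default index."""
    required = (ENV_DOMAIN, ENV_DOMAIN_OWNER, ENV_REPOSITORY, ENV_REGION)
    if not all(env.get(name) for name in required):
        return None  # not configured: fall back to the public PyPI index
    host = "{}-{}.d.codeartifact.{}.amazonaws.com".format(
        env[ENV_DOMAIN], env[ENV_DOMAIN_OWNER], env[ENV_REGION]
    )
    return "https://aws:{}@{}/pypi/{}/simple/".format(
        auth_token, host, env[ENV_REPOSITORY]
    )


def pip_install_command(requirements_path: str, index_url: Optional[str] = None) -> List[str]:
    """Build the pip command; add --index-url only when CodeArtifact is configured."""
    command = ["python", "-m", "pip", "install", "-r", requirements_path]
    if index_url:
        command += ["--index-url", index_url]
    return command
```

In a container, `env` would be `os.environ` and the resulting list would be handed to `subprocess.check_call`. Keeping the URL construction separate from the pip invocation makes the fallback path easy to test without network access.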

Describe alternatives you've considered

N/A

Additional context

Here's a blog post https://aws.amazon.com/blogs/machine-learning/secure-aws-codeartifact-access-for-isolated-amazon-sagemaker-notebook-instances/ describing some of the benefits of using CodeArtifact - but in SageMaker Notebooks that are Internet-isolated.
Similarly, running training jobs/deploying endpoints in an isolated VPC rules out the use of requirements.txt. Allowing CodeArtifact configuration to be passed in, and having containers use that configuration to install the dependencies in requirements.txt from a CodeArtifact repository, would be an ideal solution.
Two related feature requests were made to sagemaker-training-toolkit/sagemaker-inference-toolkit at https://github.com/aws/sagemaker-training-toolkit/issues/167 and https://github.com/aws/sagemaker-inference-toolkit/issues/85

humanzz commented 1 year ago

I've submitted 2 near-identical PRs to both sagemaker-training-toolkit and sagemaker-inference-toolkit at

My understanding is that if these get merged, then new containers leveraging those packages should start having CodeArtifact support

humanzz commented 1 year ago

I've also submitted a PR for sagemaker-pytorch-inference-toolkit at https://github.com/aws/sagemaker-pytorch-inference-toolkit/pull/150

humanzz commented 1 year ago

Inference-side changes have been merged

humanzz commented 1 year ago

Inference-side changes have been released at

At the time of writing this comment, the PyTorch inference containers appear to use sagemaker-pytorch-inference-toolkit==2.0.14, as per https://github.com/search?q=repo%3Aaws%2Fdeep-learning-containers%20SM_TOOLKIT_VERSION&type=code

For PyTorch inference containers to pick up CodeArtifact support, they need to move to sagemaker-pytorch-inference-toolkit >= 2.0.16

humanzz commented 1 year ago

Together with the release of new container versions that include the updated packages, the PRs above provide CodeArtifact support whenever the environment variable CA_REPOSITORY_ARN is set to the ARN of the desired CodeArtifact repository.
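To illustrate what a single ARN carries, here is a hedged sketch that splits a CodeArtifact repository ARN into the pieces a client would need (region, domain owner, domain, repository). The helper names are hypothetical and this is not necessarily how the toolkits parse the value. The keyword names in `authorization_token_kwargs` match boto3's CodeArtifact `get_authorization_token` call, which is not invoked here.

```python
from typing import Dict, NamedTuple


class CodeArtifactRepo(NamedTuple):
    region: str
    domain_owner: str
    domain: str
    repository: str


def parse_repository_arn(arn: str) -> CodeArtifactRepo:
    """Split arn:<partition>:codeartifact:<region>:<account>:repository/<domain>/<repo>."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "codeartifact":
        raise ValueError("not a CodeArtifact ARN: {!r}".format(arn))
    resource = parts[5].split("/")
    if len(resource) != 3 or resource[0] != "repository":
        raise ValueError("not a repository ARN: {!r}".format(arn))
    return CodeArtifactRepo(
        region=parts[3],
        domain_owner=parts[4],
        domain=resource[1],
        repository=resource[2],
    )


def authorization_token_kwargs(repo: CodeArtifactRepo) -> Dict[str, str]:
    """Keyword arguments for boto3's codeartifact get_authorization_token (not called here)."""
    return {"domain": repo.domain, "domainOwner": repo.domain_owner}
```

For example, `parse_repository_arn(os.environ["CA_REPOSITORY_ARN"])` would yield the region used to create the client and the domain and owner used to request a token.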

Using this feature also requires updating IAM policies:

The SageMaker Execution Role needs to be updated to permit access to CodeArtifact
The CodeArtifact repository resource policy might also require updates

SageMaker Execution Role example policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "codeartifact:GetAuthorizationToken",
                "codeartifact:GetRepositoryEndpoint",
                "codeartifact:ReadFromRepository"
            ],
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "sts:GetServiceBearerToken",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "sts:AWSServiceName": "codeartifact.amazonaws.com"
                }
            }
        }
    ]
}

CodeArtifact repository example resource policy that permits access for the above role from account 123456789012

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "codeartifact:DescribePackageVersion",
                "codeartifact:DescribeRepository",
                "codeartifact:GetPackageVersionReadme",
                "codeartifact:GetRepositoryEndpoint",
                "codeartifact:ListPackages",
                "codeartifact:ListPackageVersions",
                "codeartifact:ListPackageVersionAssets",
                "codeartifact:ListPackageVersionDependencies",
                "codeartifact:ReadFromRepository"
            ],
            "Effect": "Allow",
            "Principal": {
                 "AWS": "arn:aws:iam::123456789012:root"
            },
            "Resource": "*"
        }
    ]
}
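With the permissions in place, the variable itself still has to reach the container. The sketch below only builds the request fragments that would carry CA_REPOSITORY_ARN: `Environment` on CreateTrainingJob and `PrimaryContainer.Environment` on CreateModel. The helper name, the ARN and the image URI are placeholders, and the remaining required request fields are omitted.

```python
from typing import Dict, Optional

ENV_VAR = "CA_REPOSITORY_ARN"

# Placeholder ARN for illustration only.
REPOSITORY_ARN = "arn:aws:codeartifact:us-east-1:123456789012:repository/my-domain/my-repo"


def with_codeartifact(environment: Optional[Dict[str, str]], repository_arn: str) -> Dict[str, str]:
    """Return a copy of the environment map with the CodeArtifact variable added."""
    merged = dict(environment or {})
    merged[ENV_VAR] = repository_arn
    return merged


# Fragment of a CreateTrainingJob request (other required fields omitted).
training_job_fragment = {
    "Environment": with_codeartifact({"MY_FLAG": "1"}, REPOSITORY_ARN),
}

# Fragment of a CreateModel request (other required fields omitted).
model_fragment = {
    "PrimaryContainer": {
        "Image": "<inference-image-uri>",  # placeholder
        "Environment": with_codeartifact(None, REPOSITORY_ARN),
    },
}
```

These fragments would be merged into the full keyword arguments passed to the SageMaker client's `create_training_job` and `create_model` calls.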

humanzz commented 1 year ago

Training-side changes have been released at

humanzz commented 1 year ago

This means that the remaining parts are more or less contained within this repo to update/release new container versions

release of new training container versions to leverage sagemaker-training>=4.7.0 (most Dockerfiles would allow that)
release of new inference containers to leverage sagemaker-inference>=1.10.0
merge of https://github.com/aws/deep-learning-containers/pull/3227 so they leverage sagemaker-pytorch-inference>=2.0.16
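One way to check whether a given image already meets the minimum versions listed above is to compare the installed package versions inside the container. This is a rough sketch: it assumes plain `X.Y.Z` version strings and uses the package names exactly as written in this comment; `supports_codeartifact` and `report` are hypothetical helper names.

```python
from importlib import metadata
from typing import Tuple

# Minimum versions taken from the list above.
MINIMUM_VERSIONS = {
    "sagemaker-training": (4, 7, 0),
    "sagemaker-inference": (1, 10, 0),
    "sagemaker-pytorch-inference": (2, 0, 16),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain X.Y.Z version; pre-release suffixes are not handled."""
    return tuple(int(part) for part in text.split(".")[:3])


def supports_codeartifact(package: str, installed_version: str) -> bool:
    """Compare numerically so that 1.10.0 correctly ranks above 1.9.0."""
    return parse_version(installed_version) >= MINIMUM_VERSIONS[package]


def report() -> None:
    """Print the status of each relevant package in the current environment."""
    for package in MINIMUM_VERSIONS:
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            print("{}: not installed".format(package))
            continue
        status = "ok" if supports_codeartifact(package, installed) else "too old"
        print("{} {}: {}".format(package, installed, status))
```

Running `report()` inside a container prints one line per package; a training image would normally show only sagemaker-training as installed.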

humanzz commented 1 year ago

SageMaker PyTorch 2.0.1 Inference containers now support CodeArtifact

https://github.com/aws/deep-learning-containers/pull/3227 has been merged
New versions of the 2.0.1 containers have been released, e.g. https://github.com/aws/deep-learning-containers/releases/tag/v1.6-pt-sagemaker-2.0.1-inf-cpu-py310

For training, this is likely to happen after

https://github.com/aws/deep-learning-containers/pull/3172 is merged
New container versions are released (thereby taking advantage of https://github.com/aws/sagemaker-training-toolkit/pull/187)

humanzz commented 1 year ago

PyTorch 2.0.1 training images with CodeArtifact support have been released, e.g. https://github.com/aws/deep-learning-containers/releases/tag/v1.3-pt-sagemaker-2.0.1-tr-cpu-py310

humanzz commented 1 year ago

Summary of the work to get this into PT 2.0.1 training/inference images, and to hopefully enable this to flow to more frameworks

flowchart TD
    inferencepr["fa:fa-code-pull-request feat: support codeartifact for installing requirements.txt packages sagemaker-inference-toolkit#130"] --> inferencerepo
    inferencerepo["fa:fa-code sagemaker-inference-toolkit"] --> inferencerelease
    inferencerelease["fa:fa-cube sagemaker-inference 1.10.0"]
    inferencerelease -.-> ptinferencerelease
    inferencerelease --> dlcptinferencerelease

    ptinferencepr["fa:fa-code-pull-request reuse sagemaker-inference's requirements.txt installation logic sagemaker-pytorch-inference-toolkit#150"] --> ptinferencerepo
    ptinferencerepo["fa:fa-code sagemaker-pytorch-inference-toolkit"] --> ptinferencerelease
    ptinferencerelease["fa:fa-cube sagemaker-pytorch-inference 2.0.16"]
    ptinferencerelease --> dlcptinferencerelease

    dlcpr["fa:fa-code-pull-request [PyTorch] Update sagemaker-pytorch-inference to 2.0.16 deep-learning-containers#3227"] --> dlcrepo
    dlcrepo["fa:fa-code deep-learning-containers"]
    dlcptinferencerelease["fa:fa-cube v1.6-pt-sagemaker-2.0.1-inf-cpu-py310"]
    dlcpttrainingrelease["fa:fa-cube v1.3-pt-sagemaker-2.0.1-tr-cpu-py310"]
    dlcrepo --> dlcptinferencerelease
    dlcrepo --> dlcpttrainingrelease
    dlcrepo ---> otherreleases

    trainingpr["fa:fa-code-pull-request feat: support codeartifact for installing requirements.txt packages sagemaker-training-toolkit#187"] --> trainingrepo
    trainingrepo["fa:fa-code sagemaker-training-toolkit"] --> trainingrelease
    trainingrelease["fa:fa-cube sagemaker-training 4.7.0"]
    trainingrelease --> dlcpttrainingrelease

    otherreleases["fa:fa-cube future image releases using sagemaker-inference>=1.10.0 and sagemaker-training>=4.7.0"]

aws / deep-learning-containers

[feature-request] Support installing dependencies in requirements.txt from CodeArtifact for both training/inference SageMaker Containers #2509