aws / deep-learning-containers

AWS Deep Learning Containers are pre-built Docker images that make it easier to run popular deep learning frameworks and tools on AWS.
https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/what-is-dlc.html

[feature-request] Support installing dependencies in requirements.txt from CodeArtifact for both training/inference SageMaker Containers #2509

Open humanzz opened 1 year ago

humanzz commented 1 year ago

Checklist

Concise Description:

For training/inference containers that support installing additional dependencies via requirements.txt, allow passing the parameters needed to install those dependencies from a CodeArtifact repository instead of the public PyPI index.

Is your feature request related to a problem? Please describe.

When security policies require running training jobs/endpoints in an internet-isolated VPC, using requirements.txt to install additional dependencies on training/inference containers is not possible, because the public PyPI index is unreachable. Being able to use CodeArtifact instead of the public PyPI index would let users of requirements.txt follow the security best practice of isolating their training/inference runtimes from the internet.

Describe the solution you'd like

Describe alternatives you've considered

N/A

Additional context

humanzz commented 1 year ago

I've submitted 2 near-identical PRs to both sagemaker-training-toolkit and sagemaker-inference-toolkit at

My understanding is that if these get merged, then new containers leveraging those packages should start having CodeArtifact support

humanzz commented 1 year ago

I've also submitted a pr for sagemaker-pytorch-inference-toolkit at https://github.com/aws/sagemaker-pytorch-inference-toolkit/pull/150

humanzz commented 1 year ago

Inference-side changes have been merged

humanzz commented 1 year ago

Inference-side changes have been released at

At the time of writing this comment, it seems that the PyTorch inference containers use sagemaker-pytorch-inference-toolkit==2.0.14 as per https://github.com/search?q=repo%3Aaws%2Fdeep-learning-containers%20SM_TOOLKIT_VERSION&type=code

For PyTorch inference containers to pick up CodeArtifact support, they need to move to sagemaker-pytorch-inference-toolkit >= 2.0.16

humanzz commented 1 year ago

All of the above PRs, coupled with the release of new container versions that include the updated package versions, provide CodeArtifact support when the environment variable CA_REPOSITORY_ARN is set to the ARN of the desired CodeArtifact repository.

Using this feature also requires updating IAM policies:
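To illustrate what setting CA_REPOSITORY_ARN implies, here is a minimal sketch of how a repository ARN could be turned into a pip index URL. It is not the toolkits' actual code: the helper names (`parse_ca_repository_arn`, `pip_index_url`) are hypothetical, the host pattern assumes the standard `aws` partition, and the authorization token is assumed to come from a separate `codeartifact:GetAuthorizationToken` call that is not shown.

```python
import re
from typing import NamedTuple

# Documented ARN shape for a CodeArtifact repository:
# arn:<partition>:codeartifact:<region>:<account-id>:repository/<domain>/<repository>
_CA_REPOSITORY_ARN = re.compile(
    r"^arn:(?P<partition>[^:]+):codeartifact:(?P<region>[^:]+):"
    r"(?P<account>\d{12}):repository/(?P<domain>[^/]+)/(?P<repository>[^/]+)$"
)


class CodeArtifactRepo(NamedTuple):
    region: str
    account: str
    domain: str
    repository: str


def parse_ca_repository_arn(arn: str) -> CodeArtifactRepo:
    """Split a repository ARN (hypothetical helper, not a toolkit API)."""
    match = _CA_REPOSITORY_ARN.match(arn)
    if match is None:
        raise ValueError(f"not a CodeArtifact repository ARN: {arn!r}")
    return CodeArtifactRepo(
        match["region"], match["account"], match["domain"], match["repository"]
    )


def pip_index_url(repo: CodeArtifactRepo, auth_token: str) -> str:
    """Build the PyPI-format index URL pip can use (assumes the aws partition)."""
    host = (
        f"{repo.domain}-{repo.account}.d.codeartifact."
        f"{repo.region}.amazonaws.com"
    )
    return f"https://aws:{auth_token}@{host}/pypi/{repo.repository}/simple/"
```

With such a URL, the equivalent manual step would be `pip install -r requirements.txt -i <index-url>`; inside an internet-isolated VPC this also assumes the CodeArtifact endpoints are reachable from the container.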

  1. SageMaker Execution Role would need to be updated to permit access to CodeArtifact
  2. CodeArtifact repository resource policy might also require updates

SageMaker Execution Role example policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "codeartifact:GetAuthorizationToken",
                "codeartifact:GetRepositoryEndpoint",
                "codeartifact:ReadFromRepository"
            ],
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "sts:GetServiceBearerToken",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "sts:AWSServiceName": "codeartifact.amazonaws.com"
                }
            }
        }
    ]
}
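
The example policy above uses "Resource": "*" for the CodeArtifact actions. As an optional tightening, the sketch below generates a variant scoped to one repository and its domain. It is an illustration, not part of the original proposal: the function name `scoped_execution_role_policy` is hypothetical, and it assumes that `GetAuthorizationToken` is authorized against the domain ARN while `GetRepositoryEndpoint` and `ReadFromRepository` are authorized against the repository ARN.

```python
import json


def scoped_execution_role_policy(repository_arn: str) -> dict:
    """Return an execution-role policy limited to a single repository.

    Expects arn:aws:codeartifact:<region>:<account>:repository/<domain>/<repo>.
    """
    prefix, _, resource = repository_arn.rpartition(":")
    parts = resource.split("/")
    if len(parts) != 3 or parts[0] != "repository" or ":codeartifact:" not in prefix:
        raise ValueError(f"not a CodeArtifact repository ARN: {repository_arn!r}")
    # The domain ARN shares the prefix and swaps the resource part.
    domain_arn = f"{prefix}:domain/{parts[1]}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "codeartifact:GetAuthorizationToken",
                "Resource": domain_arn,
            },
            {
                "Effect": "Allow",
                "Action": [
                    "codeartifact:GetRepositoryEndpoint",
                    "codeartifact:ReadFromRepository",
                ],
                "Resource": repository_arn,
            },
            {
                # Same bearer-token statement as in the example policy.
                "Effect": "Allow",
                "Action": "sts:GetServiceBearerToken",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "sts:AWSServiceName": "codeartifact.amazonaws.com"
                    }
                },
            },
        ],
    }


if __name__ == "__main__":
    arn = "arn:aws:codeartifact:us-west-2:123456789012:repository/my-domain/my-repo"
    print(json.dumps(scoped_execution_role_policy(arn), indent=4))
```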

CodeArtifact repository example resource policy that grants access to the above role from account 123456789012

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "codeartifact:DescribePackageVersion",
                "codeartifact:DescribeRepository",
                "codeartifact:GetPackageVersionReadme",
                "codeartifact:GetRepositoryEndpoint",
                "codeartifact:ListPackages",
                "codeartifact:ListPackageVersions",
                "codeartifact:ListPackageVersionAssets",
                "codeartifact:ListPackageVersionDependencies",
                "codeartifact:ReadFromRepository"
            ],
            "Effect": "Allow",
            "Principal": {
                 "AWS": "arn:aws:iam::123456789012:root"
            },
            "Resource": "*"
        }
    ]
}

humanzz commented 1 year ago

Training-side changes have been released at

humanzz commented 1 year ago

This means that the remaining work is more or less contained within this repo: updating and releasing new container versions

humanzz commented 1 year ago

SageMaker PyTorch 2.0.1 Inference containers now support CodeArtifact

For training, this is likely to happen after

humanzz commented 1 year ago

PyTorch 2.0.1 training images, with CodeArtifact support, have been released e.g. https://github.com/aws/deep-learning-containers/releases/tag/v1.3-pt-sagemaker-2.0.1-tr-cpu-py310

humanzz commented 1 year ago

Summary of the work to get this into PT 2.0.1 training/inference images, and to hopefully enable this to flow to more frameworks

flowchart TD
    inferencepr["fa:fa-code-pull-request feat: support codeartifact for installing requirements.txt packages sagemaker-inference-toolkit#130"] --> inferencerepo
    inferencerepo["fa:fa-code sagemaker-inference-toolkit"] --> inferencerelease
    inferencerelease["fa:fa-cube sagemaker-inference 1.10.0"]
    inferencerelease -.-> ptinferencerelease
    inferencerelease --> dlcptinferencerelease

    ptinferencepr["fa:fa-code-pull-request reuse sagemaker-inference's requirements.txt installation logic sagemaker-pytorch-inference-toolkit#150"] --> ptinferencerepo
    ptinferencerepo["fa:fa-code sagemaker-pytorch-inference-toolkit"] --> ptinferencerelease
    ptinferencerelease["fa:fa-cube sagemaker-pytorch-inference 2.0.16"]
    ptinferencerelease --> dlcptinferencerelease

    dlcpr["fa:fa-code-pull-request [PyTorch] Update sagemaker-pytorch-inference to 2.0.16 deep-learning-containers#3227"] --> dlcrepo
    dlcrepo["fa:fa-code deep-learning-containers"]
    dlcptinferencerelease["fa:fa-cube v1.6-pt-sagemaker-2.0.1-inf-cpu-py310"]
    dlcpttrainingrelease["fa:fa-cube v1.3-pt-sagemaker-2.0.1-tr-cpu-py310"]
    dlcrepo --> dlcptinferencerelease
    dlcrepo --> dlcpttrainingrelease
    dlcrepo ---> otherreleases

    trainingpr["fa:fa-code-pull-request feat: support codeartifact for installing requirements.txt packages sagemaker-training-toolkit#187"] --> trainingrepo
    trainingrepo["fa:fa-code sagemaker-training-toolkit"] --> trainingrelease
    trainingrelease["fa:fa-cube sagemaker-training 4.7.0"]
    trainingrelease --> dlcpttrainingrelease

    otherreleases["fa:fa-cube future image releases using sagemaker-inference>=1.10.0 and sagemaker-training>=4.7.0"]
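
The minimum versions named in the flowchart can be checked from inside a container. The following is a rough sketch using only the standard library; the function names are hypothetical, the distribution names are taken from the release labels above, and the version parsing is deliberately simple (it compares only the first three numeric components).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum releases with CodeArtifact support, as listed in the flowchart.
MINIMUMS = {
    "sagemaker-inference": "1.10.0",
    "sagemaker-training": "4.7.0",
    "sagemaker-pytorch-inference": "2.0.16",
}


def version_tuple(text: str) -> tuple:
    """Turn '2.0.16' into (2, 0, 16); non-numeric suffixes are ignored."""
    numbers = []
    for piece in text.split(".")[:3]:
        digits = re.match(r"\d+", piece)
        numbers.append(int(digits.group()) if digits else 0)
    return tuple(numbers)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numerically, so that 1.10.0 counts as newer than 1.9.0."""
    return version_tuple(installed) >= version_tuple(minimum)


def codeartifact_support_report() -> dict:
    """Map each package to True/False, or None when it is not installed."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            report[name] = meets_minimum(version(name), minimum)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ok in codeartifact_support_report().items():
        print(f"{name}: {ok}")
```

A training image would normally contain only sagemaker-training and an inference image only the inference packages, so a None entry for the other side is expected.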