Open humanzz opened 1 year ago
I've submitted two near-identical PRs to both sagemaker-training-toolkit and sagemaker-inference-toolkit (sagemaker-training-toolkit#187 and sagemaker-inference-toolkit#130).
My understanding is that once these get merged, new containers leveraging those packages should start having CodeArtifact support.
I've also submitted a PR for sagemaker-pytorch-inference-toolkit at https://github.com/aws/sagemaker-pytorch-inference-toolkit/pull/150
Inference-side changes have been merged.
Inference-side changes have been released.
At the time of writing this comment, it seems that the PyTorch inference containers use sagemaker-pytorch-inference-toolkit==2.0.14, as per
https://github.com/search?q=repo%3Aaws%2Fdeep-learning-containers%20SM_TOOLKIT_VERSION&type=code
For PyTorch inference containers to pick up CodeArtifact support, they need to move to sagemaker-pytorch-inference-toolkit >= 2.0.16
All of the above PRs, coupled with the release of new container versions that include those updated package versions, provide CodeArtifact support when the environment variable CA_REPOSITORY_ARN
is set to the ARN of the desired CodeArtifact repository.
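For illustration, here is a minimal sketch of how that environment variable could be passed when deploying an endpoint with the SageMaker Python SDK; the bucket, role, and repository ARN below are placeholders, not values from this issue.

```python
from sagemaker.pytorch import PyTorchModel

# All names/ARNs below are hypothetical placeholders.
CA_REPOSITORY_ARN = "arn:aws:codeartifact:us-east-1:123456789012:repository/my-domain/my-repo"

model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",
    role="arn:aws:iam::123456789012:role/MySageMakerExecutionRole",
    entry_point="inference.py",
    source_dir="src",  # src/requirements.txt lists the extra dependencies
    framework_version="2.0.1",
    py_version="py310",
    # With sagemaker-pytorch-inference >= 2.0.16 in the container, this should make
    # the toolkit install requirements.txt from CodeArtifact instead of public PyPI.
    env={"CA_REPOSITORY_ARN": CA_REPOSITORY_ARN},
)

predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```

For training jobs, the same variable can be supplied through the estimator's `environment` argument in recent versions of the SageMaker Python SDK.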
The other part of leveraging this feature is updating the relevant IAM policies.

SageMaker execution role example policy:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "codeartifact:GetAuthorizationToken",
        "codeartifact:GetRepositoryEndpoint",
        "codeartifact:ReadFromRepository"
      ],
      "Effect": "Allow",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": "sts:GetServiceBearerToken",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "sts:AWSServiceName": "codeartifact.amazonaws.com"
        }
      }
    }
  ]
}
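Purely as an illustration (not part of the original issue), the policy above can be attached to the execution role as an inline policy with boto3; the file, role, and policy names are placeholders.

```python
import boto3

iam = boto3.client("iam")

# Assumes the execution-role policy JSON above was saved locally as
# execution-role-codeartifact-policy.json; role/policy names are placeholders.
with open("execution-role-codeartifact-policy.json") as f:
    policy_document = f.read()

iam.put_role_policy(
    RoleName="MySageMakerExecutionRole",
    PolicyName="CodeArtifactReadAccess",
    PolicyDocument=policy_document,
)
```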
CodeArtifact repository example resource policy permitting access for the above role from account 123456789012:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "codeartifact:DescribePackageVersion",
        "codeartifact:DescribeRepository",
        "codeartifact:GetPackageVersionReadme",
        "codeartifact:GetRepositoryEndpoint",
        "codeartifact:ListPackages",
        "codeartifact:ListPackageVersions",
        "codeartifact:ListPackageVersionAssets",
        "codeartifact:ListPackageVersionDependencies",
        "codeartifact:ReadFromRepository"
      ],
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:root"
      },
      "Resource": "*"
    }
  ]
}
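Also purely as an illustration, the resource policy can be applied to the repository with boto3; the domain and repository names are placeholders.

```python
import boto3

codeartifact = boto3.client("codeartifact")

# Assumes the resource policy JSON above was saved locally as
# repository-policy.json; domain/repository names are placeholders.
with open("repository-policy.json") as f:
    policy_document = f.read()

codeartifact.put_repository_permissions_policy(
    domain="my-domain",
    repository="my-repo",
    policyDocument=policy_document,
)
```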
Training-side changes have been released.
This means that the remaining work is more or less contained within this repo: updating/releasing new container versions that use
- sagemaker-training>=4.7.0 (most Dockerfiles would already allow that)
- sagemaker-inference>=1.10.0
- sagemaker-pytorch-inference>=2.0.16
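If you want to check whether a given container image already meets these minimums, a small sketch like the following (assuming the packaging library is available in the image, which is typical but not guaranteed) reports the installed toolkit versions:

```python
from importlib.metadata import version, PackageNotFoundError
from packaging.version import Version

# Minimum toolkit versions that include CodeArtifact support (per this issue).
MINIMUMS = {
    "sagemaker-training": "4.7.0",
    "sagemaker-inference": "1.10.0",
    "sagemaker-pytorch-inference": "2.0.16",
}

for package, minimum in MINIMUMS.items():
    try:
        installed = version(package)
    except PackageNotFoundError:
        print(f"{package}: not installed")
        continue
    status = "OK" if Version(installed) >= Version(minimum) else f"needs >= {minimum}"
    print(f"{package}: {installed} ({status})")
```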
SageMaker PyTorch 2.0.1 inference containers now support CodeArtifact.
For training, this is likely to happen after
PyTorch 2.0.1 training images, with CodeArtifact support, have been released e.g. https://github.com/aws/deep-learning-containers/releases/tag/v1.3-pt-sagemaker-2.0.1-tr-cpu-py310
Summary of the work to get this into PT 2.0.1 training/inference images, and to hopefully enable this to flow to more frameworks:
flowchart TD
inferencepr["fa:fa-code-pull-request feat: support codeartifact for installing requirements.txt packages sagemaker-inference-toolkit#130
"] --> inferencerepo
inferencerepo["fa:fa-code sagemaker-inference-toolkit"] --> inferencerelease
inferencerelease["fa:fa-cube sagemaker-inference 1.10.0"]
inferencerelease -.-> ptinferencerelease
inferencerelease --> dlcptinferencerelease
ptinferencepr["fa:fa-code-pull-request reuse sagemaker-inference's requirements.txt installation logic sagemaker-pytorch-inference-toolkit#150"] --> ptinferencerepo
ptinferencerepo["fa:fa-code sagemaker-pytorch-inference-toolkit"] --> ptinferencerelease
ptinferencerelease["fa:fa-cube sagemaker-pytorch-inference 2.0.16"]
ptinferencerelease --> dlcptinferencerelease
dlcpr["fa:fa-code-pull-request [PyTorch] Update sagemaker-pytorch-inference to 2.0.16 deep-learning-containers#3227"] --> dlcrepo
dlcrepo["fa:fa-code deep-learning-containers"]
dlcptinferencerelease["fa:fa-cube v1.6-pt-sagemaker-2.0.1-inf-cpu-py310"]
dlcpttrainingrelease["fa:fa-cube v1.3-pt-sagemaker-2.0.1-tr-cpu-py310"]
dlcrepo --> dlcptinferencerelease
dlcrepo --> dlcpttrainingrelease
dlcrepo ---> otherreleases
trainingpr["fa:fa-code-pull-request feat: support codeartifact for installing requirements.txt packages sagemaker-training-toolkit#187
"] --> trainingrepo
trainingrepo["fa:fa-code sagemaker-training-toolkit"] --> trainingrelease
trainingrelease["fa:fa-cube sagemaker-training 4.7.0"]
trainingrelease --> dlcpttrainingrelease
otherreleases["fa:fa-cube future image releases using sagemaker-inference>=1.10.0 and sagemaker-training>=4.7.0"]
Checklist

Concise Description:
For training/inference containers that support installing additional dependencies via requirements.txt, rather than using the public PyPI index, allow passing the necessary parameters to install those dependencies from a CodeArtifact repository instead.

Is your feature request related to a problem? Please describe.
With security policies requiring training jobs/endpoints to run in an internet-isolated VPC, leveraging requirements.txt to install additional dependencies on training/inference containers is not possible. Being able to leverage CodeArtifact, rather than the public PyPI index, would allow users of requirements.txt to adhere to security best practices and isolate their training/inference runtimes from the internet.

Describe the solution you'd like
When a CodeArtifact repository is configured for the container, it installs the dependencies listed in requirements.txt from that repository. Otherwise, it uses the public PyPI index as usual.

Describe alternatives you've considered
N/A
Additional context
Training/inference containers already support installing additional dependencies via requirements.txt. Allowing CodeArtifact configurations to be passed, and for containers to leverage those configurations to install the dependencies in requirements.txt from a CodeArtifact repository, would be an ideal solution.
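To make the mechanism concrete, here is a rough sketch, not the toolkits' actual implementation, of how a CodeArtifact repository ARN can be resolved into a pip index URL with boto3 and used to install requirements.txt; all names and the ARN are illustrative.

```python
import subprocess
import boto3

# Illustrative ARN; format: arn:aws:codeartifact:<region>:<account>:repository/<domain>/<repository>
ca_repository_arn = "arn:aws:codeartifact:us-east-1:123456789012:repository/my-domain/my-repo"

# Parse region, account (domain owner), domain and repository out of the ARN.
_, _, _, region, account, resource = ca_repository_arn.split(":", 5)
_, domain, repository = resource.split("/", 2)

codeartifact = boto3.client("codeartifact", region_name=region)

token = codeartifact.get_authorization_token(
    domain=domain, domainOwner=account
)["authorizationToken"]

endpoint = codeartifact.get_repository_endpoint(
    domain=domain, domainOwner=account, repository=repository, format="pypi"
)["repositoryEndpoint"]

# Build a pip index URL of the form https://aws:<token>@<endpoint>simple/
index_url = endpoint.replace("https://", f"https://aws:{token}@") + "simple/"

# Install the extra dependencies from CodeArtifact instead of the public PyPI index.
subprocess.check_call(["pip", "install", "-r", "requirements.txt", "-i", index_url])
```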