Open humanzz opened 1 year ago
@akrishna1995 reaching out as I've noticed you're engaging on a similar implementation review in https://github.com/aws/sagemaker-python-sdk/pull/4145
I reached out to the documentation team internally and was advised to raise this here, so I was happy to see someone from the team engaging. Could you have a look at this request, or redirect me to the relevant folks so I can coordinate with them?
I'm happy to propose a documentation write-up for this feature.
I've seen https://github.com/aws/sagemaker-python-sdk/blob/master/CONTRIBUTING.md#documentation-guidelines and https://github.com/aws/sagemaker-python-sdk/blob/f2ae8ff8b6ed82eb89110887eb5e74c953e6372a/doc/frameworks/pytorch/using_pytorch.rst#using-third-party-libraries and I'm wondering if
What do you folks think? If (1) above makes sense, where in the documentation hierarchy do you think it should go?
Hello team,
I have recently worked with several SageMaker teams to deliver a feature I requested a while back: support for installing `requirements.txt` dependencies from a specified CodeArtifact repository in SageMaker training jobs and deployed endpoints/models. You can see the Requests/PRs/Releases at

The GitHub feature request above was meant to cover all (or at least as many as possible) SageMaker images (used in training jobs/inference), but I prioritized delivering this capability in the PyTorch 2.0.1 training/inference containers, and that has already been delivered.
A while back, a blog post about leveraging CodeArtifact in SageMaker notebooks was published at https://aws.amazon.com/blogs/machine-learning/secure-aws-codeartifact-access-for-isolated-amazon-sagemaker-notebook-instances/. In addition to the feature requests above, it provides good context on why users might want to install their `requirements.txt` dependencies from CodeArtifact.

This is a request to update the PyTorch `requirements.txt` support documentation to add details about using CodeArtifact with PyTorch 2.0.1+, e.g. at https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#using-third-party-libraries

Below, I list the instructions to leverage CodeArtifact.
Steps to leverage CodeArtifact in PyTorch 2.0.1
1. Set the relevant CodeArtifact environment variable in training jobs and in models
   - The variable is `CA_REPOSITORY_ARN`, and the value is the CodeArtifact repository ARN
   - Use the PyTorch estimator's `environment` argument in the training job
   - Use the PyTorch estimator's `fit()`'s `env` argument when creating the model
2. Update the IAM permissions of the SageMaker execution role and the CodeArtifact repository to allow the training job/model access to the repository
   - The SageMaker execution role needs a policy that allows access to, and retrieval of packages from, the CodeArtifact repository. One example is `AWSCodeArtifactReadOnlyAccess` from https://docs.aws.amazon.com/codeartifact/latest/ug/security_iam_id-based-policy-examples.html
   - The CodeArtifact repository's resource policy needs to allow the SageMaker execution role to execute the necessary actions
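The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the documented implementation: the helper names (`codeartifact_env`, `repository_read_policy`, `build_estimator`, `build_model`), the script and directory names, the instance type, the `py310` Python version, and the ARN shape checked by the regex are all hypothetical. The action list in the resource policy is an assumption about what "the necessary actions" might include. On the model side, the sketch passes the variable through `PyTorchModel`'s `env` argument, which is an assumption about where the variable is set when the model is created.

```python
import json
import re

# Assumed ARN shape:
# arn:<partition>:codeartifact:<region>:<account-id>:repository/<domain>/<repository>
_REPOSITORY_ARN = re.compile(
    r"^arn:aws[a-z-]*:codeartifact:[a-z0-9-]+:\d{12}:repository/[^/]+/[^/]+$"
)


def codeartifact_env(repository_arn):
    """Build the environment mapping from step 1 (hypothetical helper)."""
    if not _REPOSITORY_ARN.match(repository_arn):
        raise ValueError(f"Not a CodeArtifact repository ARN: {repository_arn!r}")
    return {"CA_REPOSITORY_ARN": repository_arn}


def repository_read_policy(execution_role_arn):
    """Return a repository resource policy as JSON for step 2 (hypothetical helper).

    The action list is an assumption; adjust it to what your setup needs.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": execution_role_arn},
                "Action": [
                    "codeartifact:GetRepositoryEndpoint",
                    "codeartifact:ReadFromRepository",
                ],
                "Resource": "*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


def build_estimator(role_arn, repository_arn):
    """Training side: pass the variable through the estimator's `environment`."""
    from sagemaker.pytorch import PyTorch  # needs the sagemaker SDK installed

    return PyTorch(
        entry_point="train.py",        # hypothetical training script
        source_dir="src",              # hypothetical dir holding requirements.txt
        role=role_arn,
        instance_count=1,
        instance_type="ml.m5.xlarge",  # assumed instance type
        framework_version="2.0.1",
        py_version="py310",            # assumed Python version for 2.0.1
        environment=codeartifact_env(repository_arn),
    )


def build_model(role_arn, repository_arn, model_data):
    """Inference side: pass the variable through the model's `env` (assumption)."""
    from sagemaker.pytorch import PyTorchModel  # needs the sagemaker SDK installed

    return PyTorchModel(
        model_data=model_data,         # S3 URI of the trained model artifact
        role=role_arn,
        entry_point="inference.py",    # hypothetical inference script
        source_dir="src",
        framework_version="2.0.1",
        py_version="py310",
        env=codeartifact_env(repository_arn),
    )
```

Calling `build_estimator(role, arn).fit(...)` or `build_model(role, arn, model_data).deploy(...)` would then run in containers that see `CA_REPOSITORY_ARN`. The sketch does not cover the execution role's identity-based policy (for example attaching `AWSCodeArtifactReadOnlyAccess`).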
Happy to provide any additional context/details if needed.