aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

Document the support for installing requirements.txt dependencies from CodeArtifact in PyTorch 2.0.1+ SageMaker Containers #4189

Open humanzz opened 11 months ago

humanzz commented 11 months ago

Hello team,

I recently worked with several SageMaker teams to deliver a feature I requested a while back: support for installing requirements.txt dependencies from a specified CodeArtifact repository in SageMaker training jobs and deployed endpoints/models. You can see the Requests/PRs/Releases at

The GitHub feature request above was meant to cover all (or at least as many as possible) SageMaker images used in training jobs and inference, but I prioritized delivering this capability in the PyTorch 2.0.1 training/inference containers, and that has already been delivered.

A while back, a blog post about leveraging CodeArtifact in SageMaker notebooks was published at https://aws.amazon.com/blogs/machine-learning/secure-aws-codeartifact-access-for-isolated-amazon-sagemaker-notebook-instances/ which, in addition to the feature requests above, provides good context on why users might want to install their requirements.txt dependencies from CodeArtifact.

This is a request to update the PyTorch requirements.txt support documentation with details about using CodeArtifact with PyTorch 2.0.1+, e.g. at https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#using-third-party-libraries

Below, I list the instructions for leveraging CodeArtifact.

Steps to leverage CodeArtifact in PyTorch 2.0.1

  1. Set the relevant CodeArtifact environment variable in Training jobs and in Models
  2. Update the IAM permissions of the SageMaker execution role and of the CodeArtifact repository to allow the training job/model to retrieve packages from the repository

Set the relevant CodeArtifact environment variable in Training jobs and in Models

  1. The environment variable to set is CA_REPOSITORY_ARN and the value is the CodeArtifact Repository ARN
  2. Where the environment variable needs to be set is
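
As a rough illustration of step 1 above, here is a minimal sketch that builds the environment mapping for `CA_REPOSITORY_ARN`. The helper name `codeartifact_env`, the ARN sanity check, and the placeholder ARN are my own assumptions for illustration; they are not part of the containers or the SDK.

```python
import re

# Variable name as given in step 1 above.
CA_ENV_VAR = "CA_REPOSITORY_ARN"

# Assumed ARN shape:
# arn:<partition>:codeartifact:<region>:<account-id>:repository/<domain>/<repo>
_CA_ARN_RE = re.compile(
    r"^arn:aws[a-z-]*:codeartifact:[a-z0-9-]+:\d{12}:repository/[^/]+/[^/]+$"
)


def codeartifact_env(repository_arn: str) -> dict:
    """Return the environment mapping for a training job or model (hypothetical helper)."""
    if not _CA_ARN_RE.match(repository_arn):
        raise ValueError(f"Not a CodeArtifact repository ARN: {repository_arn!r}")
    return {CA_ENV_VAR: repository_arn}


# Placeholder ARN, for illustration only.
env = codeartifact_env(
    "arn:aws:codeartifact:us-west-2:111122223333:repository/my-domain/my-repo"
)
print(env)
```

With the SageMaker Python SDK, a mapping like this would presumably be passed as `environment=env` to the `PyTorch` estimator and as `env=env` to `PyTorchModel`; please double check the parameter names against the SDK version you use.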

Update the IAM permissions of the SageMaker execution role and of the CodeArtifact repository to allow the training job/model to retrieve packages from the repository

  1. The SageMaker execution role needs a policy that allows access to, and retrieval of packages from, the CodeArtifact repository. Examples for this are

  2. The CodeArtifact repository's resource policy needs to allow the SageMaker execution role to execute the necessary actions
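
To make the two permission steps above more concrete, here is a hedged sketch that generates both policy documents. The action list is my assumption, based on the usual CodeArtifact read-only setup, of what the containers need; it is not a confirmed list. All ARNs are placeholders and the function names are hypothetical.

```python
import json

# Placeholder ARNs, for illustration only.
DOMAIN_ARN = "arn:aws:codeartifact:us-west-2:111122223333:domain/my-domain"
REPOSITORY_ARN = "arn:aws:codeartifact:us-west-2:111122223333:repository/my-domain/my-repo"
EXECUTION_ROLE_ARN = "arn:aws:iam::111122223333:role/my-sagemaker-execution-role"

# Assumed set of read-only repository actions.
READ_ACTIONS = ["codeartifact:GetRepositoryEndpoint", "codeartifact:ReadFromRepository"]


def execution_role_policy(repository_arn: str, domain_arn: str) -> dict:
    """Identity policy sketch to attach to the SageMaker execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Auth tokens are issued at the domain level.
            {"Effect": "Allow", "Action": "codeartifact:GetAuthorizationToken", "Resource": domain_arn},
            {"Effect": "Allow", "Action": READ_ACTIONS, "Resource": repository_arn},
            # Required alongside GetAuthorizationToken.
            {
                "Effect": "Allow",
                "Action": "sts:GetServiceBearerToken",
                "Resource": "*",
                "Condition": {"StringEquals": {"sts:AWSServiceName": "codeartifact.amazonaws.com"}},
            },
        ],
    }


def repository_resource_policy(execution_role_arn: str) -> dict:
    """Resource policy sketch for the CodeArtifact repository."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": execution_role_arn},
                "Action": READ_ACTIONS,
                "Resource": "*",
            }
        ],
    }


identity_policy = execution_role_policy(REPOSITORY_ARN, DOMAIN_ARN)
resource_policy = repository_resource_policy(EXECUTION_ROLE_ARN)
print(json.dumps(identity_policy, indent=2))
print(json.dumps(resource_policy, indent=2))
```

If the repository lives in a different account than the execution role, the CodeArtifact domain would likely need its own resource policy as well; I have not covered that case here.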

Happy to provide any additional context/details if needed.

humanzz commented 11 months ago

@akrishna1995 reaching out as I've noticed you're engaging on a similar implementation review in https://github.com/aws/sagemaker-python-sdk/pull/4145

I reached out to the documentation team internally and was advised to raise this here. I was happy to see someone from the team engaging, so I'm asking whether you could take a look at this request or redirect me to the relevant folks so I can coordinate with them.

humanzz commented 11 months ago

I'm happy to propose a documentation write-up for this feature.

I've looked at https://github.com/aws/sagemaker-python-sdk/blob/master/CONTRIBUTING.md#documentation-guidelines and https://github.com/aws/sagemaker-python-sdk/blob/f2ae8ff8b6ed82eb89110887eb5e74c953e6372a/doc/frameworks/pytorch/using_pytorch.rst#using-third-party-libraries and I'm wondering whether we should

  1. Create a new documentation page about CodeArtifact support
  2. Link to it from https://github.com/aws/sagemaker-python-sdk/blob/f2ae8ff8b6ed82eb89110887eb5e74c953e6372a/doc/frameworks/pytorch/using_pytorch.rst#using-third-party-libraries and, potentially later, from any other framework pages that start including that support

What do you folks think? If (1) above makes sense, where in the documentation hierarchy do you think it should go?