aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

Allow instantiating the Framework estimator class to support custom framework containers #1467

Open giuseppeporcelli opened 4 years ago

giuseppeporcelli commented 4 years ago

Is your feature request related to a problem? Please describe.
No.

Describe the solution you'd like
If you build a custom container for SageMaker, you can use the sagemaker-training-toolkit library to provide script mode execution and to load the user's training module from an Amazon S3 archive, following the same approach as the open source deep learning containers implemented by AWS.

Running training with such a container through the SM Python SDK would then benefit from the ability to instantiate the Framework estimator class directly, in order to leverage the SDK functionality that builds sourcedir.tar.gz and uploads it to Amazon S3 before starting the training job.
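To make the packaging step concrete, here is a minimal local sketch of what "building sourcedir.tar.gz" amounts to. The helper name `build_sourcedir_archive` is hypothetical and this is not the SDK's implementation, which may handle dependencies or excluded files differently; the S3 upload (for example with boto3's `upload_file`) is deliberately left out.

```python
import os
import tarfile
import tempfile


def build_sourcedir_archive(source_dir: str, output_dir: str) -> str:
    """Pack everything under source_dir into output_dir/sourcedir.tar.gz.

    Hypothetical helper for illustration only. Keep output_dir outside
    source_dir, otherwise the archive would try to include itself.
    """
    archive_path = os.path.join(output_dir, "sourcedir.tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(source_dir)):
            # arcname keeps paths relative, so the entry point sits at the
            # archive root; directories are added recursively
            tar.add(os.path.join(source_dir, name), arcname=name)
    return archive_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as out:
        with open(os.path.join(src, "train.py"), "w") as f:
            f.write("print('training')\n")
        path = build_sourcedir_archive(src, out)
        with tarfile.open(path, "r:gz") as tar:
            print(tar.getnames())
```

This is the kind of boilerplate the feature request wants the SDK to handle for custom containers, instead of each team re-implementing it.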

Describe alternatives you've considered
The alternative solution is extending the Framework class for the specific use case.

Additional context
Examples of how to build custom training containers for Amazon SageMaker using the training toolkit (the last example shows how to extend the Framework estimator class): https://github.com/awslabs/amazon-sagemaker-examples/tree/master/advanced_functionality/sagemaker-custom-training-containers

nadiaya commented 4 years ago

The idea behind the current implementation was that the Framework class should be extended, similar to how we did it for all the supported frameworks: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/pytorch/estimator.py#L33

Would that work for your use case?

giuseppeporcelli commented 4 years ago

Well, that's one of the alternatives I had considered (see alternatives above) and, actually, the way I have implemented it so far.

However, this poses the problem of finding an elegant way to make it available to the data scientists who are going to use that implementation. I don't expect each project that uses such an estimator to declare it in its own Python module, so you would need to create a Python package and install it into, say, SageMaker notebook instance environments, for example using lifecycle configurations.

If the Python SDK allowed instantiating a generic framework estimator, this custom coding, packaging, and installation would not be needed.
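The trade-off in this thread can be illustrated with toy classes. All names here (`ToyFrameworkBase`, `MyTeamEstimator`, `ToyGenericFramework`, the image string) are hypothetical and do not reflect the SageMaker SDK's actual `Framework` API; the sketch only assumes a base class that leaves a per-framework hook abstract. The subclass approach needs code that every project must import from somewhere, while a generic variant takes the container image as a plain argument.

```python
from abc import ABC, abstractmethod


class ToyFrameworkBase(ABC):
    """Toy stand-in for a framework estimator base class (not SageMaker's)."""

    def __init__(self, entry_point: str, source_dir: str):
        self.entry_point = entry_point
        self.source_dir = source_dir

    @abstractmethod
    def image_uri(self) -> str:
        """Each concrete estimator decides which container image to use."""

    def describe(self) -> str:
        return f"{self.image_uri()} runs {self.entry_point} from {self.source_dir}"


class MyTeamEstimator(ToyFrameworkBase):
    """Subclass approach: this code must be shipped to every environment."""

    def image_uri(self) -> str:
        return "my-registry/my-custom-image:latest"  # hypothetical image


class ToyGenericFramework(ToyFrameworkBase):
    """Generic approach: the image is a constructor argument, no subclass needed."""

    def __init__(self, entry_point: str, source_dir: str, image_uri: str):
        super().__init__(entry_point, source_dir)
        self._image_uri = image_uri

    def image_uri(self) -> str:
        return self._image_uri


if __name__ == "__main__":
    generic = ToyGenericFramework(
        "train.py", "src", image_uri="my-registry/my-custom-image:latest"
    )
    print(generic.describe())
```

Instantiating `ToyFrameworkBase` directly raises `TypeError` because of the abstract method; the generic variant removes the need for a per-project subclass and the package that has to carry it.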

nadiaya commented 4 years ago

That makes sense. Thank you for the clarification!