aws / sagemaker-training-toolkit

Train machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.
Apache License 2.0

WIP - Allow entrypoint definition via python #114

Closed. ghost closed this pull request 2 years ago.

ghost commented 2 years ago

The API of sagemaker-training-toolkit does not match that of sagemaker-inference-toolkit, which makes the framework awkward to use when a single Docker image serves both training and inference on SageMaker.

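More specifically, sagemaker-inference-toolkit requires a Python entrypoint, whereas sagemaker-training-toolkit expects the user module to be named through the SAGEMAKER_PROGRAM environment variable.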

This PR is a small change that lets a module name be passed to sagemaker-training-toolkit from Python code, rather than through the SAGEMAKER_PROGRAM environment variable. A single entrypoint can then call either sagemaker-inference-toolkit or sagemaker-training-toolkit, which keeps the code path cleaner.
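The precedence this change implies can be sketched as follows. This is a minimal, hypothetical illustration rather than the toolkit's actual code: `resolve_user_module` is an invented helper showing an explicitly passed module name taking priority, with `SAGEMAKER_PROGRAM` kept as the fallback so existing containers keep working.

```python
import os
from typing import Mapping, Optional


def resolve_user_module(
    module_name: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the user module to run for training.

    Hypothetical helper (not part of sagemaker-training-toolkit):
    an explicit module name (the new code path) wins; otherwise the
    value of SAGEMAKER_PROGRAM (the existing path) is returned as-is.
    """
    if module_name:
        return module_name
    program = env.get("SAGEMAKER_PROGRAM")
    if not program:
        raise ValueError(
            "No module name was passed and SAGEMAKER_PROGRAM is not set"
        )
    return program
```

Passing `env` explicitly keeps the helper easy to test; in a real container it would simply read `os.environ`.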

An example entrypoint using this format:


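```python
import sys

from sagemaker_inference import model_server
from sagemaker_training import trainer

import serve  # user module providing the inference handler service
import train  # user module containing the training code


def main():
    # SageMaker starts the container with "train" or "serve" as its argument.
    mode = sys.argv[1] if len(sys.argv) > 1 else None
    if mode == "serve":
        model_server.start_model_server(serve.__name__)
    elif mode == "train":
        # Module name passed from Python, as proposed in this PR,
        # instead of through SAGEMAKER_PROGRAM.
        trainer.train(train.__name__)
    else:
        raise ValueError("Container must be run with one of [serve, train]")


if __name__ == "__main__":
    main()
```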
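Because `sys.argv[1]` alone decides the code path in the example entrypoint, that logic can be pulled into a function and checked without SageMaker installed. The sketch below is a hypothetical refactoring: `dispatch` and the `handlers` mapping are invented names, and in a real container the handlers would wrap the `model_server.start_model_server(...)` and `trainer.train(...)` calls.

```python
from typing import Callable, Mapping, Sequence


def dispatch(argv: Sequence[str], handlers: Mapping[str, Callable[[], None]]) -> str:
    """Run the handler selected by argv[1] and return the mode name.

    Hypothetical helper: SageMaker starts the container with "train"
    or "serve" as its argument, so argv[1] selects the code path.
    """
    mode = argv[1] if len(argv) > 1 else None
    if mode not in handlers:
        raise ValueError(
            f"Container must be run with one of {sorted(handlers)}, got {mode!r}"
        )
    handlers[mode]()
    return mode
```

With this helper, `main()` would reduce to a single `dispatch(sys.argv, {...})` call, and a missing or unknown argument produces an explicit error instead of an `IndexError`.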

Merge Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

sagemaker-bot commented 2 years ago

AWS CodeBuild CI Report
