aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

An easier equivalent to the removed update_endpoint argument #1920

athewsey opened this issue 3 years ago

athewsey commented 3 years ago

Describe the feature you'd like

A direct/simple way to update an existing endpoint to a new model version (created e.g. by the Model() constructor or Estimator.fit()).

Per the SDK v2 migration doc, Estimator.deploy() and Model.deploy() have had their update_endpoint argument removed and raise an error when called with an existing endpoint name. Users are advised to use Predictor.update_endpoint() instead.

The problem is that the update_endpoint() method takes an existing SageMaker Model name as a parameter and, per #1094, I'm not aware of an easy/SDK way to register a Model in the API given a Model object or a trained Estimator.

How would this feature be used? Please describe.

When a user has re-trained an Estimator or created a new Model object in the SDK, they'll be able to easily update an existing endpoint - like they would have done in v1 with Model.deploy(..., update_endpoint=True).

Describe alternatives you've considered

The implementation could perhaps register the Model in the SageMaker API automatically and then delegate to Predictor.update_endpoint().

Additional context

As used in, for example, the amazon-sagemaker-analyze-model-predictions sample.

It'd be great to know if I'm just missing an easy way to use Predictor.update_endpoint() for this!

athewsey commented 3 years ago

An example flow I got working for now, which uses private/internal functions and repeats the instance type way too much:

sagemaker_model._init_sagemaker_session_if_does_not_exist('ml.m5.xlarge')
sagemaker_model._create_sagemaker_model('ml.m5.xlarge')
predictor.update_endpoint(
    model_name=sagemaker_model.name,
    initial_instance_count=1,
    instance_type='ml.m5.xlarge',
)

...Speaking of which, it seems weird to me that initial_instance_count and instance_type are required params on the predictor call when the model_name is specified, but not otherwise? Can't it just default to the existing endpoint instance params as it would in the case where model_name wasn't changed?
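In the meantime, the existing endpoint's instance settings can be read back with boto3 before calling update_endpoint() (a sketch only; describe_endpoint and describe_endpoint_config are standard boto3 SageMaker client calls, but the helper name and single-variant assumption are mine):

```python
def current_variant_params(endpoint_name, sm_client):
    """Return (instance_type, instance_count) of the endpoint's first production variant.

    `sm_client` is a boto3 SageMaker client, e.g. boto3.client("sagemaker").
    Assumes a single-variant endpoint.
    """
    # The endpoint record points at its current endpoint config...
    config_name = sm_client.describe_endpoint(EndpointName=endpoint_name)["EndpointConfigName"]
    # ...and the config holds the instance parameters per variant
    variant = sm_client.describe_endpoint_config(
        EndpointConfigName=config_name
    )["ProductionVariants"][0]
    return variant["InstanceType"], variant["InitialInstanceCount"]
```

Those two values could then be passed straight back into predictor.update_endpoint() alongside the new model_name.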

kenanzh commented 3 years ago

> An example flow I got working for now, which uses private/internal functions and repeats the instance type way too much:
>
> sagemaker_model._init_sagemaker_session_if_does_not_exist('ml.m5.xlarge')
> sagemaker_model._create_sagemaker_model('ml.m5.xlarge')
> predictor.update_endpoint(
>     model_name=sagemaker_model.name,
>     initial_instance_count=1,
>     instance_type='ml.m5.xlarge',
> )
>
> ...Speaking of which, it seems weird to me that initial_instance_count and instance_type are required params on the predictor call when the model_name is specified, but not otherwise? Can't it just default to the existing endpoint instance params as it would in the case where model_name wasn't changed?

I am also encountering a similar issue. However, I am actually having a hard time finding the model name when using the SDK. How have you gone about doing this? Unfortunately, I was not able to find anywhere that the estimator or its associated training jobs keep track of the created model, but I may just be missing it.

athewsey commented 3 years ago

@kenanzh when you call Estimator.deploy() it actually creates three things in the back end, which you can see in the SageMaker console: a Model, an Endpoint Configuration, and an Endpoint.

In my example I was explicitly creating an SDK 'Model' object. I think you should be able to get the equivalent of my sagemaker_model by calling Estimator.create_model(...).

Note that creating a PyTorchModel (or the equivalent for other frameworks) in the SDK does not actually register it in the SageMaker API, which is why I called the internal sagemaker_model._create_sagemaker_model('ml.m5.xlarge') above. Creating the "real model" in the SageMaker API requires knowing the instance type (because most frameworks have different images for GPU vs CPU), so normally it happens when you call Model.transformer() or Model.deploy(). The sagemaker_model.name property will be empty until the "real model" has been created in the API.
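Putting that together, the whole workaround can be wrapped in one small helper (a sketch only; it still relies on the private SDK methods described above, and the helper name is mine):

```python
def update_endpoint_with_model(model, predictor, instance_type, initial_instance_count=1):
    """Register an SDK Model object in the SageMaker API, then point the endpoint at it.

    `model` is a sagemaker Model (e.g. from Estimator.create_model()); `predictor`
    is a Predictor attached to the existing endpoint. Uses private SDK methods.
    """
    # Registering the "real model" needs the instance type (CPU vs GPU image)
    model._init_sagemaker_session_if_does_not_exist(instance_type)
    model._create_sagemaker_model(instance_type)
    # model.name is only populated once the model exists in the API
    predictor.update_endpoint(
        model_name=model.name,
        initial_instance_count=initial_instance_count,
        instance_type=instance_type,
    )
```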

icywang86rui commented 3 years ago

@athewsey Thanks for using our product and the suggestion. We will have a discussion about this feature request.

kuirensu commented 3 years ago

This is a missing feature, and a very important one; please update.

amitm-sundaysky commented 3 years ago

Any update/workaround regarding this one?

bill10 commented 3 years ago

+1. Whatever good reason there is behind removing the update_endpoint arg, the migration doc reads like "go figure it out yourself", which does not really help with the transition from v1 to v2. I would expect at least some example of how to perform the same function in v2, or a revert of this change if it is not really necessary.

elangovana commented 3 years ago

Here is how to create a model and update an existing endpoint

Create model using sagemaker session

You can create the model using a SageMaker session. Depending on whether it is BYO or an existing training job, choose the appropriate method below to create the container definition.

BYO - Create model

The model was trained outside SageMaker, e.g. a pretrained model.

Step 0 - Prerequisite for BYO: package your model correctly. Note: make sure the model_data_url is packaged according to create-the-directory-structure-for-your-model-files and uploaded to S3. Also (thanks to Joao Moura), set the SAGEMAKER_PROGRAM environment variable so that SageMaker knows the entry point.

import sagemaker, datetime

# Retrieve the inference image URI for a GPU instance for PyTorch 1.4.0
image_uri = sagemaker.image_uris.retrieve(
    "pytorch", "us-east-2", version="1.4.0", py_version="py3",
    instance_type="ml.p3.2xlarge", accelerator_type=None, image_scope="inference",
    container_version=None, distribution=None, base_framework_version=None,
)

# Define the container definition, pointing SageMaker at the inference entry point
container_def = sagemaker.session.container_def(
    image_uri, model_data_url, env={"SAGEMAKER_PROGRAM": my_inference_entry_point}
)

# Create the model in the SageMaker API
new_model_name = "my-new-model-{}".format(datetime.datetime.now().strftime("%Y%m%d%H%M%S"))
sm_session = sagemaker.session.Session()
sm_session.create_model(new_model_name, role, container_def)

Existing training job - Create model

import sagemaker, datetime
from sagemaker.pytorch.estimator import PyTorch

# Retrieve the inference image URI for a GPU instance for PyTorch 1.4.0
image_uri = sagemaker.image_uris.retrieve(
    "pytorch", "us-east-2", version="1.4.0", py_version="py3",
    instance_type="ml.p3.2xlarge", accelerator_type=None, image_scope="inference",
    container_version=None, distribution=None, base_framework_version=None,
)

# Attach to the existing training job
estimator = PyTorch.attach(training_job_name)

# Construct the PyTorch model object
new_model_name = "my-new-model-{}".format(datetime.datetime.now().strftime("%Y%m%d%H%M%S"))
model = estimator.create_model(name=new_model_name, entry_point=my_inference_entry_point, image_uri=image_uri)

# Prepare the container definition, packaging the entry point file with the model
container_def = model.prepare_container_def()

# Create the model in the SageMaker API
sm_session = sagemaker.session.Session()
sm_session.create_model(new_model_name, role, container_def)

Update Endpoint

Once the model is created, update the existing endpoint

import sagemaker

predictor = sagemaker.pytorch.model.PyTorchPredictor(existing_endpoint_name)
predictor.update_endpoint(initial_instance_count=1, instance_type="ml.p3.2xlarge", model_name=new_model_name)

dan9059021 commented 2 years ago

Has anyone found a Tensorflow solution to this problem?

ghost commented 2 years ago

Has anyone found a solution for this ?

basselmasri commented 1 year ago

I have found a solution to this by re-coding the whole deployment script using the boto3 SDK rather than the sagemaker SDK. The full code is in the accepted Stack Overflow answer:

https://stackoverflow.com/questions/73728499/how-to-update-an-existing-model-in-aws-sagemaker-2-0/73825605#73825605

You can indeed rewrite the code and supply an entry point as well as a source directory of other code dependencies using the boto3 SageMaker client. The documentation just doesn't state it, unfortunately.
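For reference, the boto3-only flow boils down to three client calls (a hedged sketch, not the full Stack Overflow code: create_model, create_endpoint_config, and update_endpoint are the real boto3 SageMaker client operations, while the function and resource names here are mine):

```python
import datetime

def rollout_new_model(sm, endpoint_name, image_uri, model_data_url, role_arn,
                      instance_type="ml.m5.xlarge", instance_count=1):
    """Register a new model, create an endpoint config for it, and switch the endpoint over.

    `sm` is a boto3 SageMaker client, e.g. boto3.client("sagemaker").
    """
    stamp = datetime.datetime.now().strftime("%Y%m%d%H%M%S")
    model_name = "my-model-{}".format(stamp)
    config_name = "my-config-{}".format(stamp)
    # 1. Register the model (container image + model artifact in S3)
    sm.create_model(
        ModelName=model_name,
        PrimaryContainer={"Image": image_uri, "ModelDataUrl": model_data_url},
        ExecutionRoleArn=role_arn,
    )
    # 2. Create an endpoint config pointing at the new model
    sm.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[{
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
        }],
    )
    # 3. Swap the live endpoint to the new config (blue/green behind the scenes)
    sm.update_endpoint(EndpointName=endpoint_name, EndpointConfigName=config_name)
    return model_name, config_name
```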

liujiaorr commented 4 months ago

Does this request still exist?