aws / sagemaker-tensorflow-serving-container

A TensorFlow Serving solution for use in SageMaker. This repo is now deprecated.

Multi-model endpoint: load new model, unload model, update model #166

Closed kevin-yauris closed 4 years ago

kevin-yauris commented 4 years ago

Hi, first I want to thank you for developing this. I would like to use a multi-model endpoint with Sagemaker Tensorflow Serving Container, but there are some things that confuse me. Any feedback or answer will be really appreciated.

What did you find confusing? Please describe. It seems there are two different things called a multi-model endpoint. One uses the Multi Model Server library (let's call this the general multi-model endpoint), and the other is described in SageMaker TensorFlow: Deploying TensorFlow Serving - Deploying more than one model to your endpoint (let's call this the TFS multi-model endpoint). I am confused because both use the same term, multi-model endpoint, but they seem to be different features that are used differently. Can the TFS multi-model endpoint use the methods or interfaces provided by the general multi-model endpoint? For the general multi-model endpoint, there is documentation on how to add and remove models. Can a model built with TensorFlowModel be deployed into a general multi-model endpoint and use this?

What I want to know is whether SageMaker TensorFlow Serving supports loading, unloading, and updating a model without creating a new endpoint. This readme section lists some interfaces for this, but I can't find a way to access them through the SDK or by making a request to the endpoint URL.

Describe how documentation can be improved I think it would be great if the documentation explained the difference or connection between the TFS multi-model endpoint and the general multi-model endpoint. I would also like to request a tutorial (a Jupyter notebook would be great) on how to unload a model, load a new model, and update a model in a TFS multi-model endpoint. The current readme section says these interfaces exist but doesn't explain how to use them when the container is deployed on an endpoint.

Additional context I have tried to deploy a TensorFlowModel as a MultiDataModel by following the Multi-Model Endpoint XGBoost Sample Notebook, but I get an error when using it for prediction.

from sagemaker.tensorflow import TensorFlowModel

env = {
    'SAGEMAKER_TFS_DEFAULT_MODEL_NAME': 'model1',
    'SAGEMAKER_MULTI_MODEL': 'true'
}

model = TensorFlowModel(name='tfs-multi-model-endpoints',
                        model_data=model_data,
                        role=role,
                        framework_version='2.2.0',
                        env=env,
                        sagemaker_session=sagemaker_session)

from sagemaker.multidatamodel import MultiDataModel
mme = MultiDataModel(name='tf-mme',
                     model_data_prefix=model_data_prefix,
                     model=model,
                     sagemaker_session=sagemaker_session,
                     #image_uri=tf_image,
                    )

predictor = mme.deploy(initial_instance_count=1,
                       instance_type='ml.t2.medium',
                       endpoint_name='test-tf-mme')

result = predictor.predict(classification_input, target_model='model1.tar.gz')

The error is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-2752bb9dcc74> in <module>
----> 1 result = predictor.predict(classification_input, target_model='model1.tar.gz')

TypeError: predict() got an unexpected keyword argument 'target_model'

If the target model is not given, i.e. using predictor.predict(classification_input), this error shows up: ValidationError: An error occurred (ValidationError) when calling the InvokeEndpoint operation: Request acd28d7b-c1b4-4ce1-9f06-c1fdefb58cee is missing a target model header, which is required to invoke multi-model endpoint test-tf-mme.

I also tried to invoke the endpoint directly:

runtime_sm_client = boto3.client(service_name='sagemaker-runtime')
runtime_sm_client.invoke_endpoint(
                        EndpointName = 'test-tf-mme',
                        ContentType  = 'application/json',
                        TargetModel  = 'multi.tar.gz',
                        Body         = '{"instances": [{"input_ids": [101, 1045, 2031, 2699, 2068, 2035, 2005, 2026, 2986, 2606, 1012, 102],  "attention_mask": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}, {"input_ids": [101, 2023, 2003, 2919, 102, 0, 0, 0, 0, 0, 0, 0],  "attention_mask": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]}, {"input_ids": [101, 2023, 2025, 2919, 1010, 3243, 2204, 2941, 102, 0, 0, 0],  "attention_mask": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]}]}')

this error is shown:

    633             error_code = parsed_response.get("Error", {}).get("Code")
    634             error_class = self.exceptions.from_code(error_code)
--> 635             raise error_class(parsed_response, operation_name)
    636         else:
    637             return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "<html>
  <head>
    <title>Internal Server Error</title>
  </head>
  <body>
    <h1><p>Internal Server Error</p></h1>

  </body>
</html>
". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/test-tf-mme in account 682361690817 for more information.
kevin-yauris commented 4 years ago

@laurenyu @ajaykarpur Hi, could you kindly respond to this issue? My friend and I have spent several days trying to figure this out, but we can't find any reference or answer yet. Thanks in advance.

laurenyu commented 4 years ago

looking at this part of the documentation that you linked - https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/deploying_tensorflow_serving.html#creating-predictor-instances-for-different-models -

have you tried something like:

from sagemaker.tensorflow import TensorFlowPredictor

predictor = TensorFlowPredictor("test-tf-mme", model_name="model1")
predictor.predict(classification_input)

(replace variables/strings as appropriate - I tried to guess based on what you pasted above)

kevin-yauris commented 4 years ago

Hi @laurenyu, thank you for answering. I've tried what you suggested and still get an error:

    633             error_code = parsed_response.get("Error", {}).get("Code")
    634             error_class = self.exceptions.from_code(error_code)
--> 635             raise error_class(parsed_response, operation_name)
    636         else:
    637             return parsed_response

ValidationError: An error occurred (ValidationError) when calling the InvokeEndpoint operation: Request 186633a1-9a9f-4557-8e2b-038d9aa4bba4 is missing a target model header, which is required to invoke multi-model endpoint test-tf-mme.

By the way, do you know how to use the multi-model interfaces explained in this readme section? After creating a multi-model endpoint, how do I use these interfaces?

laurenyu commented 4 years ago

looked a little deeper - @ajaykarpur please correct me if I'm wrong, but it looks like TensorFlowPredictor.predict() is missing some of the args supported by the generic Predictor.predict() method.

Based on this line of code, here's my guess at a workaround:

predictor.predict(classification_input, initial_args={"target_model": "model1"})

By the way, do you know how to use the multi-model interfaces explained in this readme section? After creating a multi-model endpoint, how do I use these interfaces?

I seem to recall there being a way to get a direct URL to the endpoint, but I'm not finding the documentation at the moment. (sorry, it's been awhile since I've worked on this stuff...πŸ˜…)

kevin-yauris commented 4 years ago

Hi @laurenyu, thank you for your suggestion. I have tried it, but it seems we need to use TargetModel instead of target_model as the dict key, so I tried predictor.predict(classification_input, initial_args={"TargetModel": "model1.tar.gz"}) and still encountered an error similar to what I got before:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "<html>
  <head>
    <title>Internal Server Error</title>
  </head>
  <body>
    <h1><p>Internal Server Error</p></h1>

  </body>
</html>
". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/test-tf-mme in account 682361690817 for more information.

CloudWatch log

2020-09-15 03:50:15.924080: W tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:267] No versions of servable ed8030f1549d2b8c0d9fc09a3d3cd31c found under base path /opt/ml/models/ed8030f1549d2b8c0d9fc09a3d3cd31c/model

I seem to recall there being a way to get a direct URL to the endpoint, but I'm not finding the documentation at the moment. (sorry, it's been awhile since I've worked on this stuff...πŸ˜…)

We can get the endpoint URL from the Amazon SageMaker dashboard; it is something like https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/test-tf-mme/invocations. If I can get the URL, how do I use the interfaces with it?

There is a link in the dashboard below the URL, Learn more about the API, but it only explains how to invoke the endpoint and run predictions; there is no documentation on how to load and unload models.

laurenyu commented 4 years ago

No versions of servable ed8030f1549d2b8c0d9fc09a3d3cd31c found under base path /opt/ml/models/ed8030f1549d2b8c0d9fc09a3d3cd31c/model

what does your model.tar.gz look like?

We can get the endpoint URL from the Amazon SageMaker dashboard; it is something like https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/test-tf-mme/invocations. If I can get the URL, how do I use the interfaces with it?

There is a link in the dashboard below the URL, Learn more about the API, but it only explains how to invoke the endpoint and run predictions; there is no documentation on how to load and unload models.

/invocations is what's added from InvokeEndpoint, so my guess would be that https://runtime.sagemaker.us-east-1.amazonaws.com/endpoints/test-tf-mme would be your base URL. (I've never tried it, though.)
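
if you do end up running the container somewhere you can reach it directly (e.g. locally with docker run and SAGEMAKER_MULTI_MODEL=true set), my understanding of the multi-model container contract is that the README interfaces map onto routes roughly like these (untested sketch; the base URL and model name are placeholders):

import requests

base_url = "http://localhost:8080"  # wherever the container is reachable

# list the models that are currently loaded
print(requests.get(f"{base_url}/models").json())

# load a model whose artifacts are already under /opt/ml/models/model1
requests.post(f"{base_url}/models",
              json={"model_name": "model1", "url": "/opt/ml/models/model1"})

# unload it again
requests.delete(f"{base_url}/models/model1")

on a hosted multi-model endpoint, though, SageMaker calls those load/unload routes itself when you pass TargetModel, so as far as I know only /invocations (and /ping) are reachable from the outside.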

kevin-yauris commented 4 years ago

Hi @laurenyu, thank you for your help! We have successfully used TFS with a multi-model endpoint. The problem was an invalid model structure. Previously my model1.tar.gz looked like this:

└── model1.tar.gz
    └── model1
        └── <version number>
            β”œβ”€β”€ saved_model.pb
            └── variables
                └── ...

after changing it to:

└── model1.tar.gz
    └── <version number>
        β”œβ”€β”€ saved_model.pb
        └── variables
            └── ...

it is working now. I'll leave a snippet of our code here, in case someone else runs into this too.

import boto3
import json

# image, model_data_location, model_name, role, endpoint_configuration_name,
# and endpoint_name are defined elsewhere in the notebook
container = {
    'Image': image,
    'ModelDataUrl': model_data_location,
    'Mode': 'MultiModel'
}

sagemaker_client = boto3.client('sagemaker')

# Create Model
response = sagemaker_client.create_model(
              ModelName = model_name,
              ExecutionRoleArn = role,
              Containers = [container])

# Create Endpoint Configuration
response = sagemaker_client.create_endpoint_config(
    EndpointConfigName = endpoint_configuration_name,
    ProductionVariants=[{
        'InstanceType': 'ml.t2.medium',
        'InitialInstanceCount': 1,
        'InitialVariantWeight': 1,
        'ModelName': model_name,
        'VariantName': 'AllTraffic'}])

# Create Endpoint
response = sagemaker_client.create_endpoint(
              EndpointName = endpoint_name,
              EndpointConfigName = endpoint_configuration_name)

# Invoke Endpoint
sagemaker_runtime_client = boto3.client('sagemaker-runtime')

content_type = "application/json" # The MIME type of the input data in the request body.
accept = "application/json" # The desired MIME type of the inference in the response.
payload = json.dumps({"instances": [1.0, 2.0, 5.0]}) # Payload for inference.
target_model = 'model1.tar.gz'

response = sagemaker_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name, 
    ContentType=content_type,
    Accept=accept,
    Body=payload,
    TargetModel=target_model,
)

response
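
In case it helps, this is roughly how we package each tarball now (just a sketch; export/1 is where our SavedModel version directory happens to live locally):

import tarfile

# export/1 contains saved_model.pb and the variables/ directory.
# The tarball must have the version directory at its top level,
# with no extra model-name directory wrapping it.
with tarfile.open("model1.tar.gz", "w:gz") as tar:
    tar.add("export/1", arcname="1")

And as far as I understand, loading a new model later doesn't need a new endpoint: upload another tarball to the same S3 prefix used as ModelDataUrl and invoke with that TargetModel (the bucket and key names below are placeholders):

s3_client = boto3.client('s3')
s3_client.upload_file('model2.tar.gz', 'my-bucket', 'tf-mme-models/model2.tar.gz')

response = sagemaker_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType=content_type,
    Accept=accept,
    Body=payload,
    TargetModel='model2.tar.gz',
)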
manjunathsudheer666 commented 3 years ago

@kevin-yauris I am trying to build a multi-model endpoint using locally trained model artefacts (the .pb and variables files). Could you tell me how to figure out the {version number} when creating the .tar file? Thanks in advance

kevin-yauris commented 3 years ago

For the version number, I just use 1 or 2. I think it doesn't matter as long as the newest version has the highest version number.
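
For example, if the tarball contains two version directories, TensorFlow Serving's default version policy will load and serve only the highest-numbered one:

└── model1.tar.gz
    β”œβ”€β”€ 1
    β”‚   β”œβ”€β”€ saved_model.pb
    β”‚   └── variables
    └── 2    <- served by default (latest version)
        β”œβ”€β”€ saved_model.pb
        └── variables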