aws / sagemaker-tensorflow-serving-container

A TensorFlow Serving solution for use in SageMaker. This repo is now deprecated.
Apache License 2.0

Published tensorflow-inference:2.1-cpu image does not support multi-models #156

Closed · svpino closed this issue 4 years ago

svpino commented 4 years ago

Describe the bug

The TensorFlow Inference 2.1 CPU Docker container was updated to include multi-model support. However, when pulling the image from the default ECR repository, I get the following error:

ClientError: An error occurred (ValidationException) when calling the CreateModel operation: Your Ecr Image 763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference:2.1-cpu does not contain required com.amazonaws.sagemaker.capabilities.multi-models=true Docker label(s).
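
To confirm the label is actually missing (and not just a stale image on my side), the image labels can be inspected locally. A minimal sketch using the docker Python SDK, assuming pip install docker and a prior docker login against the 763104351884 ECR registry:

import docker

# Assumes the ECR login for 763104351884.dkr.ecr.us-west-2.amazonaws.com was already done
client = docker.from_env()

image = client.images.pull(
    "763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference",
    tag="2.1-cpu",
)

# SageMaker checks for this label when creating a multi-model endpoint;
# this prints None here, consistent with the error above
print(image.labels.get("com.amazonaws.sagemaker.capabilities.multi-models"))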

To reproduce

Here is the code I'm using:

import sagemaker
import sagemaker.tensorflow.serving as serving

from sagemaker.multidatamodel import MultiDataModel
from sagemaker.predictor import json_serializer, json_deserializer, RealTimePredictor

sagemaker_session = sagemaker.Session()

# Ask the serving container to run in multi-model mode
env = {
    'SAGEMAKER_MULTI_MODEL': 'True'
}

# TensorFlow Serving model; model_data points at the S3 prefix holding the model archives
model = serving.Model(
    name="testing",
    model_data="s3://<BUCKET>/models/",
    role=sagemaker.get_execution_role(),
    framework_version="2.1",
    sagemaker_session=sagemaker_session,
    env=env
)

# Wrap the model so every archive under model_data_prefix can be served from one endpoint
mme = MultiDataModel(
    name="testing",
    model_data_prefix="s3://<BUCKET>/models/",
    model=model,
    sagemaker_session=sagemaker_session
)

# Deploying the endpoint fails with the ValidationException shown below
mme.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
    endpoint_name="testing"
)

Expected behavior

A new Multi-Model endpoint should be deployed.
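
For context, once the endpoint is up I expect to invoke individual models with the TargetModel parameter. A sketch; the archive name model-a.tar.gz and the payload are placeholders:

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# TargetModel selects which archive under model_data_prefix to load and serve
response = runtime.invoke_endpoint(
    EndpointName="testing",
    TargetModel="model-a.tar.gz",  # hypothetical archive under s3://<BUCKET>/models/
    ContentType="application/json",
    Body=json.dumps({"instances": [[1.0, 2.0, 3.0]]}),
)

print(response["Body"].read())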

Screenshots or logs

Here is the error message I get:

Parameter image will be renamed to image_uri in SageMaker Python SDK v2.
'create_image_uri' will be deprecated in favor of 'ImageURIProvider' class in SageMaker Python SDK v2.
---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
<ipython-input-51-5bb12186268f> in <module>
     22     initial_instance_count=1,
     23     instance_type="ml.m4.xlarge",
---> 24     endpoint_name="testing")

~/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/sagemaker/multidatamodel.py in deploy(self, initial_instance_count, instance_type, accelerator_type, endpoint_name, update_endpoint, tags, kms_key, wait, data_capture_config)
    227             vpc_config=vpc_config,
    228             enable_network_isolation=enable_network_isolation,
--> 229             tags=tags,
    230         )
    231 

~/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/sagemaker/session.py in create_model(self, name, role, container_defs, vpc_config, enable_network_isolation, primary_container, tags)
   2137 
   2138         try:
-> 2139             self.sagemaker_client.create_model(**create_model_request)
   2140         except ClientError as e:
   2141             error_code = e.response["Error"]["Code"]

~/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    314                     "%s() only accepts keyword arguments." % py_operation_name)
    315             # The "self" in this scope is referring to the BaseClient.
--> 316             return self._make_api_call(operation_name, kwargs)
    317 
    318         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    633             error_code = parsed_response.get("Error", {}).get("Code")
    634             error_class = self.exceptions.from_code(error_code)
--> 635             raise error_class(parsed_response, operation_name)
    636         else:
    637             return parsed_response

ClientError: An error occurred (ValidationException) when calling the CreateModel operation: Your Ecr Image 763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference:2.1-cpu does not contain required com.amazonaws.sagemaker.capabilities.multi-models=true Docker label(s).

System information N/A

Additional context N/A

laurenyu commented 4 years ago

I don't see the multi-model label in the TFS 2.1 Dockerfile: https://github.com/aws/deep-learning-containers/blob/master/tensorflow/inference/docker/2.1.1/py3/Dockerfile.cpu

The Dockerfiles are now housed in https://github.com/aws/deep-learning-containers - can you please open an issue in that repository? Thanks!
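
Until the Dockerfile is fixed, a possible workaround (a sketch, not an official fix) is to build a thin derived image that adds the missing label, push it to your own ECR repository, and pass it to the SageMaker model through the image parameter (image_uri in SDK v2). Using the docker Python SDK; <ACCOUNT> is a placeholder for your own registry:

import io
import docker

client = docker.from_env()

# One-line Dockerfile layered on top of the published image
dockerfile = io.BytesIO(
    b"FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference:2.1-cpu\n"
)

# Build a derived image that carries the label SageMaker requires
image, _ = client.images.build(
    fileobj=dockerfile,
    tag="<ACCOUNT>.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference-mme:2.1-cpu",
    labels={"com.amazonaws.sagemaker.capabilities.multi-models": "true"},
)

client.images.push(
    "<ACCOUNT>.dkr.ecr.us-west-2.amazonaws.com/tensorflow-inference-mme",
    tag="2.1-cpu",
)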