aws / sagemaker-python-sdk

A library for training and deploying machine learning models on Amazon SageMaker
https://sagemaker.readthedocs.io/
Apache License 2.0

Serving Tensorflow object detection model, input image size too large #970

Closed bsun0802 closed 5 years ago

bsun0802 commented 5 years ago



Describe the problem

I fine-tuned an object detection model with the Google Object Detection API, but from the client side the input image is too large: its dimensions are roughly (2400+, 1100+).

If I resize the input image to a very small size (e.g., 300x300), then the JSON serialization works (the command I pasted at the end works), because the payload then fits within the 5 MB limit that SageMaker endpoints have. But I don't want to resize the input image to be that small.
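For scale, a minimal sketch of why the full-resolution image cannot fit through the endpoint as JSON (the shape below is an approximation of the dimensions mentioned above):

```python
import numpy as np

# Approximate size of the full-resolution image described above (zeros; the shape
# is illustrative, not the actual file).
image = np.zeros((1, 1100, 2400, 3), dtype=np.uint8)
print(image.nbytes / 1e6)   # ~7.9 MB of raw pixels -- already over the 5 MB limit,
                            # and json.dumps(image.tolist()) inflates it several-fold
```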

How can I write the inference.py as per here to handle such a case?

There might be a different solution as per here, which looks different from the inference.py approach. So maybe one of them is deprecated?

I'm not able to get either of them to work because the documentation on serving TensorFlow is very scattered.

Minimal repro / logs


  1. For the inference.py approach, I got this error:

    Error hosting endpoint rx-eval-form-tensorflow-serving-bsun-2: Failed Reason:  The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint.
  2. For the input_fn/output_fn approach, I don't know how to get entry_point.py to work. Where (at which path) should I place this script?

  3. For the default JSON serializer, if the image size is too large, I get this error:

    
    ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

ConnectionClosedError Traceback (most recent call last)

ConnectionClosedError: Connection was closed before we received a valid response from endpoint URL: "https://runtime.sagemaker.eu-west-1.amazonaws.com/endpoints/{endpoint-name}/invocations".


- **Exact command to reproduce**:
```python
import os

import cv2
import numpy as np
import sagemaker as sage
import smart_open

from sagemaker.tensorflow.serving import Model, Predictor
from sagemaker.predictor import npy_serializer, numpy_deserializer, json_deserializer

# https://medium.com/datadriveninvestor/model-deployment-using-aws-sagemaker-8116adea7184
model_data = sage.session.Session().upload_data('ssd_resnet_50_21269.tar.gz', key_prefix='model')

# create a sagemaker role
sagemaker_role = sage.get_execution_role()

# endpoint to be created
endpoint_name = 'rx-eval-form-tensorflow-serving-bsun'

# create a Tensorflow model for endpoint deployment 
model = Model(model_data=model_data,
              role=sagemaker_role,
              framework_version="1.13",
#               entry_point='serve.py',
              name='ssd-resnet-50')

predictor = model.deploy(initial_instance_count=1, 
             instance_type='ml.c5.xlarge',
             accelerator_type='ml.eia1.medium',
             endpoint_name=endpoint_name,
             tags=[{'Key': 'Creator', 'Value': 'bsun'}])

# predictor = Predictor(endpoint_name=endpoint_name,
#                       serializer=npy_serializer, deserializer=numpy_deserializer)

def read_image_from_s3(s3_bucket_key, image_name):
    image_path = os.path.join(s3_bucket_key, image_name)
    with smart_open.open(image_path, 'rb') as f:
        buf = np.frombuffer(f.read(), np.uint8)
    imarr = cv2.imdecode(buf, 1)
    imarr = cv2.resize(imarr, None, fx=.5, fy=.5)
    imarr = np.expand_dims(imarr, axis=0)
    return imarr

path_to_image = 's3://gl-ml-sagemaker-bsun-learn/test-images/'
test_image_name = 'TuyetHoa.Bui..305227347.jpg'
test_image = read_image_from_s3(path_to_image, test_image_name)

model_result = predictor.predict(test_image)
```
bsun0802 commented 5 years ago

According to issue #831 and the documentation here (the most up-to-date among all the incomplete and scattered ones), I think I made some progress, but I have not fully succeeded yet.

Currently my code on the SageMaker notebook instance looks like this:

```python
import sagemaker as sage
from sagemaker.tensorflow.serving import Model, Predictor
from sagemaker.predictor import npy_serializer, numpy_deserializer, json_deserializer

model_artifact = sage.session.Session().upload_data('ssd_resnet50.tar.gz', key_prefix='model')
# model_artifact = 's3://gl-ml-sagemaker-bsun-learn/ssd_resnet_50_21269.tar.gz'
sagemaker_role = sage.get_execution_role()
endpoint_name = 'rx-eval-form-tensorflow-serving-bsun-2'

model = Model(
              entry_point='inference.py',
              dependencies=['requirements.txt'],
              framework_version='1.12',  # tf 1.13 isn't supported by EIA yet but the model was trained in 1.13 
              model_data=model_artifact,
              role=sagemaker_role)

predictor=model.deploy(initial_instance_count=1, 
             instance_type='ml.m4.xlarge',
             accelerator_type='ml.eia1.medium',   
             endpoint_name=endpoint_name,
             tags=[{'Key': 'Creator', 'Value': 'bsun'}])

predictor.serializer = npy_serializer
predictor.deserializer = numpy_deserializer

predictor.accept = 'application/x-npy'
predictor.content_type = 'application/x-npy'

result = predictor.predict(test_image)
```

test_image is an np.array of shape (1, 1000, 2000, 3) read with cv2 from an image stored in S3 (so if you can show me how to let the TFS container pick it up from S3 directly, so we don't have to work around the 5 MB limit here, that would be perfect).

predictor.predict() gives this ModelError: ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "{"error": "a bytes-like object is required, not 'Body'"}". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/rx-eval-form-tensorflow-serving-bsun-2 in account 881345373917 for more information.

My inference.py is as follows, and I believe the model archive structure is correct as per here:

```python
import json
import numpy as np
import io

def input_handler(data, context):
    """ Pre-process request input before it is sent to TensorFlow Serving REST API
    Args:
        data (obj): the request data, in format of dict or string
        context (Context): an object containing request and configuration details
    Returns:
        (dict): a JSON-serializable dict that contains request body and headers
    """
    if context.request_content_type == 'application/json':
        # pass through json (assumes it's correctly formed)
        d = data.read().decode('utf-8')
        return d if len(d) else ''

    if context.request_content_type == 'text/csv':
        # very simple csv handler
        return json.dumps({
            'instances': [float(x) for x in data.read().decode('utf-8').split(',')]
        })

    if context.request_content_type in ('application/x-npy', "application/npy"):
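        # NOTE: `data` here is the request stream (a botocore Body), but np.load /
        # io.BytesIO expect bytes; passing the stream directly is the likely source of
        # the "a bytes-like object is required, not 'Body'" error above
        # (io.BytesIO(data.read()) would be needed).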
        data = np.load(io.BytesIO(data), allow_pickle=True)
        if len(data.shape) == 4:
            data = data.tolist()
        else:
            raise ValueError("Invalid tensor shape "+str(data.shape))
        return json.dumps({
            "instances": data
        })

    raise ValueError('{{"error": "unsupported content type {}"}}'.format(
        context.request_content_type or "unknown"))

def output_handler(data, context):
    """Post-process TensorFlow Serving output before it is returned to the client.
    Args:
        data (obj): the TensorFlow serving response
        context (Context): an object containing request and configuration details
    Returns:
        (bytes, string): data to return to client, response content type
    """
    if data.status_code != 200:
        raise ValueError(data.content.decode('utf-8'))

    response_content_type = context.accept_header
    prediction = data.content
    return prediction, response_content_type
```
jesterhazy commented 5 years ago

Hi @bsun0802, thanks for using SageMaker!

The 5MB limit on incoming inference requests is enforced at the service level -- see the Request Body section of the InvokeEndpoint API docs for details. That means there's no way to change your inference script to accept larger payloads.

However, you can do inference on larger images. The trick is to use a compressed image format (like jpg or png) in your request, and then decode that into a numpy array after it has been received by SageMaker and passed to your inference script. That means you would change the input_handler method to look for image/jpeg or image/png, and then convert that data to the format your model expects -- in this case probably a numpy array dumped to json.
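A minimal sketch of such an input_handler, assuming Pillow and numpy are available in the serving container (for example via requirements.txt); the content types and names here are illustrative:

```python
import io
import json

import numpy as np
from PIL import Image


def input_handler(data, context):
    """Decode a compressed image server-side and forward it to TF Serving as JSON."""
    if context.request_content_type in ('image/jpeg', 'image/png'):
        image = Image.open(io.BytesIO(data.read())).convert('RGB')
        batch = np.asarray(image)[np.newaxis, ...]          # shape (1, H, W, 3)
        return json.dumps({'instances': batch.tolist()})

    raise ValueError('{{"error": "unsupported content type {}"}}'.format(
        context.request_content_type or 'unknown'))
```

Only the compressed image crosses InvokeEndpoint, so the request stays under the 5 MB limit; the decoded array travels only between the handler and TensorFlow Serving inside the container.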

For a more efficient endpoint, you could also look at changing your TF model input to accept image data. There's an example notebook that shows how to do this. It's written for Batch transform jobs, but the model preparation and inference script would be the same for an Endpoint.

bsun0802 commented 5 years ago

Thanks for your response; I actually figured it out before you replied.

Yes, the example notebook you mentioned is very helpful.

Currently I can do both RealTimePredictor and batch inference on large images: the trick is to make the TensorFlow model accept a string tensor, then base64-encode the image bytes as a string, which is far smaller than a numpy array, and implement the input_handler as in the notebook.

prameshbajra commented 4 years ago

> Thanks for your response; I actually figured it out before you replied.
>
> Yes, the example notebook you mentioned is very helpful.
>
> Currently I can do both RealTimePredictor and batch inference on large images: the trick is to make the TensorFlow model accept a string tensor, then base64-encode the image bytes as a string, which is far smaller than a numpy array, and implement the input_handler as in the notebook.

Hi @bsun0802, do you mind sharing the code snippets (your inference.py and the code for getting predictions from the predictor) you used to get this done?

Thanks.

prameshbajra commented 4 years ago

@bsun0802 was kind enough to pass me the code to his inference.py file. Thank you very much for the help. Really appreciate it. :+1:

For future readers, the inference.py code is as follows:

```python
import base64
import json
# import numpy as np

def input_handler(data, context):
    """ Pre-process request input before it is sent to TensorFlow Serving REST API

    Args:
        data (obj): the request data stream
        context (Context): an object containing request and configuration details

    Returns:
        (dict): a JSON-serializable dict that contains request body and headers
    """

    if context.request_content_type == 'application/x-image':  # for String tensor
        # invoke endpoint with image bytes or seekable fp for the image as request body, match TensorFlow Serving SignatureDef input dtype: DT_STRING, displayed using the TensorFlow saved_model_cli 
        payload = data.read()
        encoded_image = base64.b64encode(payload).decode('utf-8')
        instance = [{"b64": encoded_image}]
        return json.dumps({"instances": instance})

#     elif context.request_content_type == 'application/x-npy':  # for numpy array tensor
#         # invoke endpoint with serialized numpy array as request body, the TensorFlow Serving SignatureDef should assume input dtype: [1, H, W, C=3]
#         image_npy = np.load(BytesIO(data.read()))
#         return json.dumps(image_npy.tolist())

    else:
        _return_error(415, 'Unsupported content type "{}"'.format(context.request_content_type or 'Unknown'))

def output_handler(response, context):
    """Post-process TensorFlow Serving output before it is returned to the client.

    Args:
        response (obj): the TensorFlow serving response
        context (Context): an object containing request and configuration details

    Returns:
        (bytes, string): data to return to client, response content type
    """
    if response.status_code != 200:
        _return_error(response.status_code, response.content.decode('utf-8'))
    response_content_type = context.accept_header
    prediction = response.content
    return prediction, response_content_type

def _return_error(code, message):
    raise ValueError('Error: {}, {}'.format(str(code), message))
```

Hope this helps!
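For the other half of the question (getting predictions from the predictor), a hedged sketch using the SageMaker Python SDK v1 API used in this thread; the endpoint name and file path are placeholders:

```python
from sagemaker.predictor import RealTimePredictor

# Attach to the already-deployed TensorFlow Serving endpoint (name is a placeholder).
predictor = RealTimePredictor('my-tf-serving-endpoint',
                              content_type='application/x-image',  # matched in input_handler above
                              accept='application/json')

# Send the raw (compressed) image bytes; the base64 encoding happens in input_handler.
with open('large_test_image.jpg', 'rb') as f:
    payload = f.read()

result = predictor.predict(payload)   # raw JSON bytes returned by TensorFlow Serving
```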

ana-pcosta commented 4 years ago

> Thanks for your response; I actually figured it out before you replied.
>
> Yes, the example notebook you mentioned is very helpful.
>
> Currently I can do both RealTimePredictor and batch inference on large images: the trick is to make the TensorFlow model accept a string tensor, then base64-encode the image bytes as a string, which is far smaller than a numpy array, and implement the input_handler as in the notebook.

@bsun0802 how did you get the model to accept a string tensor as input? When I check with saved_model_cli, my model expects a float tensor of shape (batch_size, h, w, channels).

bsun0802 commented 4 years ago

@ana-pcosta

What is your model, is it TensorFlow Serving on SageMaker? Accepting a string tensor is a feature of TensorFlow, not AWS: the image bytes are base64-encoded into a string, so as far as AWS is concerned you are simply passing a string.

It was a long time ago and I cannot remember the details. AWS now supports TF 2.0, so things may have changed or become easier.

I suggest you look up some up-to-date docs for reference,

e.g.,

  1. https://aws.amazon.com/blogs/machine-learning/performing-batch-inference-with-tensorflow-serving-in-amazon-sagemaker/
  2. https://github.com/aws/sagemaker-tensorflow-serving-container

Just be patient and careful; it is annoying, but once you understand it, this is simply how transferring image data across a network works.
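For readers with the same question as @ana-pcosta (how the model ends up with a string-tensor input): for models fine-tuned with the TF 1.x Object Detection API this is typically done at export time, by re-exporting with object_detection/export_inference_graph.py and --input_type encoded_image_string_tensor instead of image_tensor. For a plain Estimator model, a sketch of an equivalent serving signature (feature and tensor names are illustrative):

```python
import tensorflow as tf  # TF 1.x

def serving_input_receiver_fn():
    # A batch of encoded (JPEG/PNG) image bytes arrives as DT_STRING, so clients can
    # POST {"instances": [{"b64": "..."}]} and TF Serving decodes the base64 itself.
    image_bytes = tf.placeholder(dtype=tf.string, shape=[None], name='image_bytes')

    def _decode(img):
        img = tf.io.decode_image(img, channels=3)
        img.set_shape([None, None, 3])
        return img

    images = tf.map_fn(_decode, image_bytes, dtype=tf.uint8)
    return tf.estimator.export.ServingInputReceiver(
        features={'image_tensor': images},              # illustrative feature key
        receiver_tensors={'image_bytes': image_bytes})

# estimator.export_savedmodel('export/', serving_input_receiver_fn)  # illustrative usage
```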