Closed bsun0802 closed 5 years ago
According to issue #831 and the documentation (the most up-to-date among all the incomplete and scattered ones) here,
I think I have made some progress, but I have not succeeded yet.
Currently my code in the SageMaker notebook instance looks like this:
import sagemaker as sage
from sagemaker.tensorflow.serving import Model, Predictor
from sagemaker.predictor import npy_serializer, numpy_deserializer, json_deserializer

model_artifact = sage.session.Session().upload_data('ssd_resnet50.tar.gz', key_prefix='model')
# model_artifact = 's3://gl-ml-sagemaker-bsun-learn/ssd_resnet_50_21269.tar.gz'
sagemaker_role = sage.get_execution_role()
endpoint_name = 'rx-eval-form-tensorflow-serving-bsun-2'

model = Model(
    entry_point='inference.py',
    dependencies=['requirements.txt'],
    framework_version='1.12',  # TF 1.13 isn't supported by EIA yet, but the model was trained in 1.13
    model_data=model_artifact,
    role=sagemaker_role)

predictor = model.deploy(initial_instance_count=1,
                         instance_type='ml.m4.xlarge',
                         accelerator_type='ml.eia1.medium',
                         endpoint_name=endpoint_name,
                         tags=[{'Key': 'Creator', 'Value': 'bsun'}])

predictor.serializer = npy_serializer
predictor.deserializer = numpy_deserializer
predictor.accept = 'application/x-npy'
predictor.content_type = 'application/x-npy'

result = predictor.predict(test_image)
test_image is an np.array of shape (1, 1000, 2000, 3) read via cv2.imread; the image is stored in S3 (so if you can show me how to let the TFS container pick it up from S3 directly, so that we don't have to work around the 5MB limit here, that would be perfect).
predictor.predict() gives this ModelError: ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "{"error": "a bytes-like object is required, not 'Body'"}". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/rx-eval-form-tensorflow-serving-bsun-2 in account 881345373917 for more information.
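For reference, one way test_image could have been produced from an image stored in S3 looks roughly like this; the bucket name, key, and local path below are placeholders, not the actual ones from this thread:

import boto3
import cv2
import numpy as np

# Download the image locally first, then read it with cv2.imread as described above.
boto3.client('s3').download_file('my-bucket', 'images/large-form.jpg', '/tmp/large-form.jpg')

image = cv2.imread('/tmp/large-form.jpg')       # (H, W, 3) BGR array
test_image = np.expand_dims(image, axis=0)      # (1, H, W, 3), matching the shape described above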
My inference.py is as follows, and I believe the model archive structure is correct as per here:
import json
import numpy as np
import io


def input_handler(data, context):
    """Pre-process request input before it is sent to TensorFlow Serving REST API

    Args:
        data (obj): the request data, in format of dict or string
        context (Context): an object containing request and configuration details

    Returns:
        (dict): a JSON-serializable dict that contains request body and headers
    """
    if context.request_content_type == 'application/json':
        # pass through json (assumes it's correctly formed)
        d = data.read().decode('utf-8')
        return d if len(d) else ''

    if context.request_content_type == 'text/csv':
        # very simple csv handler
        return json.dumps({
            'instances': [float(x) for x in data.read().decode('utf-8').split(',')]
        })

    if context.request_content_type in ('application/x-npy', "application/npy"):
        # `data` here is still the request stream, not bytes; io.BytesIO needs bytes,
        # which is what triggers the 500 error quoted above
        data = np.load(io.BytesIO(data), allow_pickle=True)
        if len(data.shape) == 4:
            data = data.tolist()
        else:
            raise ValueError("Invalid tensor shape " + str(data.shape))
        return json.dumps({
            "instances": data
        })

    raise ValueError('{{"error": "unsupported content type {}"}}'.format(
        context.request_content_type or "unknown"))


def output_handler(data, context):
    """Post-process TensorFlow Serving output before it is returned to the client.

    Args:
        data (obj): the TensorFlow serving response
        context (Context): an object containing request and configuration details

    Returns:
        (bytes, string): data to return to client, response content type
    """
    if data.status_code != 200:
        raise ValueError(data.content.decode('utf-8'))

    response_content_type = context.accept_header
    prediction = data.content
    return prediction, response_content_type
Hi @bsun0802, thanks for using SageMaker!
The 5MB limit on incoming inference requests is enforced at the service level -- see the Request Body section of the InvokeEndpoint API docs for details. That means there's no way to change your inference script to accept larger payloads.
However, you can do inference on larger images. The trick is to use a compressed image format (like jpg or png) in your request, and then decode that into a numpy array after it has been received by SageMaker and passed to your inference script. That means you would change the input_handler method to look for image/jpeg or image/png, and then convert that data to the format your model expects -- in this case probably a numpy array dumped to json.
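A rough sketch of such a handler, assuming Pillow is available via requirements.txt and that the model's SignatureDef expects a float tensor of shape [1, H, W, 3], could look like this:

import io
import json

import numpy as np
from PIL import Image  # assumes Pillow is listed in requirements.txt


def input_handler(data, context):
    """Decode a compressed image request into the JSON payload TensorFlow Serving expects."""
    if context.request_content_type in ('image/jpeg', 'image/png'):
        image = Image.open(io.BytesIO(data.read()))   # decode the compressed bytes
        array = np.asarray(image, dtype=np.float32)   # cast/normalize to whatever the SignatureDef expects
        return json.dumps({'instances': np.expand_dims(array, axis=0).tolist()})
    raise ValueError('{{"error": "unsupported content type {}"}}'.format(
        context.request_content_type or 'unknown'))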
For a more efficient endpoint, you could also look at changing your TF model input to accept image data. There's an example notebook that shows how to do this. It's written for Batch transform jobs, but the model preparation and inference script would be the same for an Endpoint.
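As a hedged illustration of that idea (not taken from the notebook), the TF 1.x Estimator export API lets you wrap the model with a serving signature that accepts raw JPEG bytes; the feature name 'images' and the export path are placeholders. If the model came from the TF Object Detection API, its exporter's encoded_image_string_tensor input type is intended to produce the same kind of DT_STRING signature.

import tensorflow as tf


def serving_input_receiver_fn():
    # The request arrives as a batch of raw JPEG strings (DT_STRING)
    image_bytes = tf.placeholder(dtype=tf.string, shape=[None], name='image_bytes')

    def decode(one_image):
        image = tf.image.decode_jpeg(one_image, channels=3)
        return tf.image.convert_image_dtype(image, tf.float32)

    images = tf.map_fn(decode, image_bytes, dtype=tf.float32)
    return tf.estimator.export.ServingInputReceiver(
        features={'images': images},                    # placeholder feature name
        receiver_tensors={'image_bytes': image_bytes})

# estimator.export_savedmodel('export/', serving_input_receiver_fn)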
Thanks for your response, I actually figured it out before you replied.
Yeah, the example notebook you mentioned is very helpful.
Currently I can do both RealTimePredictor and Batch Inference on large images. The trick is to let the TensorFlow model accept a string tensor; we can then use base64 to encode the image bytes as a string, which is far smaller than a numpy array, and implement the input_handler like in the notebook.
Hi @bsun0802,
Do you mind sharing the code snippet (for getting predictions from predictor and your inference.py) you used to get this done?
Thanks.
@bsun0802 was kind enough to pass me the code to his inference.py file.
Thank you very much for the help. Really appreciate it. :+1:
For future readers, the inference.py code is as follows:
import base64
import json
# import numpy as np


def input_handler(data, context):
    """Pre-process request input before it is sent to TensorFlow Serving REST API

    Args:
        data (obj): the request data stream
        context (Context): an object containing request and configuration details

    Returns:
        (dict): a JSON-serializable dict that contains request body and headers
    """
    if context.request_content_type == 'application/x-image':  # for string tensor
        # Invoke the endpoint with the image bytes (or a seekable fp for the image) as the
        # request body, to match the TensorFlow Serving SignatureDef input dtype DT_STRING
        # (displayed using the TensorFlow saved_model_cli).
        payload = data.read()
        encoded_image = base64.b64encode(payload).decode('utf-8')
        instance = [{"b64": encoded_image}]
        return json.dumps({"instances": instance})

    # elif context.request_content_type == 'application/x-npy':  # for numpy array tensor
    #     # Invoke the endpoint with a serialized numpy array as the request body; the TensorFlow
    #     # Serving SignatureDef should expect an input of shape [1, H, W, C=3].
    #     image_npy = np.load(BytesIO(data.read()))
    #     return json.dumps(image_npy.tolist())

    else:
        _return_error(415, 'Unsupported content type "{}"'.format(
            context.request_content_type or 'Unknown'))


def output_handler(response, context):
    """Post-process TensorFlow Serving output before it is returned to the client.

    Args:
        response (obj): the TensorFlow serving response
        context (Context): an object containing request and configuration details

    Returns:
        (bytes, string): data to return to client, response content type
    """
    if response.status_code != 200:
        _return_error(response.status_code, response.content.decode('utf-8'))

    response_content_type = context.accept_header
    prediction = response.content
    return prediction, response_content_type


def _return_error(code, message):
    raise ValueError('Error: {}, {}'.format(str(code), message))
Hope this helps!
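For completeness, the client side of that setup could look roughly like this with the v1 Python SDK; the endpoint name is the one used earlier in this thread and the file path is illustrative:

from sagemaker.predictor import RealTimePredictor

# No serializer is set, so the raw bytes are sent as-is with the given content type
predictor = RealTimePredictor(endpoint='rx-eval-form-tensorflow-serving-bsun-2',
                              content_type='application/x-image')

with open('large_image.jpg', 'rb') as f:   # compressed JPEG bytes stay well under the 5MB limit
    payload = f.read()

result = predictor.predict(payload)        # TensorFlow Serving's JSON response, as bytes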
@bsun0802 how did you get the model to accept a string tensor as input? When I check with saved_model_cli, the model expects a float tensor of shape (batch_size, h, w, channels).
@ana-pcosta
What is your model; is it TensorFlow Serving on SageMaker? Accepting a string tensor is a feature of TensorFlow, not of AWS: base64 is just an encoding that turns the image bytes into a string, so as far as AWS is concerned you are simply passing a string.
It was a long time ago and I can't remember the details. AWS supports TF 2.0 now, so things may have changed or been updated to be easier.
I suggest you look up some up-to-date docs for reference,
e.g.,
Just be patient and careful; it is annoying, but once you understand it, you will see that this is how transferring image data across a network works.
Please fill out the form below.
System Information
Describe the problem
I fine-tuned an object detection model with the Google Object Detection API. But from the client side, the input image is too large in dimension: shape=(2400+, 1100+).
If I reshape the input image to a very small size (e.g., 300x300), then the JSON serialization works (the command I pasted at the end works), because it meets the 5MB limit the SageMaker endpoint has. But I don't want to resize the input image to be that small.
How can I write the inference.py as per here to handle such a case?
There might be a different solution as per here, which looks different from the inference.py approach. So maybe one of them is deprecated?
I'm not able to get either of them to work because the documentation on serving TensorFlow is very scattered.
Minimal repro / logs
Please provide any logs and a bare minimum reproducible test case, as this will be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
for the inference.py approach, I got this error
for the input_fn/output_fn approach, I don't know how to get entry_point.py to work; in which place (path) should I put this script?
for the default json serializer, if the image size is too large, I got this error
During handling of the above exception, another exception occurred:
ConnectionClosedError Traceback (most recent call last)
ConnectionClosedError: Connection was closed before we received a valid response from endpoint URL: "https://runtime.sagemaker.eu-west-1.amazonaws.com/endpoints/{endpoint-name}/invocations".