aws / sagemaker-tensorflow-serving-container

A TensorFlow Serving solution for use in SageMaker. This repo is now deprecated.
Apache License 2.0

sagemaker.tensorflow.serving.Model with input_handler is much slower than keras.model on GPU instance #213

Open biyer19 opened 2 years ago

biyer19 commented 2 years ago

I am trying to follow this notebook to deploy an image-processing model to a SageMaker endpoint on an ml.g4dn.xlarge instance, and I found that adding image preprocessing via an entry-point script makes inference much slower. Please consider the two cases below. In both cases I am using the same TensorFlow SavedModel and the same base64-encoded image(s).

Setup 1:
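Roughly, assuming Setup 1 is the plain keras.model path mentioned in the title, something like the following, where the SavedModel is loaded with Keras and the base64 image is decoded before calling predict (the model path, image normalization, and input shape handling are placeholders):

```python
# Sketch of Setup 1 (assumed): call the Keras model directly on the GPU,
# decoding the base64-encoded image outside of any serving container.
import base64
import io

import numpy as np
from PIL import Image
import tensorflow as tf

model = tf.keras.models.load_model("saved_model_dir")  # placeholder path


def predict_b64(image_b64: str) -> np.ndarray:
    # Decode the base64 payload into an image array.
    img = Image.open(io.BytesIO(base64.b64decode(image_b64)))
    # Normalization / resizing is a placeholder; the real preprocessing
    # depends on how the model was trained.
    arr = np.expand_dims(np.asarray(img, dtype=np.float32) / 255.0, axis=0)
    return model.predict(arr)
```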

Setup 2:
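Roughly, assuming Setup 2 is the sagemaker.tensorflow.serving.Model deployment with an entry-point script containing input_handler/output_handler, something like the following (the S3 path, IAM role, and framework version are placeholders, not my exact values):

```python
# Sketch of Setup 2 (assumed): deploy the same SavedModel behind TensorFlow
# Serving, with preprocessing moved into an entry-point script (inference.py).
from sagemaker.tensorflow.serving import Model  # class named in the issue title (SDK v1 style)

model = Model(
    model_data="s3://my-bucket/model.tar.gz",             # placeholder S3 location
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    framework_version="2.3",                              # assumed TF version
    entry_point="inference.py",                           # defines input_handler / output_handler
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",                       # GPU instance from the issue
)
```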

From the CloudWatch logs, the majority of the time (~8 seconds) is spent after input_handler returns and before output_handler is invoked. The logs also show that the GPU is being used.
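For reference, a rough sketch of the inference.py hooks (not my exact script) with timestamps logged, which is how the gap between input_handler returning and output_handler starting shows up in CloudWatch; the request shape and the serving-signature input structure are assumptions:

```python
# inference.py -- sketch of the container's input_handler/output_handler hooks,
# instrumented with timestamps so CloudWatch logs show where time is spent.
import json
import logging
import time

log = logging.getLogger(__name__)


def input_handler(data, context):
    """Pre-process the request before it is forwarded to TensorFlow Serving."""
    start = time.time()
    payload = json.loads(data.read().decode("utf-8"))
    # Assumed request shape: {"b64": "<base64-encoded image>"}. Wrap it in the
    # TF Serving REST "instances" format; the exact structure depends on the
    # SavedModel's serving signature.
    request = json.dumps({"instances": [{"b64": payload["b64"]}]})
    log.info("input_handler returning at %.3f (took %.3f s)", time.time(), time.time() - start)
    return request


def output_handler(response, context):
    """Post-process the TensorFlow Serving response."""
    log.info("output_handler invoked at %.3f", time.time())
    if response.status_code != 200:
        raise ValueError(response.content.decode("utf-8"))
    return response.content, context.accept_header
```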

Screenshots or logs: CloudWatch screenshot (attached).

System information: A description of your system. Please provide: