sagemaker.tensorflow.serving.Model with input_handler is much slower than keras.model on GPU instance

I am following this notebook to deploy an image processing model to a SageMaker endpoint on an ml.g4dn.xlarge instance, and found that adding image preprocessing through an entry point script makes inference much slower. Please consider the two cases below. In both cases I am using the same TensorFlow SavedModel and the same base64-encoded image(s).

Setup 1:

On a SageMaker notebook instance (ml.g4dn.xlarge), load the model with reconstructed_model = keras.models.load_model().
Decode the JPEG image and preprocess it into numpy arrays.
Call reconstructed_model.predict(). This call returns in ~300-400 ms.
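
To make the latency figures in the two setups comparable, a small timing helper can be used for both. This is only a sketch: time_call and summarize are hypothetical helper names, not part of Keras or the SageMaker SDK, and the warm-up count is an assumption (the first GPU call often includes one-time initialization, so it is discarded here).

```python
import statistics
import time


def time_call(fn, *args, warmup=1, repeats=5, **kwargs):
    """Return per-call latencies in milliseconds, excluding warm-up calls."""
    # Warm-up calls are executed but not timed.
    for _ in range(warmup):
        fn(*args, **kwargs)

    latencies_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Summarize a list of latencies as min / median / max."""
    return {
        "min_ms": min(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }
```

For example, summarize(time_call(reconstructed_model.predict, batch)) would time the local call, where batch stands for the preprocessed numpy input. The same helper can wrap uncompiled_predictor.predict for the endpoint case, with the caveat that the endpoint figure also includes network and serialization time.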

Setup 2:

On a SageMaker notebook instance (ml.g4dn.xlarge), upload the model artifacts to S3.
Create inference.py to decode the JPEG image and preprocess it into numpy arrays.
Create and deploy the model:
sm_model = TensorFlowModel(model_data=model_data, entry_point='inference.py', source_dir='src', framework_version="2.4.1", env={"SAGEMAKER_REQUIREMENTS": "requirements.txt"}, role=role)
uncompiled_predictor = sm_model.deploy(initial_instance_count=1, instance_type='ml.g4dn.xlarge', endpoint_name='g4dn-xlarge-endpoint')
Call uncompiled_predictor.predict(). This takes ~11-12 seconds to return.
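
For reference, an inference.py for this setup has roughly the shape sketched below. It is not the actual script from this report. The input_handler(data, context) and output_handler(response, context) signatures are the ones the SageMaker TensorFlow Serving container calls; everything else is an assumption for illustration: the request format ({"images": [...]}), the decode_and_preprocess placeholder (which stands in for the real JPEG decoding and preprocessing), and the timing logs. The size of the JSON body is logged because that body is what the container forwards to TensorFlow Serving between the two handlers.

```python
import base64
import json
import logging
import time

logger = logging.getLogger(__name__)


def decode_and_preprocess(image_bytes):
    """Hypothetical placeholder for JPEG decoding and preprocessing.

    The real script would decode the JPEG and build a numpy array; here the
    raw bytes are only scaled to [0, 1] so the sketch runs without extra
    dependencies.
    """
    return [b / 255.0 for b in image_bytes]


def input_handler(data, context):
    """Turn the incoming request into a TensorFlow Serving REST request body."""
    start = time.perf_counter()
    if context.request_content_type != "application/json":
        raise ValueError(
            "Unsupported content type: {}".format(context.request_content_type)
        )

    # Assumed request format: {"images": ["<base64 string>", ...]}
    payload = json.loads(data.read().decode("utf-8"))
    instances = [
        decode_and_preprocess(base64.b64decode(encoded))
        for encoded in payload["images"]
    ]
    body = json.dumps({"instances": instances})

    elapsed_ms = (time.perf_counter() - start) * 1000.0
    logger.info("input_handler: %.1f ms, body size %d bytes", elapsed_ms, len(body))
    return body


def output_handler(response, context):
    """Return the TensorFlow Serving response to the client unchanged."""
    if response.status_code != 200:
        raise ValueError(response.content.decode("utf-8"))
    logger.info("output_handler: response size %d bytes", len(response.content))
    return response.content, context.accept_header
```

Comparing the logged body size and timestamps against the CloudWatch entries gives a rough picture of how much data is serialized and sent to the model server for each request.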

According to the CloudWatch logs, the majority of the time (~8 seconds) is spent after input_handler returns and before output_handler is invoked. The logs also indicate that the GPU is being used.

Screenshots or logs: CloudWatch screenshot

System information:

Toolkit version: 11.0
Framework version: 2.4.1
Python version: 37
CPU or GPU: GPU
Custom Docker image (Y/N): N

aws / sagemaker-tensorflow-serving-container, issue #213