I am following this notebook to deploy an image processing model on a SageMaker endpoint (ml.g4dn.xlarge instance) and found that adding image preprocessing through an entry point script makes inference much slower. Please consider the two setups below. In both, I use the same TensorFlow SavedModel and the same base64-encoded image(s).
Setup 1:
SageMaker notebook instance, ml.g4dn.xlarge. Load the model using reconstructed_model = keras.models.load_model()
Decode the JPEG image and preprocess it into NumPy arrays
Call reconstructed_model.predict(). This call returns in ~300-400 ms.
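To make the two measurements comparable, it may help to time several consecutive calls rather than a single one, so that a slow first call shows up separately from the later ones. A minimal sketch: `time_calls` is a hypothetical helper (not part of Keras or the SageMaker SDK), and the lambda is a stand-in for something like `reconstructed_model.predict(batch)`.

```python
import time


def time_calls(fn, repeats=5):
    """Call fn() `repeats` times and return each call's wall-clock latency in ms."""
    latencies_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


# Stand-in workload; in the notebook this would be something like
# lambda: reconstructed_model.predict(batch)
latencies = time_calls(lambda: sum(range(100_000)), repeats=5)
print("first call: %.1f ms" % latencies[0])
print("later calls: " + ", ".join("%.1f ms" % v for v in latencies[1:]))
```

The same helper can wrap the endpoint call in Setup 2, which would show whether the reported figure applies to every request or only to the first.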
Setup 2:
SageMaker notebook instance, ml.g4dn.xlarge. Upload the model artifacts to S3
Create inference.py to decode the JPEG image and preprocess it into NumPy arrays
Call uncompiled_predictor.predict(). This takes ~11-12 seconds to return.
According to the CloudWatch logs, most of the time (~8 seconds) is spent after input_handler returns and before output_handler is invoked. The logs also indicate that the GPU is being used.
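For reference, the gap between the two handlers can be measured by printing timestamps from inside inference.py. The following is only a sketch of the handler shape used by the SageMaker TensorFlow Serving container, not the actual script from this report. It assumes the request body is the base64 text of a single JPEG, and `preprocess` and `_log` are hypothetical placeholders.

```python
import base64
import json
import time


def _log(stage, start):
    # Hypothetical helper: printed lines should end up in the endpoint's CloudWatch logs.
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    print(f"[timing] {stage} took {elapsed_ms:.1f} ms, finished at {time.time():.3f}", flush=True)


def preprocess(image_bytes):
    # Hypothetical placeholder for JPEG decoding and resizing; replace with the
    # real preprocessing. Here the raw bytes are simply turned into floats.
    return [float(b) for b in image_bytes]


def input_handler(data, context):
    """Turn the incoming request into a TensorFlow Serving REST request body."""
    start = time.perf_counter()
    image_bytes = base64.b64decode(data.read())  # assumption: body is base64 text
    body = json.dumps({"instances": [preprocess(image_bytes)]})
    _log("input_handler", start)
    return body


def output_handler(response, context):
    """Pass the TensorFlow Serving response back to the client."""
    start = time.perf_counter()
    if response.status_code != 200:
        raise ValueError(response.content.decode("utf-8"))
    result = (response.content, context.accept_header)
    _log("output_handler", start)
    return result
```

Subtracting the "finished at" value logged by input_handler from the start of output_handler gives the time spent between the two handlers, which is where the ~8 seconds were observed.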
Screenshots or logs
System information
A description of your system. Please provide:
Setup 1:
reconstructed_model = keras.models.load_model()
reconstructed_model.predict()
This call returns in ~300-400 ms.
Setup 2:
sm_model = TensorFlowModel(model_data=model_data, entry_point='inference.py', source_dir='src', framework_version="2.4.1", env={"SAGEMAKER_REQUIREMENTS": "requirements.txt"}, role=role)
uncompiled_predictor = sm_model.deploy(initial_instance_count=1, instance_type='ml.g4dn.xlarge', endpoint_name='g4dn-xlarge-endpoint')
uncompiled_predictor.predict()
This takes ~11-12 seconds to return.
According to the CloudWatch logs, most of the time (~8 seconds) is spent after input_handler returns and before output_handler is invoked. The logs also indicate that the GPU is being used.