Closed fernandoamat closed 4 years ago
Hi,
LocalMode is implemented outside of the containers. For the code running inside of the containers it should work exactly the same with localmode and cloudmode. Could you show me the entire log from endpoint? Including when the endpoint was starting. Thanks!
Hi @icywang86rui ,
Here I copy the entire log from CloudWatch for the endpoint: it covers everything from the moment I call estimator.deploy
to create the endpoint up to the moment I try to invoke the endpoint and it returns an error. Let me know if you need anything else.
Thanks,
@fernandoamat
2018-10-16 12:34:55,034 INFO - root - running container entrypoint
2018-10-16 12:34:55,034 INFO - root - starting serve task
2018-10-16 12:34:55,034 INFO - container_support.serving - reading config
2018-10-16 12:34:55,548 INFO - container_support.serving - importing user module
2018-10-16 12:34:55,548 INFO - container_support.serving - loading framework-specific dependencies
2018-10-16 12:34:57,024 INFO - container_support.serving - starting nginx
2018-10-16 12:34:57,043 INFO - container_support.serving - starting gunicorn
2018-10-16 12:34:57,051 INFO - container_support.serving - inference server started. waiting on processes: set([21, 22])
2018-10-16 12:34:57.117647: I tensorflow_serving/model_servers/main.cc:154] Building single TensorFlow model file config: model_name: generic_model model_base_path: /opt/ml/model/export/Servo
2018-10-16 12:34:57.118730: I tensorflow_serving/model_servers/server_core.cc:444] Adding/updating models.
2018-10-16 12:34:57.118753: I tensorflow_serving/model_servers/server_core.cc:499] (Re-)adding model: generic_model
2018-10-16 12:34:57.123848: I tensorflow_serving/core/basic_manager.cc:716] Successfully reserved resources to load servable {name: generic_model version: 1538917129}
2018-10-16 12:34:57.123868: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: generic_model version: 1538917129}
2018-10-16 12:34:57.123983: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: generic_model version: 1538917129}
2018-10-16 12:34:57.124102: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:360] Attempting to load native SavedModelBundle in bundle-shim from: /opt/ml/model/export/Servo/1538917129
2018-10-16 12:34:57.124180: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:242] Loading SavedModel with tags: { serve }; from: /opt/ml/model/export/Servo/1538917129
2018-10-16 12:34:57.125830: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2018-10-16 12:34:57.145138: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:161] Restoring SavedModel bundle.
2018-10-16 12:34:57.156822: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:196] Running LegacyInitOp on SavedModel bundle.
2018-10-16 12:34:57.164458: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:291] SavedModel load for tags { serve }; Status: success. Took 40328 microseconds.
2018-10-16 12:34:57.164739: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: generic_model version: 1538917129}
2018-10-16 12:34:57.168421: I tensorflow_serving/model_servers/main.cc:316] Running ModelServer at 0.0.0.0:9000 ...
[2018-10-16 12:34:57 +0000] [22] [INFO] Starting gunicorn 19.9.0
[2018-10-16 12:34:57 +0000] [22] [INFO] Listening at: unix:/tmp/gunicorn.sock (22)
[2018-10-16 12:34:57 +0000] [22] [INFO] Using worker: gevent
[2018-10-16 12:34:57 +0000] [47] [INFO] Booting worker with pid: 47
[2018-10-16 12:34:57 +0000] [48] [INFO] Booting worker with pid: 48
2018-10-16 12:34:57,487 INFO - container_support.serving - creating Server instance
2018-10-16 12:34:57,509 INFO - container_support.serving - creating Server instance
2018-10-16 12:34:58,805 INFO - tf_container - ---------------------------Model Spec---------------------------
2018-10-16 12:34:58,806 INFO - tf_container - {
"modelSpec": {
"version": "1538917129",
"name": "generic_model"
},
"metadata": {
"signature_def": {
"@type": "type.googleapis.com/tensorflow.serving.SignatureDefMap",
"signatureDef": {
"serving_default": {
"inputs": {
"inputs": {
"dtype": "DT_FLOAT",
"name": "Placeholder_1:0",
"tensorShape": {
"dim": [
{
"size": "-1"
},
{
"size": "2"
}
]
}
}
},
"methodName": "tensorflow/serving/predict",
"outputs": {
"price": {
"dtype": "DT_FLOAT",
"name": "Reshape:0",
"tensorShape": {
"dim": [
{
"size": "-1"
}
]
}
}
}
}
}
}
}
}
2018-10-16 12:34:58,807 INFO - tf_container - ----------------------------------------------------------------
2018-10-16 12:34:58,807 INFO - tf_container - TF Serving model successfully loaded
2018-10-16 12:34:58,810 INFO - container_support.serving - returning initialized server
2018-10-16 12:34:58,824 INFO - tf_container - ---------------------------Model Spec---------------------------
2018-10-16 12:34:58,824 INFO - tf_container - {
"modelSpec": {
"version": "1538917129",
"name": "generic_model"
},
"metadata": {
"signature_def": {
"@type": "type.googleapis.com/tensorflow.serving.SignatureDefMap",
"signatureDef": {
"serving_default": {
"inputs": {
"inputs": {
"dtype": "DT_FLOAT",
"name": "Placeholder_1:0",
"tensorShape": {
"dim": [
{
"size": "-1"
},
{
"size": "2"
}
]
}
}
},
"methodName": "tensorflow/serving/predict",
"outputs": {
"price": {
"dtype": "DT_FLOAT",
"name": "Reshape:0",
"tensorShape": {
"dim": [
{
"size": "-1"
}
]
}
}
}
}
}
}
}
}
2018-10-16 12:34:58,824 INFO - tf_container - ----------------------------------------------------------------
2018-10-16 12:34:58,825 INFO - tf_container - TF Serving model successfully loaded
2018-10-16 12:34:58,826 INFO - container_support.serving - returning initialized server
[2018-10-16 12:36:46,649] ERROR in serving: invalid literal for float(): 0.5,0.2
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/container_support/serving.py", line 182, in _invoke
self.transformer.transform(content, input_content_type, requested_output_content_type)
File "/usr/local/lib/python2.7/dist-packages/tf_container/serve.py", line 281, in transform
return self.transform_fn(data, content_type, accepts), accepts
File "/usr/local/lib/python2.7/dist-packages/tf_container/serve.py", line 207, in f
input = input_fn(serialized_data, content_type)
File "/usr/local/lib/python2.7/dist-packages/tf_container/serve.py", line 238, in _default_input_fn
return self._parse_csv_request(serialized_data)
File "/usr/local/lib/python2.7/dist-packages/tf_container/serve.py", line 195, in _parse_csv_request
full_array = [float(i) for i in row]
ValueError: invalid literal for float(): 0.5,0.2
[2018-10-16 12:36:46,650] ERROR in serving: invalid literal for float(): 0.5,0.2
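(Side note: the ValueError above can be reproduced outside the container. This is only a sketch of a plausible cause, assuming the default handler parses the body with Python's csv module as the serve.py frames suggest: the CLI body arrives wrapped in literal quotes, so the whole thing is read as a single quoted field.)

```python
import csv
from io import StringIO

# Body as delivered with surrounding literal quotes; csv treats
# "0.5,0.2" as ONE quoted field, not two comma-separated numbers.
body = '"0.5,0.2"'
row = next(csv.reader(StringIO(body)))
print(row)  # ['0.5,0.2'] - a single field

try:
    [float(i) for i in row]
except ValueError as e:
    err = str(e)  # ValueError mentioning '0.5,0.2'
```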
Could you show me the environment variables set in your SageMaker model's primary container? You can find this on the model's page in the AWS console. Let's confirm that the model is set up correctly.
Hi @icywang86rui, here are the environment variables listed on the model's page:
SAGEMAKER_CONTAINER_LOG_LEVEL 20
SAGEMAKER_ENABLE_CLOUDWATCH_METRICS false
SAGEMAKER_PROGRAM keras_linear_regression_synthetic_house_price.py
SAGEMAKER_REGION us-east-2
SAGEMAKER_SUBMIT_DIRECTORY s3://test-estimator/customcode/tensorflow_synthetic_test_house_price/house-price-estimator-tensorflow-test-ts1539692900/source/sourcedir.tar.gz
Thanks, Fernando
Hi @fernandoamat ,
The input_fn is working in SageMaker as well. Let me explain what I think is happening:
The default input_fn simply deserializes a JSON request body, so it accepts
"{\"inputs\": [ [0.5, 0.2]]}"
and returns {"inputs": [ [0.5, 0.2]]}, which already matches the model signature.
To achieve your goal, which is to accept "{\"int_living_sqft\": 0.5, \"int_beds\": 0.2}"
as a valid request, I would do:
```python
def input_fn(data, content_type):
    if content_type == "application/json":
        # Expect a JSON string like {"int_living_sqft": 0.5, "int_beds": 0.2}
        obj = json.loads(data)
        return {"inputs": [[obj['int_living_sqft'], obj['int_beds']]]}
```
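(As a quick local sanity check, the handler above can be exercised without SageMaker at all; this sketch just inlines the corrected return shape:)

```python
import json

def input_fn(data, content_type):
    if content_type == "application/json":
        # Expect a JSON string like {"int_living_sqft": 0.5, "int_beds": 0.2}
        obj = json.loads(data)
        return {"inputs": [[obj["int_living_sqft"], obj["int_beds"]]]}

result = input_fn('{"int_living_sqft": 0.5, "int_beds": 0.2}', "application/json")
print(result)  # {'inputs': [[0.5, 0.2]]}
```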
Please, let me know if it works.
Thanks for using SageMaker.
Márcio
Hi @mvsusp Thanks for taking a look at this and for the clear explanation. I included the modification you suggested and re-ran the notebook to retrain and redeploy the endpoint, but unfortunately it does not work. When I call the endpoint like this:
aws sagemaker-runtime invoke-endpoint --region 'us-east-2' --endpoint-name house-price-estimator-test-synthetic-keras --body "{\"int_living_sqft\": 0.5, \"int_beds\": 0.2}" --content-type "application/json" --accept "application/json" outputJson.json
I get the following stacktrace in cloudwatch:
[2018-10-30 12:48:02,868] ERROR in serving: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="input size does not match signature")
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/container_support/serving.py", line 182, in _invoke
self.transformer.transform(content, input_content_type, requested_output_content_type)
File "/usr/local/lib/python2.7/dist-packages/tf_container/serve.py", line 281, in transform
return self.transform_fn(data, content_type, accepts), accepts
File "/usr/local/lib/python2.7/dist-packages/tf_container/serve.py", line 208, in f
prediction = self.predict_fn(input)
File "/usr/local/lib/python2.7/dist-packages/tf_container/serve.py", line 223, in predict_fn
return self.proxy_client.request(data)
File "/usr/local/lib/python2.7/dist-packages/tf_container/proxy_client.py", line 71, in request
return request_fn(data)
File "/usr/local/lib/python2.7/dist-packages/tf_container/proxy_client.py", line 99, in predict
result = self.prediction_service_stub.Predict(request, self.request_timeout)
File "/usr/local/lib/python2.7/dist-packages/grpc/beta/_client_adaptations.py", line 309, in __call__
self._request_serializer, self._response_deserializer)
File "/usr/local/lib/python2.7/dist-packages/grpc/beta/_client_adaptations.py", line 195, in _blocking_unary_unary
raise _abortion_error(rpc_error_call)
AbortionError: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="input size does not match signature")
AbortionError: AbortionError(code=StatusCode.INVALID_ARGUMENT, details="input size does not match signature")
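(The INVALID_ARGUMENT is consistent with the signature printed at startup: input "inputs", dtype DT_FLOAT, shape [-1, 2]. A rough plain-Python illustration of what does and does not match that shape, with no TF Serving involved, the helper name being purely hypothetical:)

```python
# Signature from the Model Spec log: any number of rows, each with exactly
# 2 features. TF Serving rejects payloads whose shape disagrees with this.
def matches_signature(payload):
    rows = payload.get("inputs", [])
    return len(rows) > 0 and all(isinstance(r, list) and len(r) == 2 for r in rows)

print(matches_signature({"inputs": [[0.5, 0.2]]}))  # True: one row of 2 floats
print(matches_signature({"inputs": [0.5, 0.2]}))    # False: flat list, wrong rank
```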
However, when I call the same endpoint like this
aws sagemaker-runtime invoke-endpoint --region 'us-east-2' --endpoint-name house-price-estimator-test-synthetic-keras --body "{\"inputs\": [ [0.5, 0.2]]}" --content-type "application/json" outputJson.json
the request succeeds and returns the expected result. This is why I am puzzled: if input_fn were being called, the second request should not succeed.
The stack trace and error are pretty cryptic, so I am not sure where to look.
My input_fn looks like this:
```python
def clean_serialized_input(serialized_input):
    """
    The SageMaker request adds all sorts of oddballs to the serialized string.
    This method tries to clean it up to be robust to that.
    :param serialized_input:
    :return:
    """
    clean_input = serialized_input.replace("\\", "")  # request adds \" to the string
    if clean_input[0] == "\"":
        offset_start = 1
    else:
        offset_start = 0
    if clean_input[-1] == "\"":
        return clean_input[offset_start:-1]
    else:
        return clean_input[offset_start:]


def input_fn(data, content_type):
    if content_type == "application/json":
        # Expect a JSON string like {"int_living_sqft": 0.5, "int_beds": 0.2}
        clean_input = clean_serialized_input(data)  # request adds " to the string
        obj = json.loads(clean_input)
        # change suggested by @mvsusp
        # return [[obj['int_living_sqft'], obj['int_beds']]]
        return {INPUT_TENSOR_NAME: [[obj['int_living_sqft'], obj['int_beds']]]}
    elif content_type == "text/csv":
        clean_input = clean_serialized_input(data)  # request adds " to the string
        return [[float(i) for i in clean_input.split(',')]]
    else:
        raise ValueError(
            'Endpoint is not prepared for content type {}. It only accepts '
            'application/json or text/csv'.format(content_type))
```
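(For what it's worth, the cleaning helper above does round-trip the escaped body that the CLI produces when exercised standalone; a quick local check, with the helper's logic copied so the snippet is self-contained:)

```python
import json

def clean_serialized_input(serialized_input):
    # same logic as the helper above: drop backslashes, strip surrounding quotes
    clean_input = serialized_input.replace("\\", "")
    offset_start = 1 if clean_input[0] == '"' else 0
    if clean_input[-1] == '"':
        return clean_input[offset_start:-1]
    return clean_input[offset_start:]

# Body as the CLI delivers it: escaped quotes, wrapped in literal quotes.
raw = '"{\\"int_living_sqft\\": 0.5, \\"int_beds\\": 0.2}"'
obj = json.loads(clean_serialized_input(raw))
print(obj)  # {'int_living_sqft': 0.5, 'int_beds': 0.2}
```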
Any help is appreciated. Thanks, @fernandoamat
Hi,
As mentioned above by my teammates, the input_fn/default_input_fn logic is part of the container and does not depend on local/non-local mode.
Would it be possible to see how you deployed the endpoint in the local and non-local cases? Did you use the Python SDK estimator both times?
What happens if you query the endpoint using the Python SDK?
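(For reference, a minimal sketch of such a query with boto3; the endpoint name and region are the ones from the CLI commands in this thread, and boto3 is imported lazily so the snippet only needs AWS credentials when the function is actually called:)

```python
import json

def invoke_endpoint(payload,
                    endpoint_name="house-price-estimator-test-synthetic-keras",
                    region="us-east-2"):
    """Send the same JSON body the aws-cli commands in this thread send."""
    import boto3  # lazy import: only required when actually calling AWS
    runtime = boto3.client("sagemaker-runtime", region_name=region)
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload),
    )
    return response["Body"].read()

# The request body that fails via the CLI, built the same way:
body = json.dumps({"int_living_sqft": 0.5, "int_beds": 0.2})
print(body)
```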
Also, would it be possible to see the tar file from s3://test-estimator/customcode/tensorflow_synthetic_test_house_price/house-price-estimator-tensorflow-test-ts1539692900/source/sourcedir.tar.gz with the source code the container was running?
closing due to inactivity. feel free to reopen if necessary.
Hi, I have set up a small regression example to test the SageMaker TensorFlow pipeline from beginning to end, following the examples here.
Everything worked OK, except when I want to override the input_fn method in my module so I can parse/transform incoming data at prediction time. If I use instance_type='local', the code works and the customized input_fn method is called during the estimator.predict call. However, when I deploy the model to an endpoint and try to query it with the following call
aws sagemaker-runtime invoke-endpoint --region 'us-east-2' --endpoint-name house-price-estimator-test-synthetic-keras --body "{\"int_living_sqft\": 0.5, \"int_beds\": 0.2}" --content-type "application/json" --accept "application/json" outputJson.json
the custom input_fn method does not seem to be called. If I call the endpoint like this
aws sagemaker-runtime invoke-endpoint --region 'us-east-2' --endpoint-name house-price-estimator-test-synthetic-keras --body "{\"inputs\": [ [0.5, 0.2]]}" --content-type "application/json" outputJson.json
the request succeeds, which is a clear indication it is using the default JSON parser from here.
I include a zip file to reproduce everything: it has the notebook used to train and deploy a simple toy regression model in a SageMaker notebook. It also has the Python file with the model_fn, input_fn and other methods required by the notebook.
Any help would be appreciated as I am out of ideas on what can be going wrong and why input_fn is not being called by the endpoint when it receives a prediction request. Again, instance_type='local' seems to work fine.
Thanks, Fernando
sagemaker_input_fn_issue.zip
This is the traceback from the CloudWatch Console: