aws / sagemaker-inference-toolkit

Serve machine learning models within a 🐳 Docker container using 🧠 Amazon SageMaker.
Apache License 2.0

fix: remove prohibited characters from error response #62

Closed · ajaykarpur closed this 4 years ago

ajaykarpur commented 4 years ago

Issue #, if available: When an inference error occurs, MMS logs the following message:

```
java.lang.IllegalArgumentException: reasonPhrase contains one of the following prohibited characters: \r\n:
```

The root cause of this issue is actually in Netty, a dependency of MMS: Netty rejects carriage returns and newlines in the HTTP reason phrase (they would terminate the status line), so any exception message containing them triggers the error above.

Description of changes: As a workaround, we strip the prohibited characters (\r and \n) from the error message before it is returned as the response reason phrase.
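For illustration, a minimal sketch of what this kind of sanitization can look like; the helper name and the choice to replace the characters with spaces (rather than delete them outright) are assumptions here, not necessarily the exact change in this PR:

```python
def sanitize_error_message(message):
    # Netty prohibits carriage returns and newlines in an HTTP
    # reason phrase, so collapse them to spaces before the error
    # is handed back to MMS as the response status message.
    return message.replace("\r", " ").replace("\n", " ")


# Example: a multi-line stacktrace becomes a single line.
print(sanitize_error_message("RuntimeError: bad input\nTraceback (most recent call last):\n  ..."))
# RuntimeError: bad input Traceback (most recent call last):   ...
```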

Testing done:

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

Tests

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

sagemaker-bot commented 4 years ago

AWS CodeBuild CI Report

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

ajaykarpur commented 4 years ago

The space-separated stacktrace is unfortunately less readable, but the only other option is for Netty to fix the bug on their end. I'll open an issue with MMS to see if they can drive that fix.
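One way to keep the readable version despite the workaround is to log the original multi-line message server-side and return only the sanitized single-line phrase to the client. A sketch, assuming an MMS-style context object with a `set_response_status(code, phrase)` method; `sanitize_error_message` is the illustrative helper above:

```python
import logging

logger = logging.getLogger(__name__)


def respond_with_error(context, status_code, message):
    # Keep the full multi-line stacktrace readable in the
    # container logs / CloudWatch...
    logger.error(message)
    # ...and give Netty only the single-line, sanitized version
    # as the HTTP reason phrase it will accept.
    context.set_response_status(status_code, sanitize_error_message(message))
```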

ashishgupta023 commented 4 years ago

MMS issue filed: https://github.com/awslabs/multi-model-server/issues/933

ashishgupta023 commented 4 years ago

do you have an example of what the sanitized error message looks like? (wondering about that space-separated stacktrace...)


```
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "The following operation failed in the TorchScript interpreter. Traceback of TorchScript, serialized code (most recent call last):   File "code/__torch__/torch_neuron/convert.py", line 7, in forward     argument_1: Tensor) -> Tensor:     _0 = __torch__.torch_neuron.decorators.___torch_mangle_4.neuron_function     return _0(argument_1, )            ~~ <--- HERE   File "code/__torch__/torch_neuron/decorators/___torch_mangle_4.py", line 2, in neuron_function def neuron_function(argument_0: Tensor) -> Tensor:   _0 = ops.neuron.forward_1([argument_0], CONSTANTS.c0, CONSTANTS.c1, CONSTANTS.c2)        ~~~~~~~~~~~~~~~~~~~~ <--- HERE   return _0  Traceback of TorchScript, original code (most recent call last): /opt/amazon/lib/python3.6/site-packages/torch_neuron/decorators.py(252): neuron_function /opt/amazon/lib/python3.6/site-packages/torch/jit/__init__.py(899): trace /opt/amazon/lib/python3.6/site-packages/torch_neuron/decorators.py(256): create_runnable /opt/amazon/lib/python3.6/site-packages/torch_neuron/decorators.py(145): trace /opt/amazon/lib/python3.6/site-packages/torch_neuron/convert.py(154): _convert_item /opt/amazon/lib/python3.6/site-packages/torch_neuron/graph.py(71): __call__ /opt/amazon/lib/python3.6/site-packages/torch_neuron/convert.py(139): compile_fused_operators /opt/amazon/lib/python3.6/site-packages/torch_neuron/convert.py(63): trace /opt/amazon/lib/python3.6/site-packages/neo_inferentia_compiler/pytorch_framework.py(60): compile_model /opt/amazon/bin/neo_main.py(51): compile_model /opt/amazon/bin/neo_main.py(96): compile /opt/amazon/bin/neo_main.py(124): <module> RuntimeError: NeuronDevice::process_nrtd_response: Context=NRTD_CTX_INFER_WAIT NRTD response code 'NERR_INFER_BAD_INPUT' (1002) Bad input was supplied on an infer call Raw runtime error message ''  Traceback (most recent call last):   File "/usr/local/lib/python3.6/dist-packages/sagemaker_inference/transformer.py", line 127, in transform     result = self._transform_fn(self._model, input_data, content_type, accept)   File "/usr/local/lib/python3.6/dist-packages/sagemaker_inference/transformer.py", line 217, in _default_transform_fn     prediction = self._predict_fn(data, model)   File "/usr/local/lib/python3.6/dist-packages/sagemaker_pytorch_inferentia_serving_container/default_inference_handler.py", line 63, in default_predict_fn     return model(data)   File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__     result = self.forward(*input, **kwargs) RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):   File "code/__torch__/torch_neuron/convert.py", line 7, in forward     argument_1: Tensor) -> Tensor:     _0 = __torch__.torch_neuron.decorators.___torch_mangle_4.neuron_function     return _0(argument_1, )            ~~ <--- HERE   File "code/__torch__/torch_neuron/decorators/___torch_mangle_4.py", line 2, in neuron_function def neuron_function(argument_0: Tensor) -> Tensor:   _0 = ops.neuron.forward_1([argument_0], CONSTANTS.c0, CONSTANTS.c1, CONSTANTS.c2)        ~~~~~~~~~~~~~~~~~~~~ <--- HERE   return _0  Traceback of TorchScript, original code (most recent call last): /opt/amazon/lib/python3.6/site-packages/torch_neuron/decorators.py(252): neuron_function /opt/amazon/lib/python3.6/site-packages/torch/jit/__init__.py(899): trace /opt/amazon/lib/python3.6/site-packages/torch_neuron/decorators.py(256): create_runnable /opt/amazon/lib/python3.6/site-packages/torch_neuron/decorators.py(145): trace /opt/amazon/lib/python3.6/site-packages/torch_neuron/convert.py(154): _convert_item /opt/amazon/lib/python3.6/site-packages/torch_neuron/graph.py(71): __call__ /opt/amazon/lib/python3.6/site-packages/torch_neuron/convert.py(139): compile_fused_operators /opt/amazon/lib/python3.6/site-packages/torch_neuron/convert.py(63): trace /opt/amazon/lib/python3.6/site-packages/neo_inferentia_compiler/pytorch_framework.py(60): compile_model /opt/amazon/bin/neo_main.py(51): compile_model /opt/amazon/bin/neo_main.py(96): compile /opt/amazon/bin/neo_main.py(124): <module> RuntimeError: NeuronDevice::process_nrtd_response: Context=NRTD_CTX_INFER_WAIT NRTD response code 'NERR_INFER_BAD_INPUT' (1002) Bad input was supplied on an infer call Raw runtime error message ''  ". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/inf1-inception-110-fail in account 886656810413 for more information.
```