Closed: ajaykarpur closed this pull request 4 years ago
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
The space-separated stacktrace is unfortunately less readable, but the only other option is for Netty to fix the bug on their end. I'll open an issue with MMS to see if they can drive that fix.
MMS issue filed: https://github.com/awslabs/multi-model-server/issues/933
do you have an example of what the sanitized error message looks like? (wondering about that space-separated stacktrace...)
```
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "The following operation failed in the TorchScript interpreter. Traceback of TorchScript, serialized code (most recent call last): File "code/__torch__/torch_neuron/convert.py", line 7, in forward argument_1: Tensor) -> Tensor: _0 = __torch__.torch_neuron.decorators.___torch_mangle_4.neuron_function return _0(argument_1, ) ~~ <--- HERE File "code/__torch__/torch_neuron/decorators/___torch_mangle_4.py", line 2, in neuron_function def neuron_function(argument_0: Tensor) -> Tensor: _0 = ops.neuron.forward_1([argument_0], CONSTANTS.c0, CONSTANTS.c1, CONSTANTS.c2) ~~~~~~~~~~~~~~~~~~~~ <--- HERE return _0 Traceback of TorchScript, original code (most recent call last): /opt/amazon/lib/python3.6/site-packages/torch_neuron/decorators.py(252): neuron_function /opt/amazon/lib/python3.6/site-packages/torch/jit/__init__.py(899): trace /opt/amazon/lib/python3.6/site-packages/torch_neuron/decorators.py(256): create_runnable /opt/amazon/lib/python3.6/site-packages/torch_neuron/decorators.py(145): trace /opt/amazon/lib/python3.6/site-packages/torch_neuron/convert.py(154): _convert_item /opt/amazon/lib/python3.6/site-packages/torch_neuron/graph.py(71): __call__ /opt/amazon/lib/python3.6/site-packages/torch_neuron/convert.py(139): compile_fused_operators /opt/amazon/lib/python3.6/site-packages/torch_neuron/convert.py(63): trace /opt/amazon/lib/python3.6/site-packages/neo_inferentia_compiler/pytorch_framework.py(60): compile_model /opt/amazon/bin/neo_main.py(51): compile_model /opt/amazon/bin/neo_main.py(96): compile /opt/amazon/bin/neo_main.py(124): <module> RuntimeError: NeuronDevice::process_nrtd_response: Context=NRTD_CTX_INFER_WAIT NRTD response code 'NERR_INFER_BAD_INPUT' (1002) Bad input was supplied on an infer call Raw runtime error message '' Traceback (most recent call last): File 
"/usr/local/lib/python3.6/dist-packages/sagemaker_inference/transformer.py", line 127, in transform result = self._transform_fn(self._model, input_data, content_type, accept) File "/usr/local/lib/python3.6/dist-packages/sagemaker_inference/transformer.py", line 217, in _default_transform_fn prediction = self._predict_fn(data, model) File "/usr/local/lib/python3.6/dist-packages/sagemaker_pytorch_inferentia_serving_container/default_inference_handler.py", line 63, in default_predict_fn return model(data) File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__ result = self.forward(*input, **kwargs) RuntimeError: The following operation failed in the TorchScript interpreter. Traceback of TorchScript, serialized code (most recent call last): File "code/__torch__/torch_neuron/convert.py", line 7, in forward argument_1: Tensor) -> Tensor: _0 = __torch__.torch_neuron.decorators.___torch_mangle_4.neuron_function return _0(argument_1, ) ~~ <--- HERE File "code/__torch__/torch_neuron/decorators/___torch_mangle_4.py", line 2, in neuron_function def neuron_function(argument_0: Tensor) -> Tensor: _0 = ops.neuron.forward_1([argument_0], CONSTANTS.c0, CONSTANTS.c1, CONSTANTS.c2) ~~~~~~~~~~~~~~~~~~~~ <--- HERE return _0 Traceback of TorchScript, original code (most recent call last): /opt/amazon/lib/python3.6/site-packages/torch_neuron/decorators.py(252): neuron_function /opt/amazon/lib/python3.6/site-packages/torch/jit/__init__.py(899): trace /opt/amazon/lib/python3.6/site-packages/torch_neuron/decorators.py(256): create_runnable /opt/amazon/lib/python3.6/site-packages/torch_neuron/decorators.py(145): trace /opt/amazon/lib/python3.6/site-packages/torch_neuron/convert.py(154): _convert_item /opt/amazon/lib/python3.6/site-packages/torch_neuron/graph.py(71): __call__ /opt/amazon/lib/python3.6/site-packages/torch_neuron/convert.py(139): compile_fused_operators /opt/amazon/lib/python3.6/site-packages/torch_neuron/convert.py(63): trace 
/opt/amazon/lib/python3.6/site-packages/neo_inferentia_compiler/pytorch_framework.py(60): compile_model /opt/amazon/bin/neo_main.py(51): compile_model /opt/amazon/bin/neo_main.py(96): compile /opt/amazon/bin/neo_main.py(124): <module> RuntimeError: NeuronDevice::process_nrtd_response: Context=NRTD_CTX_INFER_WAIT NRTD response code 'NERR_INFER_BAD_INPUT' (1002) Bad input was supplied on an infer call Raw runtime error message '' ". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/inf1-inception-110-fail in account 886656810413 for more information.
```
Issue #, if available: When an inference error occurs, MMS logs an error message containing characters that break the HTTP response.
The root cause of this issue is actually a bug in Netty, which is a dependency of MMS.
Description of changes: As a workaround, we remove the offending characters from the error message before it is returned.
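A minimal sketch of the workaround described above, assuming the offending characters are the newlines that Netty rejects in the response (the function name is hypothetical, not the toolkit's actual helper):

```python
def sanitize_error_message(message: str) -> str:
    """Replace newline and carriage-return characters with single spaces.

    Netty cannot handle these characters in the error response, so the
    workaround flattens the stack trace into a space-separated string.
    """
    return message.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")
```

For example, `sanitize_error_message("RuntimeError:\nbad input")` returns the single-line string `"RuntimeError: bad input"`, which is why the stack trace in the sanitized message below appears space-separated.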
Testing done:
Merge Checklist

Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General
Tests
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.