Closed: ajaykarpur closed this pull request 4 years ago
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository
The space-separated stacktrace is unfortunately less readable, but the only other option is for Netty to fix the bug on their end. I'll open an issue with MMS to see if they can drive that fix.
MMS issue filed: https://github.com/awslabs/multi-model-server/issues/933
do you have an example of what the sanitized error message looks like? (wondering about that space-separated stacktrace...)
```
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "The following operation failed in the TorchScript interpreter. Traceback of TorchScript, serialized code (most recent call last): File "code/__torch__/torch_neuron/convert.py", line 7, in forward argument_1: Tensor) -> Tensor: _0 = __torch__.torch_neuron.decorators.___torch_mangle_4.neuron_function return _0(argument_1, ) ~~ <--- HERE File "code/__torch__/torch_neuron/decorators/___torch_mangle_4.py", line 2, in neuron_function def neuron_function(argument_0: Tensor) -> Tensor: _0 = ops.neuron.forward_1([argument_0], CONSTANTS.c0, CONSTANTS.c1, CONSTANTS.c2) ~~~~~~~~~~~~~~~~~~~~ <--- HERE return _0 Traceback of TorchScript, original code (most recent call last): /opt/amazon/lib/python3.6/site-packages/torch_neuron/decorators.py(252): neuron_function /opt/amazon/lib/python3.6/site-packages/torch/jit/__init__.py(899): trace /opt/amazon/lib/python3.6/site-packages/torch_neuron/decorators.py(256): create_runnable /opt/amazon/lib/python3.6/site-packages/torch_neuron/decorators.py(145): trace /opt/amazon/lib/python3.6/site-packages/torch_neuron/convert.py(154): _convert_item /opt/amazon/lib/python3.6/site-packages/torch_neuron/graph.py(71): __call__ /opt/amazon/lib/python3.6/site-packages/torch_neuron/convert.py(139): compile_fused_operators /opt/amazon/lib/python3.6/site-packages/torch_neuron/convert.py(63): trace /opt/amazon/lib/python3.6/site-packages/neo_inferentia_compiler/pytorch_framework.py(60): compile_model /opt/amazon/bin/neo_main.py(51): compile_model /opt/amazon/bin/neo_main.py(96): compile /opt/amazon/bin/neo_main.py(124): <module> RuntimeError: NeuronDevice::process_nrtd_response: Context=NRTD_CTX_INFER_WAIT NRTD response code 'NERR_INFER_BAD_INPUT' (1002) Bad input was supplied on an infer call Raw runtime error message '' Traceback (most recent call last): File 
"/usr/local/lib/python3.6/dist-packages/sagemaker_inference/transformer.py", line 127, in transform result = self._transform_fn(self._model, input_data, content_type, accept) File "/usr/local/lib/python3.6/dist-packages/sagemaker_inference/transformer.py", line 217, in _default_transform_fn prediction = self._predict_fn(data, model) File "/usr/local/lib/python3.6/dist-packages/sagemaker_pytorch_inferentia_serving_container/default_inference_handler.py", line 63, in default_predict_fn return model(data) File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__ result = self.forward(*input, **kwargs) RuntimeError: The following operation failed in the TorchScript interpreter. Traceback of TorchScript, serialized code (most recent call last): File "code/__torch__/torch_neuron/convert.py", line 7, in forward argument_1: Tensor) -> Tensor: _0 = __torch__.torch_neuron.decorators.___torch_mangle_4.neuron_function return _0(argument_1, ) ~~ <--- HERE File "code/__torch__/torch_neuron/decorators/___torch_mangle_4.py", line 2, in neuron_function def neuron_function(argument_0: Tensor) -> Tensor: _0 = ops.neuron.forward_1([argument_0], CONSTANTS.c0, CONSTANTS.c1, CONSTANTS.c2) ~~~~~~~~~~~~~~~~~~~~ <--- HERE return _0 Traceback of TorchScript, original code (most recent call last): /opt/amazon/lib/python3.6/site-packages/torch_neuron/decorators.py(252): neuron_function /opt/amazon/lib/python3.6/site-packages/torch/jit/__init__.py(899): trace /opt/amazon/lib/python3.6/site-packages/torch_neuron/decorators.py(256): create_runnable /opt/amazon/lib/python3.6/site-packages/torch_neuron/decorators.py(145): trace /opt/amazon/lib/python3.6/site-packages/torch_neuron/convert.py(154): _convert_item /opt/amazon/lib/python3.6/site-packages/torch_neuron/graph.py(71): __call__ /opt/amazon/lib/python3.6/site-packages/torch_neuron/convert.py(139): compile_fused_operators /opt/amazon/lib/python3.6/site-packages/torch_neuron/convert.py(63): trace 
/opt/amazon/lib/python3.6/site-packages/neo_inferentia_compiler/pytorch_framework.py(60): compile_model /opt/amazon/bin/neo_main.py(51): compile_model /opt/amazon/bin/neo_main.py(96): compile /opt/amazon/bin/neo_main.py(124): <module> RuntimeError: NeuronDevice::process_nrtd_response: Context=NRTD_CTX_INFER_WAIT NRTD response code 'NERR_INFER_BAD_INPUT' (1002) Bad input was supplied on an infer call Raw runtime error message '' ". See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/inf1-inception-110-fail in account 886656810413 for more information.
```
Issue #, if available: When an inference error occurs, MMS logs an error message containing characters that break the HTTP response.
The root cause of this issue is actually a bug in Netty, which is a dependency of MMS.
Description of changes: As a workaround, we remove the offending characters from the error message before it is returned.
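A minimal sketch of the workaround described above, assuming the offending characters are the newlines that Netty rejects in the response (the function name is hypothetical, not the toolkit's actual helper):

```python
def sanitize_error_message(message: str) -> str:
    """Replace newline and carriage-return characters with single spaces.

    Netty cannot handle these characters in the error response, so the
    workaround flattens the stack trace into a space-separated string.
    """
    return message.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")
```

For example, `sanitize_error_message("RuntimeError:\nbad input")` returns the single-line string `"RuntimeError: bad input"`, which is why the stack trace in the sanitized message below appears space-separated.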
Testing done:
Merge Checklist

Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General
Tests
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.