aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
https://aws.amazon.com/machine-learning/neuron/

`libwalrus.so` undefined symbol: _ZNK3bir3Hwm21getInitiationIntervalERKNS_23InstCalcVarAddrSymbolicE #328

Closed sunbc0120 closed 3 years ago

sunbc0120 commented 3 years ago
  1. Set up the environment by following:

https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-intro/install-pytorch.html

Environment Python 3.7.10:

torch==1.8.1
torch-model-archiver==0.3.0
torch-neuron==1.8.1.1.5.21.0
torchserve==0.3.0
  2. Then followed the tutorial:

https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia-pytorch-neuron.html

Got error:

``` INFO:Neuron:The following operations are currently supported in torch-neuron for this model: INFO:Neuron:aten::max_pool2d INFO:Neuron:aten::add INFO:Neuron:aten::addmm INFO:Neuron:aten::batch_norm INFO:Neuron:aten::flatten INFO:Neuron:aten::t INFO:Neuron:aten::adaptive_avg_pool2d INFO:Neuron:aten::relu INFO:Neuron:prim::Constant INFO:Neuron:aten::_convolution INFO:Neuron:prim::ListConstruct INFO:Neuron:100.00% of all operations (including primitives) (1698 of 1698) are supported INFO:Neuron:100.00% of arithmetic operations (176 of 176) are supported INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile) INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 176, fused = 176, percent fused = 100.0% WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$1110; falling back to native python function call ERROR:Neuron:google/protobuf/pyext/descriptor.cc:358: bad argument to internal function Traceback (most recent call last): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/torch_neuron/convert.py", line 346, in op_converter item, inputs, compiler_workdir=sg_workdir, **kwargs) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/torch_neuron/decorators.py", line 61, in trace import tensorflow as tf File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow/__init__.py", line 99, in from tensorflow_core import * File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/__init__.py", line 34, in from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import File "", line 1019, in _handle_fromlist File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow/__init__.py", line 50, in __getattr__ module = self._load() File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow/__init__.py", line 44, in _load module = _importlib.import_module(self.__name__) File 
"/home/ubuntu/miniconda3/envs/inf/lib/python3.7/importlib/__init__.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/python/__init__.py", line 52, in from tensorflow.core.framework.graph_pb2 import * File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/core/framework/graph_pb2.py", line 17, in from tensorflow.core.framework import function_pb2 as tensorflow_dot_core_dot_framework_dot_function__pb2 File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/core/framework/function_pb2.py", line 463, in '__module__' : 'tensorflow.core.framework.function_pb2' SystemError: google/protobuf/pyext/descriptor.cc:358: bad argument to internal function INFO:Neuron:Number of arithmetic operators (post-compilation) before = 176, compiled = 0, percent compiled = 0.0% INFO:Neuron:The neuron partitioner created 1 sub-graphs INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0% INFO:Neuron:Compiled these operators (and operator counts) to Neuron: INFO:Neuron:Not compiled operators (and operator counts) to Neuron: INFO:Neuron: => aten::_convolution: 53 [supported] INFO:Neuron: => aten::adaptive_avg_pool2d: 1 [supported] INFO:Neuron: => aten::add: 16 [supported] INFO:Neuron: => aten::addmm: 1 [supported] INFO:Neuron: => aten::batch_norm: 53 [supported] INFO:Neuron: => aten::flatten: 1 [supported] INFO:Neuron: => aten::max_pool2d: 1 [supported] INFO:Neuron: => aten::relu: 49 [supported] INFO:Neuron: => aten::t: 1 [supported] ERROR:root:Internal Python error in the inspect module. Below is the traceback from this internal error. ERROR:root:Internal Python error in the inspect module. Below is the traceback from this internal error. ERROR:root:Internal Python error in the inspect module. 
Below is the traceback from this internal error. Traceback (most recent call last): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "", line 28, in model_neuron = torch.neuron.trace(model, example_inputs=[image]) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/torch_neuron/convert.py", line 124, in trace cu.stats_post_compiler(neuron_graph) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/torch_neuron/convert.py", line 457, in stats_post_compiler "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!") RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace! During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2061, in showtraceback stb = value._render_traceback_() AttributeError: 'RuntimeError' object has no attribute '_render_traceback_' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 1101, in get_records return _fixed_getinnerframes(etb, number_of_lines_of_context, tb_offset) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 248, in wrapped return f(*args, **kwargs) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 281, in _fixed_getinnerframes records = fix_frame_records_filenames(inspect.getinnerframes(etb, context)) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/inspect.py", line 1502, in getinnerframes frameinfo = (tb.tb_frame,) + getframeinfo(tb, context) File 
"/home/ubuntu/miniconda3/envs/inf/lib/python3.7/inspect.py", line 1460, in getframeinfo filename = getsourcefile(frame) or getfile(frame) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/inspect.py", line 696, in getsourcefile if getattr(getmodule(object, filename), '__loader__', None) is not None: File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/inspect.py", line 733, in getmodule if ismodule(module) and hasattr(module, '__file__'): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow/__init__.py", line 50, in __getattr__ module = self._load() File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow/__init__.py", line 44, in _load module = _importlib.import_module(self.__name__) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/importlib/__init__.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "", line 1006, in _gcd_import File "", line 983, in _find_and_load File "", line 953, in _find_and_load_unlocked File "", line 219, in _call_with_frames_removed File "", line 1006, in _gcd_import File "", line 983, in _find_and_load File "", line 967, in _find_and_load_unlocked File "", line 677, in _load_unlocked File "", line 728, in exec_module File "", line 219, in _call_with_frames_removed File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/__init__.py", line 39, in from tensorflow._api.v1 import audio File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/_api/v1/audio/__init__.py", line 10, in from tensorflow.python.ops.gen_audio_ops import decode_wav File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_audio_ops.py", line 11, in from tensorflow.python.eager import context as _context File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/python/eager/context.py", line 29, in from tensorflow.core.protobuf import config_pb2 File 
"/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/core/protobuf/config_pb2.py", line 17, in from tensorflow.core.framework import graph_pb2 as tensorflow_dot_core_dot_framework_dot_graph__pb2 File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/core/framework/graph_pb2.py", line 17, in from tensorflow.core.framework import function_pb2 as tensorflow_dot_core_dot_framework_dot_function__pb2 File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/core/framework/function_pb2.py", line 463, in '__module__' : 'tensorflow.core.framework.function_pb2' SystemError: google/protobuf/pyext/descriptor.cc:358: bad argument to internal function Traceback (most recent call last): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "", line 28, in model_neuron = torch.neuron.trace(model, example_inputs=[image]) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/torch_neuron/convert.py", line 124, in trace cu.stats_post_compiler(neuron_graph) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/torch_neuron/convert.py", line 457, in stats_post_compiler "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!") RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace! 
During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2061, in showtraceback stb = value._render_traceback_() AttributeError: 'RuntimeError' object has no attribute '_render_traceback_' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3357, in run_ast_nodes if (await self.run_code(code, result, async_=asy)): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3454, in run_code self.showtraceback(running_compiled_code=True) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2064, in showtraceback value, tb, tb_offset=tb_offset) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 1368, in structured_traceback self, etype, value, tb, tb_offset, number_of_lines_of_context) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 1268, in structured_traceback self, etype, value, tb, tb_offset, number_of_lines_of_context File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 1125, in structured_traceback tb_offset) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 1082, in format_exception_as_a_whole last_unique, recursion_repeat = find_recursion(orig_etype, evalue, records) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 382, in find_recursion return len(records), 0 TypeError: object of type 'NoneType' has no len() During handling of the above exception, another exception occurred: Traceback (most recent call last): File 
"/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2061, in showtraceback stb = value._render_traceback_() AttributeError: 'TypeError' object has no attribute '_render_traceback_' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 1101, in get_records return _fixed_getinnerframes(etb, number_of_lines_of_context, tb_offset) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 248, in wrapped return f(*args, **kwargs) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 281, in _fixed_getinnerframes records = fix_frame_records_filenames(inspect.getinnerframes(etb, context)) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/inspect.py", line 1502, in getinnerframes frameinfo = (tb.tb_frame,) + getframeinfo(tb, context) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/inspect.py", line 1460, in getframeinfo filename = getsourcefile(frame) or getfile(frame) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/inspect.py", line 696, in getsourcefile if getattr(getmodule(object, filename), '__loader__', None) is not None: File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/inspect.py", line 733, in getmodule if ismodule(module) and hasattr(module, '__file__'): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow/__init__.py", line 50, in __getattr__ module = self._load() File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow/__init__.py", line 44, in _load module = _importlib.import_module(self.__name__) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/importlib/__init__.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "", line 1006, in _gcd_import File "", line 983, in 
_find_and_load File "", line 953, in _find_and_load_unlocked File "", line 219, in _call_with_frames_removed File "", line 1006, in _gcd_import File "", line 983, in _find_and_load File "", line 967, in _find_and_load_unlocked File "", line 677, in _load_unlocked File "", line 728, in exec_module File "", line 219, in _call_with_frames_removed File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/__init__.py", line 39, in from tensorflow._api.v1 import audio File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/_api/v1/audio/__init__.py", line 10, in from tensorflow.python.ops.gen_audio_ops import decode_wav File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_audio_ops.py", line 11, in from tensorflow.python.eager import context as _context File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/python/eager/context.py", line 29, in from tensorflow.core.protobuf import config_pb2 File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/core/protobuf/config_pb2.py", line 17, in from tensorflow.core.framework import graph_pb2 as tensorflow_dot_core_dot_framework_dot_graph__pb2 File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/core/framework/graph_pb2.py", line 17, in from tensorflow.core.framework import function_pb2 as tensorflow_dot_core_dot_framework_dot_function__pb2 File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/core/framework/function_pb2.py", line 463, in '__module__' : 'tensorflow.core.framework.function_pb2' SystemError: google/protobuf/pyext/descriptor.cc:358: bad argument to internal function Traceback (most recent call last): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code exec(code_obj, self.user_global_ns, self.user_ns) File "", line 28, in model_neuron = 
torch.neuron.trace(model, example_inputs=[image]) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/torch_neuron/convert.py", line 124, in trace cu.stats_post_compiler(neuron_graph) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/torch_neuron/convert.py", line 457, in stats_post_compiler "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!") RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace! During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2061, in showtraceback stb = value._render_traceback_() AttributeError: 'RuntimeError' object has no attribute '_render_traceback_' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3357, in run_ast_nodes if (await self.run_code(code, result, async_=asy)): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3454, in run_code self.showtraceback(running_compiled_code=True) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2064, in showtraceback value, tb, tb_offset=tb_offset) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 1368, in structured_traceback self, etype, value, tb, tb_offset, number_of_lines_of_context) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 1268, in structured_traceback self, etype, value, tb, tb_offset, number_of_lines_of_context File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 1125, in 
structured_traceback tb_offset) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 1082, in format_exception_as_a_whole last_unique, recursion_repeat = find_recursion(orig_etype, evalue, records) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 382, in find_recursion return len(records), 0 TypeError: object of type 'NoneType' has no len() During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2061, in showtraceback stb = value._render_traceback_() AttributeError: 'TypeError' object has no attribute '_render_traceback_' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2940, in _run_cell return runner(coro) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/async_helpers.py", line 68, in _pseudo_sync_runner coro.send(None) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3166, in run_cell_async interactivity=interactivity, compiler=compiler, result=result) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3376, in run_ast_nodes self.showtraceback() File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2064, in showtraceback value, tb, tb_offset=tb_offset) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 1368, in structured_traceback self, etype, value, tb, tb_offset, number_of_lines_of_context) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 1268, in structured_traceback 
self, etype, value, tb, tb_offset, number_of_lines_of_context File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 1143, in structured_traceback chained_exceptions_tb_offset) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 1082, in format_exception_as_a_whole last_unique, recursion_repeat = find_recursion(orig_etype, evalue, records) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 382, in find_recursion return len(records), 0 TypeError: object of type 'NoneType' has no len() During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 2061, in showtraceback stb = value._render_traceback_() AttributeError: 'TypeError' object has no attribute '_render_traceback_' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 1101, in get_records return _fixed_getinnerframes(etb, number_of_lines_of_context, tb_offset) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 248, in wrapped return f(*args, **kwargs) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/IPython/core/ultratb.py", line 281, in _fixed_getinnerframes records = fix_frame_records_filenames(inspect.getinnerframes(etb, context)) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/inspect.py", line 1502, in getinnerframes frameinfo = (tb.tb_frame,) + getframeinfo(tb, context) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/inspect.py", line 1460, in getframeinfo filename = getsourcefile(frame) or getfile(frame) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/inspect.py", line 696, in getsourcefile if 
getattr(getmodule(object, filename), '__loader__', None) is not None: File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/inspect.py", line 733, in getmodule if ismodule(module) and hasattr(module, '__file__'): File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow/__init__.py", line 50, in __getattr__ module = self._load() File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow/__init__.py", line 44, in _load module = _importlib.import_module(self.__name__) File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/importlib/__init__.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "", line 1006, in _gcd_import File "", line 983, in _find_and_load File "", line 953, in _find_and_load_unlocked File "", line 219, in _call_with_frames_removed File "", line 1006, in _gcd_import File "", line 983, in _find_and_load File "", line 967, in _find_and_load_unlocked File "", line 677, in _load_unlocked File "", line 728, in exec_module File "", line 219, in _call_with_frames_removed File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/__init__.py", line 39, in from tensorflow._api.v1 import audio File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/_api/v1/audio/__init__.py", line 10, in from tensorflow.python.ops.gen_audio_ops import decode_wav File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_audio_ops.py", line 11, in from tensorflow.python.eager import context as _context File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/python/eager/context.py", line 29, in from tensorflow.core.protobuf import config_pb2 File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/core/protobuf/config_pb2.py", line 17, in from tensorflow.core.framework import graph_pb2 as tensorflow_dot_core_dot_framework_dot_graph__pb2 File 
"/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/core/framework/graph_pb2.py", line 17, in from tensorflow.core.framework import function_pb2 as tensorflow_dot_core_dot_framework_dot_function__pb2 File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/tensorflow_core/core/framework/function_pb2.py", line 463, in '__module__' : 'tensorflow.core.framework.function_pb2' SystemError: google/protobuf/pyext/descriptor.cc:358: bad argument to internal function ```
  3. Following the suggestion:

https://github.com/tensorflow/tensorflow/issues/48797#issuecomment-892706478

Got the following error:

```
INFO:Neuron:The following operations are currently supported in torch-neuron for this model:
INFO:Neuron:aten::t
INFO:Neuron:aten::relu
INFO:Neuron:prim::ListConstruct
INFO:Neuron:aten::flatten
INFO:Neuron:aten::adaptive_avg_pool2d
INFO:Neuron:aten::_convolution
INFO:Neuron:aten::addmm
INFO:Neuron:prim::Constant
INFO:Neuron:aten::max_pool2d
INFO:Neuron:aten::add
INFO:Neuron:aten::batch_norm
INFO:Neuron:100.00% of all operations (including primitives) (1698 of 1698) are supported
INFO:Neuron:100.00% of arithmetic operations (176 of 176) are supported
INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 176, fused = 176, percent fused = 100.0%
INFO:Neuron:Compiling function _NeuronGraph$1110 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ubuntu/miniconda3/envs/inf/bin/neuron-cc compile /tmp/tmpxrsipyzj/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpxrsipyzj/graph_def.neff --io-config {"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]} --verbose 35'
INFO:Neuron:Compile command returned: 1
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$1110; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/home/ubuntu/miniconda3/envs/inf/bin/neuron-cc compile /tmp/tmpxrsipyzj/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpxrsipyzj/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]}' --verbose 35
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/torch_neuron/convert.py", line 346, in op_converter
    item, inputs, compiler_workdir=sg_workdir, **kwargs)
  File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/torch_neuron/decorators.py", line 196, in trace
    'neuron-cc failed with the following command line call:\n{}'.format(command))
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/home/ubuntu/miniconda3/envs/inf/bin/neuron-cc compile /tmp/tmpxrsipyzj/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpxrsipyzj/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]}' --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 176, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::_convolution: 53 [supported]
INFO:Neuron: => aten::adaptive_avg_pool2d: 1 [supported]
INFO:Neuron: => aten::add: 16 [supported]
INFO:Neuron: => aten::addmm: 1 [supported]
INFO:Neuron: => aten::batch_norm: 53 [supported]
INFO:Neuron: => aten::flatten: 1 [supported]
INFO:Neuron: => aten::max_pool2d: 1 [supported]
INFO:Neuron: => aten::relu: 49 [supported]
INFO:Neuron: => aten::t: 1 [supported]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-1b8b64cdb931> in <module>
     26 ## Note: The "-O2" setting is default in recent releases, but may be needed for DLAMI
     27 ##       and older installed environments- model_neuron = torch.neuron.trace(model, example_inputs=[image], compiler_args="-O2")
---> 28 model_neuron = torch.neuron.trace(model, example_inputs=[image])
     29 
     30 # The output of this step will have the percentage of operations compiled, example:

~/miniconda3/envs/inf/lib/python3.7/site-packages/torch_neuron/convert.py in trace(func, example_inputs, fallback, op_whitelist, minimum_segment_size, subgraph_builder_function, subgraph_inputs_pruning, skip_compiler, debug_must_trace, allow_no_ops_on_neuron, compiler_workdir, dynamic_batch_size, compiler_timeout, _neuron_trace, compiler_args, verbose, **kwargs)
    122     with skip_inference_context():
    123         neuron_graph = cu.compile_fused_operators(neuron_graph, **compile_kwargs)
--> 124     cu.stats_post_compiler(neuron_graph)
    125 
    126     # Wrap the compiled version of the model in a script module. Note that this is

~/miniconda3/envs/inf/lib/python3.7/site-packages/torch_neuron/convert.py in stats_post_compiler(self, neuron_graph)
    455         if succesful_compilations == 0 and not self.allow_no_ops_on_neuron:
    456             raise RuntimeError(
--> 457                 "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
    458 
    459         if percent_operations_compiled < 50.0:

RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
```
  4. Manually running the command `/home/ubuntu/miniconda3/envs/inf/bin/neuron-cc compile /tmp/tmp0ypcs0gm/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp0ypcs0gm/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]}' --verbose 35` results in the following error:
```
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/inf/bin/neuron-cc", line 10, in <module>
    sys.exit(main())
  File "neuroncc/driver/CommandDriver.py", line 224, in neuroncc.driver.CommandDriver.main
  File "neuroncc/driver/CommandDriver.py", line 203, in neuroncc.driver.CommandDriver.CommandDriver.run
  File "neuroncc/driver/commands/CompileCommand.py", line 64, in neuroncc.driver.commands.CompileCommand.CompileCommand.__init__
  File "neuroncc/driver/JobRegistry.py", line 25, in neuroncc.driver.JobRegistry.JobRegistry.__init__
  File "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 670, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 583, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1043, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/neuroncc/driver/jobs/../../starfish/lib/libwalrus.so: undefined symbol: _ZNK3bir3Hwm21getInitiationIntervalERKNS_23InstCalcVarAddrSymbolicE
```

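
For reference, the mangled name in the ImportError demangles to `bir::Hwm::getInitiationInterval(bir::InstCalcVarAddrSymbolic const&) const`. A minimal sketch for reproducing the loader failure outside of neuron-cc is to open the library directly with `ctypes`. The helper name `try_load` is hypothetical, and the path is the one from the traceback with the `driver/jobs/../..` part collapsed; loading the library in isolation may surface a different missing dependency than the import inside neuron-cc does, so treat the output as a hint only.

```python
import ctypes


def try_load(path):
    """Return None if the shared library loads, else the loader's error text."""
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        # The dynamic loader reports missing files and undefined symbols here.
        return str(exc)
    return None


if __name__ == "__main__":
    # Path taken from the ImportError above; adjust it for your environment.
    lib = (
        "/home/ubuntu/miniconda3/envs/inf/lib/python3.7/site-packages/"
        "neuroncc/starfish/lib/libwalrus.so"
    )
    error = try_load(lib)
    print("loaded OK" if error is None else "load failed: " + error)
```

Running this before and after changing packages in the environment gives a quick check of whether the library can be resolved, without going through a full `torch.neuron.trace` call.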
mrnikwaws commented 3 years ago

Hi Baichuan,

When using an Ubuntu 18 installation as per the instructions, you need some additional steps to make things work (since Python 3.7 is not the default Python):

sudo apt install python3.7-dev
sudo apt install python3.7-venv

Once you have activated the environment use:

pip install -U pip
pip install neuron-cc[tensorflow]
pip install torch-neuron

At this point I was able to reproduce your error. It seems that these installation instructions install an incompatible version of numpy. To fix the issue, please use the following:

pip install numpy==1.18.5

I'm assuming here that you used the created environment, rather than the preconfigured conda environment in the comments of the tutorial.

Please respond here and let us know if this does not correct your issue. We'll look at the install documentation and wheel requirements to prevent this problem in the future.
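To confirm that the pin actually took effect in the active environment, the installed version can be compared against the expected one. This is only a minimal sketch and not part of the Neuron tooling: the helper name `check_pins` and the `PINS` table are illustrative, and the only pin taken from the advice above is numpy 1.18.5.

```python
# Minimal sketch: report packages whose installed version differs from a pin.
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # Python 3.7 needs the importlib-metadata backport
    import importlib_metadata as metadata

# Illustrative table; only the numpy pin comes from this thread's advice.
PINS = {"numpy": "1.18.5"}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {installed}")
```

Run it with the same interpreter that runs neuron-cc; an empty result means every listed pin matches.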

sunbc0120 commented 3 years ago

Hi @mrnikwaws, thanks for following up.

  1. Yes, I'm using a self-created and self-managed conda environment, because I need Neuron, TorchServe, and other open-source deep learning frameworks side by side and am trying to keep their dependencies and version requirements compatible with each other.

  2. I added your suggestion on numpy and it's making some progress. Now the new error is:

INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 176, fused = 176, percent fused = 100.0%
INFO:Neuron:Compiling function _NeuronGraph$556 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ubuntu/miniconda3/envs/inf_debug/bin/neuron-cc compile /tmp/tmp9vpkz85w/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp9vpkz85w/graph_def.neff --io-config {"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]} --verbose 0'
09/22/2021 03:01:34 AM ERROR [neuron-cc]: ***************************************************************
09/22/2021 03:01:34 AM ERROR [neuron-cc]:  An Internal Compiler Error has occurred
09/22/2021 03:01:34 AM ERROR [neuron-cc]: ***************************************************************
09/22/2021 03:01:34 AM ERROR [neuron-cc]:
09/22/2021 03:01:34 AM ERROR [neuron-cc]: Error message:  A process in the process pool was terminated abruptly while the future was running or pending.
09/22/2021 03:01:34 AM ERROR [neuron-cc]:
09/22/2021 03:01:34 AM ERROR [neuron-cc]: Error class:    BrokenProcessPool
09/22/2021 03:01:34 AM ERROR [neuron-cc]: Error location: pipeline.compile.0
09/22/2021 03:01:34 AM ERROR [neuron-cc]: Command line:   /home/ubuntu/miniconda3/envs/inf_debug/bin/neuron-cc compile /tmp/tmp9vpkz85w/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp9vpkz85w/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]}' --verbose 0
09/22/2021 03:01:34 AM ERROR [neuron-cc]:
09/22/2021 03:01:34 AM ERROR [neuron-cc]: Internal details:
09/22/2021 03:01:34 AM ERROR [neuron-cc]:   File "neuroncc/driver/Job.py", line 244, in neuroncc.driver.Job.runSingleInputFn
09/22/2021 03:01:34 AM ERROR [neuron-cc]:   File "neuroncc/driver/Pipeline.py", line 30, in neuroncc.driver.Pipeline.Pipeline.runSingleInput
09/22/2021 03:01:34 AM ERROR [neuron-cc]:   File "neuroncc/driver/Job.py", line 286, in neuroncc.driver.Job.SingleInputJob.run
09/22/2021 03:01:34 AM ERROR [neuron-cc]:   File "neuroncc/driver/Job.py", line 291, in neuroncc.driver.Job.SingleInputJob.run
09/22/2021 03:01:34 AM ERROR [neuron-cc]:   File "/home/ubuntu/miniconda3/envs/inf_debug/lib/python3.7/concurrent/futures/_base.py", line 435, in result
09/22/2021 03:01:34 AM ERROR [neuron-cc]:     return self.__get_result()
09/22/2021 03:01:34 AM ERROR [neuron-cc]:   File "/home/ubuntu/miniconda3/envs/inf_debug/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
09/22/2021 03:01:34 AM ERROR [neuron-cc]:     raise self._exception
09/22/2021 03:01:34 AM ERROR [neuron-cc]:
09/22/2021 03:01:34 AM ERROR [neuron-cc]: Version information:
09/22/2021 03:01:34 AM ERROR [neuron-cc]:   Neuron Compiler version 1.6.13.0+9f61b2f75
09/22/2021 03:01:34 AM ERROR [neuron-cc]:
09/22/2021 03:01:34 AM ERROR [neuron-cc]:   HWM version 1.6.0.0-0
09/22/2021 03:01:34 AM ERROR [neuron-cc]:   NEFF version Dynamic
09/22/2021 03:01:34 AM ERROR [neuron-cc]:   TVM version 1.6.2.0+0
09/22/2021 03:01:34 AM ERROR [neuron-cc]:   NumPy version 1.18.5
09/22/2021 03:01:34 AM ERROR [neuron-cc]:   MXNet not available
09/22/2021 03:01:34 AM ERROR [neuron-cc]:   TF not available
09/22/2021 03:01:34 AM ERROR [neuron-cc]:   ONNX not available
09/22/2021 03:01:34 AM ERROR [neuron-cc]:
09/22/2021 03:01:34 AM ERROR [neuron-cc]: Artifacts stored in: /tmp/tmp9vpkz85w
INFO:Neuron:Compile command returned: 1
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$556; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/home/ubuntu/miniconda3/envs/inf_debug/bin/neuron-cc compile /tmp/tmp9vpkz85w/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp9vpkz85w/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]}' --verbose 0
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch_neuron/convert.py", line 345, in op_converter
    neuron_function = self.subgraph_compiler(
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch_neuron/decorators.py", line 195, in trace
    raise subprocess.SubprocessError(
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/home/ubuntu/miniconda3/envs/inf_debug/bin/neuron-cc compile /tmp/tmp9vpkz85w/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp9vpkz85w/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]}' --verbose 0
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 176, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::_convolution: 53 [supported]
INFO:Neuron: => aten::adaptive_avg_pool2d: 1 [supported]
INFO:Neuron: => aten::add: 16 [supported]
INFO:Neuron: => aten::addmm: 1 [supported]
INFO:Neuron: => aten::batch_norm: 53 [supported]
INFO:Neuron: => aten::flatten: 1 [supported]
INFO:Neuron: => aten::max_pool2d: 1 [supported]
INFO:Neuron: => aten::relu: 49 [supported]
INFO:Neuron: => aten::t: 1 [supported]
Traceback (most recent call last):
  File "infer.py", line 14, in <module>
    model_neuron = torch.neuron.trace(model, example_inputs=[image])
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch_neuron/convert.py", line 124, in trace
    cu.stats_post_compiler(neuron_graph)
  File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch_neuron/convert.py", line 456, in stats_post_compiler
    raise RuntimeError(
RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!

Manually running the command /home/ubuntu/miniconda3/envs/inf_debug/bin/neuron-cc compile /tmp/tmp9vpkz85w/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp9vpkz85w/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]}' --verbose 0 results in the following error:

09/22/2021 03:01:59 AM ERROR [neuron-cc]: ***************************************************************
09/22/2021 03:01:59 AM ERROR [neuron-cc]:  An Internal Compiler Error has occurred
09/22/2021 03:01:59 AM ERROR [neuron-cc]: ***************************************************************
09/22/2021 03:01:59 AM ERROR [neuron-cc]:
09/22/2021 03:01:59 AM ERROR [neuron-cc]: Error message:  A process in the process pool was terminated abruptly while the future was running or pending.
09/22/2021 03:01:59 AM ERROR [neuron-cc]:
09/22/2021 03:01:59 AM ERROR [neuron-cc]: Error class:    BrokenProcessPool
09/22/2021 03:01:59 AM ERROR [neuron-cc]: Error location: pipeline.compile.0
09/22/2021 03:01:59 AM ERROR [neuron-cc]: Command line:   /home/ubuntu/miniconda3/envs/inf_debug/bin/neuron-cc compile /tmp/tmp9vpkz85w/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp9vpkz85w/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]}' --verbose 0
09/22/2021 03:01:59 AM ERROR [neuron-cc]:
09/22/2021 03:01:59 AM ERROR [neuron-cc]: Internal details:
09/22/2021 03:01:59 AM ERROR [neuron-cc]:   File "neuroncc/driver/Job.py", line 244, in neuroncc.driver.Job.runSingleInputFn
09/22/2021 03:01:59 AM ERROR [neuron-cc]:   File "neuroncc/driver/Pipeline.py", line 30, in neuroncc.driver.Pipeline.Pipeline.runSingleInput
09/22/2021 03:01:59 AM ERROR [neuron-cc]:   File "neuroncc/driver/Job.py", line 286, in neuroncc.driver.Job.SingleInputJob.run
09/22/2021 03:01:59 AM ERROR [neuron-cc]:   File "neuroncc/driver/Job.py", line 291, in neuroncc.driver.Job.SingleInputJob.run
09/22/2021 03:01:59 AM ERROR [neuron-cc]:   File "/home/ubuntu/miniconda3/envs/inf_debug/lib/python3.7/concurrent/futures/_base.py", line 435, in result
09/22/2021 03:01:59 AM ERROR [neuron-cc]:     return self.__get_result()
09/22/2021 03:01:59 AM ERROR [neuron-cc]:   File "/home/ubuntu/miniconda3/envs/inf_debug/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
09/22/2021 03:01:59 AM ERROR [neuron-cc]:     raise self._exception
09/22/2021 03:01:59 AM ERROR [neuron-cc]:
09/22/2021 03:01:59 AM ERROR [neuron-cc]: Version information:
09/22/2021 03:01:59 AM ERROR [neuron-cc]:   Neuron Compiler version 1.6.13.0+9f61b2f75
09/22/2021 03:01:59 AM ERROR [neuron-cc]:
09/22/2021 03:01:59 AM ERROR [neuron-cc]:   HWM version 1.6.0.0-0
09/22/2021 03:01:59 AM ERROR [neuron-cc]:   NEFF version Dynamic
09/22/2021 03:01:59 AM ERROR [neuron-cc]:   TVM version 1.6.2.0+0
09/22/2021 03:01:59 AM ERROR [neuron-cc]:   NumPy version 1.18.5
09/22/2021 03:01:59 AM ERROR [neuron-cc]:   MXNet not available
09/22/2021 03:01:59 AM ERROR [neuron-cc]:   TF not available
09/22/2021 03:01:59 AM ERROR [neuron-cc]:   ONNX not available
09/22/2021 03:01:59 AM ERROR [neuron-cc]:
09/22/2021 03:01:59 AM ERROR [neuron-cc]: Artifacts stored in: /home/ubuntu/pythonProject/siamese_inf/notebook
mrnikwaws commented 3 years ago

Here is my output for my environment:

ubuntu@ip-172-31-34-46:~$ source test_env/bin/activate
(test_env) ubuntu@ip-172-31-34-46:~$ neuron-cc --version
Neuron Compiler version 1.6.13.0+9f61b2f75

HWM version 1.6.0.0-0
NEFF version Dynamic
TVM version 1.6.2.0+0
NumPy version 1.18.5
MXNet not available
TF not available
ONNX not available
(test_env) ubuntu@ip-172-31-34-46:~$ pip list
Package              Version
-------------------- ---------------------------
absl-py              0.13.0
astor                0.8.1
attrs                21.2.0
cached-property      1.5.2
cffi                 1.14.6
decorator            5.1.0
dmlc-nnvm            1.6.2.0+0
dmlc-topi            1.6.2.0+0
dmlc-tvm             1.6.2.0+0
gast                 0.2.2
google-pasta         0.2.0
grpcio               1.40.0
h5py                 2.10.0
importlib-metadata   4.8.1
inferentia-hwm       1.6.0.0+0
islpy                2018.2+aws2018.x.853.0.bld0
Keras-Applications   1.0.8
Keras-Preprocessing  1.1.2
Markdown             3.3.4
networkx             2.4
neuron-cc            1.6.13.0+9f61b2f75
numpy                1.18.5
opt-einsum           3.3.0
Pillow               8.3.2
pip                  21.2.4
pkg_resources        0.0.0
protobuf             3.18.0
pycparser            2.20
scipy                1.4.1
setuptools           58.0.4
six                  1.16.0
tensorboard          1.15.0
tensorflow           1.15.5
tensorflow-estimator 1.15.1
termcolor            1.1.0
torch                1.8.1
torch-neuron         1.8.1.1.5.21.0
torchvision          0.9.1
typing-extensions    3.10.0.2
Werkzeug             2.0.1
wheel                0.37.0
wrapt                1.12.1
zipp                 3.5.0

As you can see I am using the same version of the compiler as you, so I suspect your python environment since I can compile. Can you please share the output of apt list | grep aws-neuron, conda list and pip list? I would like to confirm that your conda environment is healthy. Sometimes version conflicts can occur between conda and pip. If possible as a fallback I recommend creating and testing with a pip virtual environment.

sunbc0120 commented 3 years ago

apt list | grep aws-neuron:

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

aws-neuron-dkms/unknown 2.1.5.0 amd64 [upgradable from: 2.0.450.0]
aws-neuron-k8-plugin/unknown 1.6.22.0 amd64
aws-neuron-k8-scheduler/unknown 1.6.22.0 amd64
aws-neuron-runtime/unknown 1.6.24.0 amd64 [upgradable from: 1.5.0.0]
aws-neuron-runtime-base/unknown 1.6.21.0 amd64 [upgradable from: 1.6.16.0]
aws-neuron-tools/unknown 1.7.25.0 amd64 [upgradable from: 1.6.1.0]

conda list:

# packages in environment at /home/ubuntu/miniconda3/envs/inf_debug:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
absl-py                   0.14.0                   pypi_0    pypi
astor                     0.8.1                    pypi_0    pypi
attrs                     21.2.0                   pypi_0    pypi
ca-certificates           2021.5.30            ha878542_0    conda-forge
cached-property           1.5.2                    pypi_0    pypi
certifi                   2021.5.30        py37h89c1867_0    conda-forge
cffi                      1.14.6                   pypi_0    pypi
decorator                 5.1.0                    pypi_0    pypi
dmlc-nnvm                 1.6.2.0+0                pypi_0    pypi
dmlc-topi                 1.6.2.0+0                pypi_0    pypi
dmlc-tvm                  1.6.2.0+0                pypi_0    pypi
gast                      0.2.2                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
grpcio                    1.40.0                   pypi_0    pypi
h5py                      3.4.0                    pypi_0    pypi
importlib-metadata        4.8.1                    pypi_0    pypi
inferentia-hwm            1.6.0.0+0                pypi_0    pypi
islpy                     2018.2+aws2018.x.853.0.bld0          pypi_0    pypi
keras-applications        1.0.8                    pypi_0    pypi
keras-preprocessing       1.1.2                    pypi_0    pypi
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
libffi                    3.3                  h58526e2_2    conda-forge
libgcc-ng                 11.2.0               h1d223b6_8    conda-forge
libgomp                   11.2.0               h1d223b6_8    conda-forge
libstdcxx-ng              11.2.0               he4da1e4_8    conda-forge
markdown                  3.3.4                    pypi_0    pypi
ncurses                   6.2                  h58526e2_4    conda-forge
networkx                  2.4                      pypi_0    pypi
neuron-cc                 1.6.13.0+9f61b2f75          pypi_0    pypi
numpy                     1.18.5                   pypi_0    pypi
openssl                   1.1.1l               h7f98852_0    conda-forge
opt-einsum                3.3.0                    pypi_0    pypi
pip                       21.2.4                   pypi_0    pypi
protobuf                  3.18.0                   pypi_0    pypi
pycparser                 2.20                     pypi_0    pypi
python                    3.7.11               h12debd9_0
python_abi                3.7                     2_cp37m    conda-forge
readline                  8.1                  h46c0cb4_0    conda-forge
scipy                     1.4.1                    pypi_0    pypi
setuptools                58.0.4                   pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
sqlite                    3.36.0               h9cd32fc_1    conda-forge
tensorboard               1.15.0                   pypi_0    pypi
tensorflow                1.15.0                   pypi_0    pypi
tensorflow-estimator      1.15.1                   pypi_0    pypi
termcolor                 1.1.0                    pypi_0    pypi
tk                        8.6.11               h27826a3_1    conda-forge
torch                     1.8.1                    pypi_0    pypi
torch-neuron              1.8.1.1.5.21.0           pypi_0    pypi
typing-extensions         3.10.0.2                 pypi_0    pypi
werkzeug                  2.0.1                    pypi_0    pypi
wheel                     0.37.0                   pypi_0    pypi
wrapt                     1.12.1                   pypi_0    pypi
xz                        5.2.5                h516909a_1    conda-forge
zipp                      3.5.0                    pypi_0    pypi
zlib                      1.2.11            h516909a_1010    conda-forge

pip list:

Package              Version
-------------------- ---------------------------
absl-py              0.14.0
astor                0.8.1
attrs                21.2.0
cached-property      1.5.2
certifi              2021.5.30
cffi                 1.14.6
decorator            5.1.0
dmlc-nnvm            1.6.2.0+0
dmlc-topi            1.6.2.0+0
dmlc-tvm             1.6.2.0+0
gast                 0.2.2
google-pasta         0.2.0
grpcio               1.40.0
h5py                 3.4.0
importlib-metadata   4.8.1
inferentia-hwm       1.6.0.0+0
islpy                2018.2+aws2018.x.853.0.bld0
Keras-Applications   1.0.8
Keras-Preprocessing  1.1.2
Markdown             3.3.4
networkx             2.4
neuron-cc            1.6.13.0+9f61b2f75
numpy                1.18.5
opt-einsum           3.3.0
pip                  21.2.4
protobuf             3.18.0
pycparser            2.20
scipy                1.4.1
setuptools           58.0.4
six                  1.16.0
tensorboard          1.15.0
tensorflow           1.15.0
tensorflow-estimator 1.15.1
termcolor            1.1.0
torch                1.8.1
torch-neuron         1.8.1.1.5.21.0
typing-extensions    3.10.0.2
Werkzeug             2.0.1
wheel                0.37.0
wrapt                1.12.1
zipp                 3.5.0
mrnikwaws commented 3 years ago

I don't see torchvision (where the resnet50 model is pulled from) in your environment, so an unexpected version may be being inherited.

Please try:

pip install torchvision==0.9.1

and see if that resolves your issue.
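For comparing two environments like this without reading the listings by eye, the `pip list` outputs can be diffed. The sketch below is illustrative only: `parse_pip_list` and `diff_envs` are hypothetical helper names, and the two inline listings are short excerpts of the outputs posted above (my working environment and the failing conda one). It shows which packages differ, not which difference matters.

```python
def parse_pip_list(text):
    """Parse plain `pip list` output into {lowercased name: version}."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Package Version" header, and the dashed rule.
        # Rows with extra columns (e.g. editable installs) are skipped too.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0].lower()] = parts[1]
    return packages


def diff_envs(a, b):
    """Return {name: (version_in_a, version_in_b)} where the two differ."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Excerpts of the two listings from this thread.
working = parse_pip_list("""\
Package              Version
-------------------- ---------------------------
h5py                 2.10.0
numpy                1.18.5
tensorflow           1.15.5
torchvision          0.9.1
""")
failing = parse_pip_list("""\
Package              Version
-------------------- ---------------------------
h5py                 3.4.0
numpy                1.18.5
tensorflow           1.15.0
""")

for name, (mine, yours) in diff_envs(working, failing).items():
    print(f"{name}: {mine} vs {yours}")
```

A value of `None` on either side means the package is missing from that environment. In practice, save each full `pip list` output to a file and pass the file contents to `parse_pip_list`.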

sunbc0120 commented 3 years ago

pip list:

Package              Version
-------------------- ---------------------------
absl-py              0.14.0
astor                0.8.1
attrs                21.2.0
cached-property      1.5.2
certifi              2021.5.30
cffi                 1.14.6
decorator            5.1.0
dmlc-nnvm            1.6.2.0+0
dmlc-topi            1.6.2.0+0
dmlc-tvm             1.6.2.0+0
gast                 0.2.2
google-pasta         0.2.0
grpcio               1.40.0
h5py                 3.4.0
importlib-metadata   4.8.1
inferentia-hwm       1.6.0.0+0
islpy                2018.2+aws2018.x.853.0.bld0
Keras-Applications   1.0.8
Keras-Preprocessing  1.1.2
Markdown             3.3.4
networkx             2.4
neuron-cc            1.6.13.0+9f61b2f75
numpy                1.18.5
opt-einsum           3.3.0
Pillow               8.3.2
pip                  21.2.4
protobuf             3.18.0
pycparser            2.20
scipy                1.4.1
setuptools           58.0.4
six                  1.16.0
tensorboard          1.15.0
tensorflow           1.15.0
tensorflow-estimator 1.15.1
termcolor            1.1.0
torch                1.8.1
torch-neuron         1.8.1.1.5.21.0
torchvision          0.9.1
typing-extensions    3.10.0.2
Werkzeug             2.0.1
wheel                0.37.0
wrapt                1.12.1
zipp                 3.5.0

conda list

# packages in environment at /home/ubuntu/miniconda3/envs/inf_debug:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       1_gnu    conda-forge
absl-py                   0.14.0                   pypi_0    pypi
astor                     0.8.1                    pypi_0    pypi
attrs                     21.2.0                   pypi_0    pypi
ca-certificates           2021.5.30            ha878542_0    conda-forge
cached-property           1.5.2                    pypi_0    pypi
certifi                   2021.5.30        py37h89c1867_0    conda-forge
cffi                      1.14.6                   pypi_0    pypi
decorator                 5.1.0                    pypi_0    pypi
dmlc-nnvm                 1.6.2.0+0                pypi_0    pypi
dmlc-topi                 1.6.2.0+0                pypi_0    pypi
dmlc-tvm                  1.6.2.0+0                pypi_0    pypi
gast                      0.2.2                    pypi_0    pypi
google-pasta              0.2.0                    pypi_0    pypi
grpcio                    1.40.0                   pypi_0    pypi
h5py                      3.4.0                    pypi_0    pypi
importlib-metadata        4.8.1                    pypi_0    pypi
inferentia-hwm            1.6.0.0+0                pypi_0    pypi
islpy                     2018.2+aws2018.x.853.0.bld0          pypi_0    pypi
keras-applications        1.0.8                    pypi_0    pypi
keras-preprocessing       1.1.2                    pypi_0    pypi
ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
libffi                    3.3                  h58526e2_2    conda-forge
libgcc-ng                 11.2.0               h1d223b6_8    conda-forge
libgomp                   11.2.0               h1d223b6_8    conda-forge
libstdcxx-ng              11.2.0               he4da1e4_8    conda-forge
markdown                  3.3.4                    pypi_0    pypi
ncurses                   6.2                  h58526e2_4    conda-forge
networkx                  2.4                      pypi_0    pypi
neuron-cc                 1.6.13.0+9f61b2f75          pypi_0    pypi
numpy                     1.18.5                   pypi_0    pypi
openssl                   1.1.1l               h7f98852_0    conda-forge
opt-einsum                3.3.0                    pypi_0    pypi
pillow                    8.3.2                    pypi_0    pypi
pip                       21.2.4                   pypi_0    pypi
protobuf                  3.18.0                   pypi_0    pypi
pycparser                 2.20                     pypi_0    pypi
python                    3.7.11               h12debd9_0
python_abi                3.7                     2_cp37m    conda-forge
readline                  8.1                  h46c0cb4_0    conda-forge
scipy                     1.4.1                    pypi_0    pypi
setuptools                58.0.4                   pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
sqlite                    3.36.0               h9cd32fc_1    conda-forge
tensorboard               1.15.0                   pypi_0    pypi
tensorflow                1.15.0                   pypi_0    pypi
tensorflow-estimator      1.15.1                   pypi_0    pypi
termcolor                 1.1.0                    pypi_0    pypi
tk                        8.6.11               h27826a3_1    conda-forge
torch                     1.8.1                    pypi_0    pypi
torch-neuron              1.8.1.1.5.21.0           pypi_0    pypi
torchvision               0.9.1                    pypi_0    pypi
typing-extensions         3.10.0.2                 pypi_0    pypi
werkzeug                  2.0.1                    pypi_0    pypi
wheel                     0.37.0                   pypi_0    pypi
wrapt                     1.12.1                   pypi_0    pypi
xz                        5.2.5                h516909a_1    conda-forge
zipp                      3.5.0                    pypi_0    pypi
zlib                      1.2.11            h516909a_1010    conda-forge

/home/ubuntu/miniconda3/envs/inf_debug/bin/neuron-cc compile /tmp/tmpwrbt_d7n/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpwrbt_d7n/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]}' --verbose 35:

.09/25/2021 12:02:20 AM ERROR [neuron-cc]: ***************************************************************
09/25/2021 12:02:20 AM ERROR [neuron-cc]:  An Internal Compiler Error has occurred
09/25/2021 12:02:20 AM ERROR [neuron-cc]: ***************************************************************
09/25/2021 12:02:20 AM ERROR [neuron-cc]:
09/25/2021 12:02:20 AM ERROR [neuron-cc]: Error message:  A process in the process pool was terminated abruptly while the future was running or pending.
09/25/2021 12:02:20 AM ERROR [neuron-cc]:
09/25/2021 12:02:20 AM ERROR [neuron-cc]: Error class:    BrokenProcessPool
09/25/2021 12:02:20 AM ERROR [neuron-cc]: Error location: pipeline.compile.0
09/25/2021 12:02:20 AM ERROR [neuron-cc]: Command line:   /home/ubuntu/miniconda3/envs/inf_debug/bin/neuron-cc compile /tmp/tmpwrbt_d7n/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpwrbt_d7n/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]}' --verbose 35
09/25/2021 12:02:20 AM ERROR [neuron-cc]:
09/25/2021 12:02:20 AM ERROR [neuron-cc]: Internal details:
09/25/2021 12:02:20 AM ERROR [neuron-cc]:   File "neuroncc/driver/Job.py", line 244, in neuroncc.driver.Job.runSingleInputFn
09/25/2021 12:02:20 AM ERROR [neuron-cc]:   File "neuroncc/driver/Pipeline.py", line 30, in neuroncc.driver.Pipeline.Pipeline.runSingleInput
09/25/2021 12:02:20 AM ERROR [neuron-cc]:   File "neuroncc/driver/Job.py", line 286, in neuroncc.driver.Job.SingleInputJob.run
09/25/2021 12:02:20 AM ERROR [neuron-cc]:   File "neuroncc/driver/Job.py", line 291, in neuroncc.driver.Job.SingleInputJob.run
09/25/2021 12:02:20 AM ERROR [neuron-cc]:   File "/home/ubuntu/miniconda3/envs/inf_debug/lib/python3.7/concurrent/futures/_base.py", line 435, in result
09/25/2021 12:02:20 AM ERROR [neuron-cc]:     return self.__get_result()
09/25/2021 12:02:20 AM ERROR [neuron-cc]:   File "/home/ubuntu/miniconda3/envs/inf_debug/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
09/25/2021 12:02:20 AM ERROR [neuron-cc]:     raise self._exception
09/25/2021 12:02:20 AM ERROR [neuron-cc]:
09/25/2021 12:02:20 AM ERROR [neuron-cc]: Version information:
09/25/2021 12:02:20 AM ERROR [neuron-cc]:   Neuron Compiler version 1.6.13.0+9f61b2f75
09/25/2021 12:02:20 AM ERROR [neuron-cc]:
09/25/2021 12:02:20 AM ERROR [neuron-cc]:   HWM version 1.6.0.0-0
09/25/2021 12:02:20 AM ERROR [neuron-cc]:   NEFF version Dynamic
09/25/2021 12:02:20 AM ERROR [neuron-cc]:   TVM version 1.6.2.0+0
09/25/2021 12:02:20 AM ERROR [neuron-cc]:   NumPy version 1.18.5
09/25/2021 12:02:20 AM ERROR [neuron-cc]:   MXNet not available
09/25/2021 12:02:20 AM ERROR [neuron-cc]:   TF not available
09/25/2021 12:02:20 AM ERROR [neuron-cc]:   ONNX not available
09/25/2021 12:02:20 AM ERROR [neuron-cc]:
09/25/2021 12:02:20 AM ERROR [neuron-cc]: Artifacts stored in: /home/ubuntu/pythonProject/siamese_inf/notebook

Compiler status ERROR
sunbc0120 commented 3 years ago

FYI, I downgraded:

torch-neuron to 1.7.1
pytorch to 1.7.1
torchvision to 0.8.2

Now it works.

aws-taylor commented 3 years ago

Hello @sunbc0120,

It looks like the immediate problem has been resolved. If you are able to share your model then I'd like to investigate and improve our error message for this situation.

Regards, Taylor

mrnikwaws commented 3 years ago

+1, if you can share the model that would be helpful. Strangely, this compiled for me with the same configuration (though not using a conda environment) using torch==1.8.1 and torchvision==0.9.1. If we can discover the discrepancy, that may help other torch-neuron users.

sunbc0120 commented 3 years ago

Thanks very much,

  1. The model is generated following the tutorial here: https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia-pytorch-neuron.html
from torchvision import models

## Load a pretrained ResNet50 model
model = models.resnet50(pretrained=True)

## Tell the model we are using it for evaluation (not training)
model.eval()

(Tried to attach the model here, but GitHub is having an issue with *.zip: https://github.com/github/hub/issues/1479)

  2. Followed the suggestion by @mrnikwaws and tested a pip virtual environment; it works without any error!

  3. Therefore I guess the issue is around conda when upgrading torch-neuron from 1.7.1 to 1.8.1. I totally understand that Neuron won't support conda anymore, but the reality is that some other packages are better managed in conda channels and some existing use cases are already locked in to conda. Maybe one solution is to decouple the development environment (using e.g. conda) from the Neuron deployment environment (pure pip)?

https://aws.amazon.com/blogs/developer/neuron-conda-packages-eol/

sunbc0120 commented 3 years ago

Here is a sample script if you'd like to reproduce the error:

# new python environment
conda update --force conda
conda create -n debug python=3.7 -y
conda activate debug
conda install -c conda-forge gh -y

# fix: downgrade pytorch
# conda install pytorch==1.7.1 torchvision==0.8.2 -c pytorch

# pytorch neuron sdk
# fix:
# pip install "torch-neuron==1.7.*"
pip install torch-neuron
pip install neuron-cc[tensorflow]
pip install torchvision==0.9.1

# torchserve
# pip install torchserve==0.3.0 torch-model-archiver==0.3.0

# verify
which python
python -c "import torch.neuron"

cat << EOF > test.py && python test.py
import torch
import numpy as np
import os
import torch_neuron
from torchvision import models
import logging

## Enable logging so we can see any important warnings
logger = logging.getLogger('Neuron')
logger.setLevel(logging.INFO)

image = torch.zeros([1, 3, 224, 224], dtype=torch.float32)

## Load a pretrained ResNet50 model
model = models.resnet50(pretrained=True)

## Tell the model we are using it for evaluation (not training)
model.eval()

## Analyze the model - this will show operator support and operator count
torch.neuron.analyze_model( model, example_inputs=[image] )

## Now compile the model - with logging set to "info" we will see
## what compiles for Neuron, and if there are any fallbacks
## Note: The "-O2" setting is default in recent releases, but may be needed for DLAMI
##       and older installed environments- model_neuron = torch.neuron.trace(model, example_inputs=[image], compiler_args="-O2")
model_neuron = torch.neuron.trace(model, example_inputs=[image])

# The output of this step will have the percentage of operations compiled, example:
#
# INFO:Neuron:The neuron partitioner created 1 sub-graphs
# INFO:Neuron:Neuron successfully compiled 1 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 100.0%

## Export to saved model
model_neuron.save("resnet50_neuron.pt")
print("Compile Args, input tensor: {}, data type:'fp32', 'core': 1 ")
print("Compile success")

EOF
aws-diamant commented 3 years ago

Thanks @sunbc0120 for sharing the script. We're working on re-creating the issue, and will update soon.

awsilya commented 3 years ago

Hello @sunbc0120,

We have been unable to reproduce the conda issue using your script.

A combination of conda and pip is known to cause issues for python package interactions. These appear to be dependent on the sequence of installations and the base python environment. Where these occur we strongly recommend creating a fresh python venv, and following the installation instructions (which you have successfully done).
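One way to check for this kind of mixing is to verify that the packages involved resolve from inside the active environment. The tracebacks earlier in this thread show paths under both miniconda3/envs/inf_debug/lib/python3.7 and miniconda3/lib/python3.8, which may or may not be related to the failure. The following is a minimal sketch under that assumption; `modules_outside_prefix` is a hypothetical helper, not part of any Neuron package.

```python
import importlib.util
import os
import sys


def modules_outside_prefix(module_names, prefix=sys.prefix):
    """Return {module: file} for modules that resolve outside `prefix`.

    Pass third-party package names only: inside a venv the standard
    library lives under the base interpreter and would be reported.
    """
    root = os.path.realpath(prefix) + os.sep
    outside = {}
    for name in module_names:
        spec = importlib.util.find_spec(name)  # locates without importing
        if spec is None or spec.origin in (None, "built-in", "frozen"):
            continue  # not installed, or no file on disk to compare
        origin = os.path.realpath(spec.origin)
        if not origin.startswith(root):
            outside[name] = origin
    return outside


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("prefix:     ", sys.prefix)
    found = modules_outside_prefix(["torch", "torch_neuron", "numpy"])
    for name, origin in found.items():
        print(f"{name} resolves outside the active environment: {origin}")
```

If anything is reported, the script was likely started with a different interpreter than the one the packages were installed for, and a fresh venv is the simplest way to rule that out.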

Since we can't find further action to take, we are closing this ticket. Please re-open if you think we can help you further.