Closed sunbc0120 closed 3 years ago
Hi Baichuan,
When using an Ubuntu 18 installation as per the instructions, you need to take some additional steps to make things work (since Python 3.7 is not the default Python):
sudo apt install python3.7-dev
sudo apt install python3.7-venv
Once you have activated the environment use:
pip install -U pip
pip install neuron-cc[tensorflow]
pip install torch-neuron
At this point I was able to reproduce your error. It seems that these installation instructions install an incompatible version of numpy. To fix the issue, please use the following:
pip install numpy==1.18.5
I'm assuming here that you used the created environment, rather than the preconfigured conda environment in the comments of the tutorial.
Please respond here and let us know if this does not correct your issue. We'll look at the install documentation and wheel requirements to prevent this problem in the future.
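To confirm that the pin actually took effect inside the activated environment, a small check along these lines may help. This is only a sketch: the helper names `installed_version` and `check_pins` are made up for illustration, and on Python 3.7 it assumes the `importlib-metadata` backport is installed.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Assumption: on Python 3.7 the importlib-metadata backport is installed.
    import importlib_metadata as metadata


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {package: (wanted, found)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    # The numpy pin is the one suggested above; add further pins as needed.
    problems = check_pins({"numpy": "1.18.5"})
    for name, (wanted, found) in problems.items():
        print(f"{name}: wanted {wanted}, found {found}")
    if not problems:
        print("All pins satisfied")
```

Run it with the Python interpreter of the environment you are debugging, so that it reports what that interpreter actually sees.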
Hi @mrnikwaws, thanks for following up.
Yes, I'm using a self-created and self-managed environment with conda (because I need to use Neuron, TorchServe and other open-source deep learning frameworks, and I'm trying to keep their dependencies and version requirements compatible with each other).
I added your suggestion on numpy and it's making some progress. Now the new error is:
INFO:Neuron:All operators are compiled by neuron-cc (this does not guarantee that neuron-cc will successfully compile)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 176, fused = 176, percent fused = 100.0%
INFO:Neuron:Compiling function _NeuronGraph$556 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ubuntu/miniconda3/envs/inf_debug/bin/neuron-cc compile /tmp/tmp9vpkz85w/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp9vpkz85w/graph_def.neff --io-config {"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]} --verbose 0'
09/22/2021 03:01:34 AM ERROR [neuron-cc]: ***************************************************************
09/22/2021 03:01:34 AM ERROR [neuron-cc]: An Internal Compiler Error has occurred
09/22/2021 03:01:34 AM ERROR [neuron-cc]: ***************************************************************
09/22/2021 03:01:34 AM ERROR [neuron-cc]:
09/22/2021 03:01:34 AM ERROR [neuron-cc]: Error message: A process in the process pool was terminated abruptly while the future was running or pending.
09/22/2021 03:01:34 AM ERROR [neuron-cc]:
09/22/2021 03:01:34 AM ERROR [neuron-cc]: Error class: BrokenProcessPool
09/22/2021 03:01:34 AM ERROR [neuron-cc]: Error location: pipeline.compile.0
09/22/2021 03:01:34 AM ERROR [neuron-cc]: Command line: /home/ubuntu/miniconda3/envs/inf_debug/bin/neuron-cc compile /tmp/tmp9vpkz85w/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp9vpkz85w/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]}' --verbose 0
09/22/2021 03:01:34 AM ERROR [neuron-cc]:
09/22/2021 03:01:34 AM ERROR [neuron-cc]: Internal details:
09/22/2021 03:01:34 AM ERROR [neuron-cc]: File "neuroncc/driver/Job.py", line 244, in neuroncc.driver.Job.runSingleInputFn
09/22/2021 03:01:34 AM ERROR [neuron-cc]: File "neuroncc/driver/Pipeline.py", line 30, in neuroncc.driver.Pipeline.Pipeline.runSingleInput
09/22/2021 03:01:34 AM ERROR [neuron-cc]: File "neuroncc/driver/Job.py", line 286, in neuroncc.driver.Job.SingleInputJob.run
09/22/2021 03:01:34 AM ERROR [neuron-cc]: File "neuroncc/driver/Job.py", line 291, in neuroncc.driver.Job.SingleInputJob.run
09/22/2021 03:01:34 AM ERROR [neuron-cc]: File "/home/ubuntu/miniconda3/envs/inf_debug/lib/python3.7/concurrent/futures/_base.py", line 435, in result
09/22/2021 03:01:34 AM ERROR [neuron-cc]: return self.__get_result()
09/22/2021 03:01:34 AM ERROR [neuron-cc]: File "/home/ubuntu/miniconda3/envs/inf_debug/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
09/22/2021 03:01:34 AM ERROR [neuron-cc]: raise self._exception
09/22/2021 03:01:34 AM ERROR [neuron-cc]:
09/22/2021 03:01:34 AM ERROR [neuron-cc]: Version information:
09/22/2021 03:01:34 AM ERROR [neuron-cc]: Neuron Compiler version 1.6.13.0+9f61b2f75
09/22/2021 03:01:34 AM ERROR [neuron-cc]:
09/22/2021 03:01:34 AM ERROR [neuron-cc]: HWM version 1.6.0.0-0
09/22/2021 03:01:34 AM ERROR [neuron-cc]: NEFF version Dynamic
09/22/2021 03:01:34 AM ERROR [neuron-cc]: TVM version 1.6.2.0+0
09/22/2021 03:01:34 AM ERROR [neuron-cc]: NumPy version 1.18.5
09/22/2021 03:01:34 AM ERROR [neuron-cc]: MXNet not available
09/22/2021 03:01:34 AM ERROR [neuron-cc]: TF not available
09/22/2021 03:01:34 AM ERROR [neuron-cc]: ONNX not available
09/22/2021 03:01:34 AM ERROR [neuron-cc]:
09/22/2021 03:01:34 AM ERROR [neuron-cc]: Artifacts stored in: /tmp/tmp9vpkz85w
INFO:Neuron:Compile command returned: 1
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$556; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/home/ubuntu/miniconda3/envs/inf_debug/bin/neuron-cc compile /tmp/tmp9vpkz85w/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp9vpkz85w/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]}' --verbose 0
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch_neuron/convert.py", line 345, in op_converter
neuron_function = self.subgraph_compiler(
File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch_neuron/decorators.py", line 195, in trace
raise subprocess.SubprocessError(
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/home/ubuntu/miniconda3/envs/inf_debug/bin/neuron-cc compile /tmp/tmp9vpkz85w/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp9vpkz85w/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]}' --verbose 0
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 176, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::_convolution: 53 [supported]
INFO:Neuron: => aten::adaptive_avg_pool2d: 1 [supported]
INFO:Neuron: => aten::add: 16 [supported]
INFO:Neuron: => aten::addmm: 1 [supported]
INFO:Neuron: => aten::batch_norm: 53 [supported]
INFO:Neuron: => aten::flatten: 1 [supported]
INFO:Neuron: => aten::max_pool2d: 1 [supported]
INFO:Neuron: => aten::relu: 49 [supported]
INFO:Neuron: => aten::t: 1 [supported]
Traceback (most recent call last):
File "infer.py", line 14, in <module>
model_neuron = torch.neuron.trace(model, example_inputs=[image])
File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch_neuron/convert.py", line 124, in trace
cu.stats_post_compiler(neuron_graph)
File "/home/ubuntu/miniconda3/lib/python3.8/site-packages/torch_neuron/convert.py", line 456, in stats_post_compiler
raise RuntimeError(
RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
Manually running the command /home/ubuntu/miniconda3/envs/inf_debug/bin/neuron-cc compile /tmp/tmp9vpkz85w/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp9vpkz85w/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]}' --verbose 0 results in the following error:
09/22/2021 03:01:59 AM ERROR [neuron-cc]: ***************************************************************
09/22/2021 03:01:59 AM ERROR [neuron-cc]: An Internal Compiler Error has occurred
09/22/2021 03:01:59 AM ERROR [neuron-cc]: ***************************************************************
09/22/2021 03:01:59 AM ERROR [neuron-cc]:
09/22/2021 03:01:59 AM ERROR [neuron-cc]: Error message: A process in the process pool was terminated abruptly while the future was running or pending.
09/22/2021 03:01:59 AM ERROR [neuron-cc]:
09/22/2021 03:01:59 AM ERROR [neuron-cc]: Error class: BrokenProcessPool
09/22/2021 03:01:59 AM ERROR [neuron-cc]: Error location: pipeline.compile.0
09/22/2021 03:01:59 AM ERROR [neuron-cc]: Command line: /home/ubuntu/miniconda3/envs/inf_debug/bin/neuron-cc compile /tmp/tmp9vpkz85w/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp9vpkz85w/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]}' --verbose 0
09/22/2021 03:01:59 AM ERROR [neuron-cc]:
09/22/2021 03:01:59 AM ERROR [neuron-cc]: Internal details:
09/22/2021 03:01:59 AM ERROR [neuron-cc]: File "neuroncc/driver/Job.py", line 244, in neuroncc.driver.Job.runSingleInputFn
09/22/2021 03:01:59 AM ERROR [neuron-cc]: File "neuroncc/driver/Pipeline.py", line 30, in neuroncc.driver.Pipeline.Pipeline.runSingleInput
09/22/2021 03:01:59 AM ERROR [neuron-cc]: File "neuroncc/driver/Job.py", line 286, in neuroncc.driver.Job.SingleInputJob.run
09/22/2021 03:01:59 AM ERROR [neuron-cc]: File "neuroncc/driver/Job.py", line 291, in neuroncc.driver.Job.SingleInputJob.run
09/22/2021 03:01:59 AM ERROR [neuron-cc]: File "/home/ubuntu/miniconda3/envs/inf_debug/lib/python3.7/concurrent/futures/_base.py", line 435, in result
09/22/2021 03:01:59 AM ERROR [neuron-cc]: return self.__get_result()
09/22/2021 03:01:59 AM ERROR [neuron-cc]: File "/home/ubuntu/miniconda3/envs/inf_debug/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
09/22/2021 03:01:59 AM ERROR [neuron-cc]: raise self._exception
09/22/2021 03:01:59 AM ERROR [neuron-cc]:
09/22/2021 03:01:59 AM ERROR [neuron-cc]: Version information:
09/22/2021 03:01:59 AM ERROR [neuron-cc]: Neuron Compiler version 1.6.13.0+9f61b2f75
09/22/2021 03:01:59 AM ERROR [neuron-cc]:
09/22/2021 03:01:59 AM ERROR [neuron-cc]: HWM version 1.6.0.0-0
09/22/2021 03:01:59 AM ERROR [neuron-cc]: NEFF version Dynamic
09/22/2021 03:01:59 AM ERROR [neuron-cc]: TVM version 1.6.2.0+0
09/22/2021 03:01:59 AM ERROR [neuron-cc]: NumPy version 1.18.5
09/22/2021 03:01:59 AM ERROR [neuron-cc]: MXNet not available
09/22/2021 03:01:59 AM ERROR [neuron-cc]: TF not available
09/22/2021 03:01:59 AM ERROR [neuron-cc]: ONNX not available
09/22/2021 03:01:59 AM ERROR [neuron-cc]:
09/22/2021 03:01:59 AM ERROR [neuron-cc]: Artifacts stored in: /home/ubuntu/pythonProject/siamese_inf/notebook
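For context on the error class in the log above: `BrokenProcessPool` comes from Python's `concurrent.futures` and is raised when a worker process of a `ProcessPoolExecutor` dies abruptly instead of returning a result. One common cause is the operating system killing the worker (for example for running out of memory), but the log does not establish what happened here. The following minimal sketch only reproduces the exception type; it is unrelated to neuron-cc internals, and it assumes a POSIX system where the "fork" start method is available.

```python
import multiprocessing
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def run_and_report():
    """Submit a task that kills its worker and report the resulting error class."""
    # Assumption: POSIX system; "fork" is not available on Windows.
    context = multiprocessing.get_context("fork")
    with ProcessPoolExecutor(max_workers=1, mp_context=context) as pool:
        # os._exit terminates the worker immediately, without any cleanup,
        # which simulates a worker being killed from the outside.
        future = pool.submit(os._exit, 1)
        try:
            future.result()
        except BrokenProcessPool as exc:
            return type(exc).__name__
    return "no error"


if __name__ == "__main__":
    print(run_and_report())
```

The point of the sketch is that the Python traceback only shows where the parent noticed the dead worker; the real reason for the crash has to be found elsewhere (system logs, memory usage, the compiler artifacts directory).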
Here is my output for my environment:
ubuntu@ip-172-31-34-46:~$ source test_env/bin/activate
(test_env) ubuntu@ip-172-31-34-46:~$ neuron-cc --version
Neuron Compiler version 1.6.13.0+9f61b2f75
HWM version 1.6.0.0-0
NEFF version Dynamic
TVM version 1.6.2.0+0
NumPy version 1.18.5
MXNet not available
TF not available
ONNX not available
(test_env) ubuntu@ip-172-31-34-46:~$ pip list
Package Version
-------------------- ---------------------------
absl-py 0.13.0
astor 0.8.1
attrs 21.2.0
cached-property 1.5.2
cffi 1.14.6
decorator 5.1.0
dmlc-nnvm 1.6.2.0+0
dmlc-topi 1.6.2.0+0
dmlc-tvm 1.6.2.0+0
gast 0.2.2
google-pasta 0.2.0
grpcio 1.40.0
h5py 2.10.0
importlib-metadata 4.8.1
inferentia-hwm 1.6.0.0+0
islpy 2018.2+aws2018.x.853.0.bld0
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
Markdown 3.3.4
networkx 2.4
neuron-cc 1.6.13.0+9f61b2f75
numpy 1.18.5
opt-einsum 3.3.0
Pillow 8.3.2
pip 21.2.4
pkg_resources 0.0.0
protobuf 3.18.0
pycparser 2.20
scipy 1.4.1
setuptools 58.0.4
six 1.16.0
tensorboard 1.15.0
tensorflow 1.15.5
tensorflow-estimator 1.15.1
termcolor 1.1.0
torch 1.8.1
torch-neuron 1.8.1.1.5.21.0
torchvision 0.9.1
typing-extensions 3.10.0.2
Werkzeug 2.0.1
wheel 0.37.0
wrapt 1.12.1
zipp 3.5.0
As you can see, I am using the same version of the compiler as you, so I suspect your Python environment, since I can compile. Can you please share the output of apt list | grep aws-neuron, conda list and pip list? I would like to confirm that your conda environment is healthy. Sometimes version conflicts can occur between conda and pip. If possible, as a fallback I recommend creating and testing with a pip virtual environment.
apt list | grep aws-neuron:
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
aws-neuron-dkms/unknown 2.1.5.0 amd64 [upgradable from: 2.0.450.0]
aws-neuron-k8-plugin/unknown 1.6.22.0 amd64
aws-neuron-k8-scheduler/unknown 1.6.22.0 amd64
aws-neuron-runtime/unknown 1.6.24.0 amd64 [upgradable from: 1.5.0.0]
aws-neuron-runtime-base/unknown 1.6.21.0 amd64 [upgradable from: 1.6.16.0]
aws-neuron-tools/unknown 1.7.25.0 amd64 [upgradable from: 1.6.1.0]
conda list:
# packages in environment at /home/ubuntu/miniconda3/envs/inf_debug:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
absl-py 0.14.0 pypi_0 pypi
astor 0.8.1 pypi_0 pypi
attrs 21.2.0 pypi_0 pypi
ca-certificates 2021.5.30 ha878542_0 conda-forge
cached-property 1.5.2 pypi_0 pypi
certifi 2021.5.30 py37h89c1867_0 conda-forge
cffi 1.14.6 pypi_0 pypi
decorator 5.1.0 pypi_0 pypi
dmlc-nnvm 1.6.2.0+0 pypi_0 pypi
dmlc-topi 1.6.2.0+0 pypi_0 pypi
dmlc-tvm 1.6.2.0+0 pypi_0 pypi
gast 0.2.2 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.40.0 pypi_0 pypi
h5py 3.4.0 pypi_0 pypi
importlib-metadata 4.8.1 pypi_0 pypi
inferentia-hwm 1.6.0.0+0 pypi_0 pypi
islpy 2018.2+aws2018.x.853.0.bld0 pypi_0 pypi
keras-applications 1.0.8 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge
libffi 3.3 h58526e2_2 conda-forge
libgcc-ng 11.2.0 h1d223b6_8 conda-forge
libgomp 11.2.0 h1d223b6_8 conda-forge
libstdcxx-ng 11.2.0 he4da1e4_8 conda-forge
markdown 3.3.4 pypi_0 pypi
ncurses 6.2 h58526e2_4 conda-forge
networkx 2.4 pypi_0 pypi
neuron-cc 1.6.13.0+9f61b2f75 pypi_0 pypi
numpy 1.18.5 pypi_0 pypi
openssl 1.1.1l h7f98852_0 conda-forge
opt-einsum 3.3.0 pypi_0 pypi
pip 21.2.4 pypi_0 pypi
protobuf 3.18.0 pypi_0 pypi
pycparser 2.20 pypi_0 pypi
python 3.7.11 h12debd9_0
python_abi 3.7 2_cp37m conda-forge
readline 8.1 h46c0cb4_0 conda-forge
scipy 1.4.1 pypi_0 pypi
setuptools 58.0.4 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sqlite 3.36.0 h9cd32fc_1 conda-forge
tensorboard 1.15.0 pypi_0 pypi
tensorflow 1.15.0 pypi_0 pypi
tensorflow-estimator 1.15.1 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
tk 8.6.11 h27826a3_1 conda-forge
torch 1.8.1 pypi_0 pypi
torch-neuron 1.8.1.1.5.21.0 pypi_0 pypi
typing-extensions 3.10.0.2 pypi_0 pypi
werkzeug 2.0.1 pypi_0 pypi
wheel 0.37.0 pypi_0 pypi
wrapt 1.12.1 pypi_0 pypi
xz 5.2.5 h516909a_1 conda-forge
zipp 3.5.0 pypi_0 pypi
zlib 1.2.11 h516909a_1010 conda-forge
pip list:
Package Version
-------------------- ---------------------------
absl-py 0.14.0
astor 0.8.1
attrs 21.2.0
cached-property 1.5.2
certifi 2021.5.30
cffi 1.14.6
decorator 5.1.0
dmlc-nnvm 1.6.2.0+0
dmlc-topi 1.6.2.0+0
dmlc-tvm 1.6.2.0+0
gast 0.2.2
google-pasta 0.2.0
grpcio 1.40.0
h5py 3.4.0
importlib-metadata 4.8.1
inferentia-hwm 1.6.0.0+0
islpy 2018.2+aws2018.x.853.0.bld0
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
Markdown 3.3.4
networkx 2.4
neuron-cc 1.6.13.0+9f61b2f75
numpy 1.18.5
opt-einsum 3.3.0
pip 21.2.4
protobuf 3.18.0
pycparser 2.20
scipy 1.4.1
setuptools 58.0.4
six 1.16.0
tensorboard 1.15.0
tensorflow 1.15.0
tensorflow-estimator 1.15.1
termcolor 1.1.0
torch 1.8.1
torch-neuron 1.8.1.1.5.21.0
typing-extensions 3.10.0.2
Werkzeug 2.0.1
wheel 0.37.0
wrapt 1.12.1
zipp 3.5.0
I don't see torchvision (where the resnet50 model is pulled from) in your environment - so an unexpected version may be being inherited.
Please try:
pip install torchvision==0.9.1
and see if that resolves your issue.
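Differences like the missing torchvision are easier to spot mechanically than by eye. Below is a minimal sketch that parses two `pip list` outputs and reports every package whose version differs; the function names `parse_pip_list` and `diff_environments` are hypothetical, and the sample data is a short excerpt of the two listings posted in this thread.

```python
def parse_pip_list(text):
    """Parse `pip list` output into {lowercased package name: version}."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Package Version" header and the dashed rule.
        if len(parts) != 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        packages[parts[0].lower()] = parts[1]
    return packages


def diff_environments(working, failing):
    """Return {name: (working_version, failing_version)} for every mismatch."""
    names = sorted(set(working) | set(failing))
    return {
        name: (working.get(name), failing.get(name))
        for name in names
        if working.get(name) != failing.get(name)
    }


if __name__ == "__main__":
    # Excerpts of the working (venv) and failing (conda) listings from this thread.
    working = parse_pip_list(
        "Package Version\n"
        "-------- -------\n"
        "h5py 2.10.0\n"
        "numpy 1.18.5\n"
        "tensorflow 1.15.5\n"
        "torchvision 0.9.1\n"
    )
    failing = parse_pip_list(
        "Package Version\n"
        "-------- -------\n"
        "h5py 3.4.0\n"
        "numpy 1.18.5\n"
        "tensorflow 1.15.0\n"
    )
    for name, (good, bad) in diff_environments(working, failing).items():
        print(f"{name}: working={good} failing={bad}")
```

A version of `None` means the package is absent from that environment. Note that this only lowercases names; it does not normalize hyphens and underscores.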
pip list:
Package Version
-------------------- ---------------------------
absl-py 0.14.0
astor 0.8.1
attrs 21.2.0
cached-property 1.5.2
certifi 2021.5.30
cffi 1.14.6
decorator 5.1.0
dmlc-nnvm 1.6.2.0+0
dmlc-topi 1.6.2.0+0
dmlc-tvm 1.6.2.0+0
gast 0.2.2
google-pasta 0.2.0
grpcio 1.40.0
h5py 3.4.0
importlib-metadata 4.8.1
inferentia-hwm 1.6.0.0+0
islpy 2018.2+aws2018.x.853.0.bld0
Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2
Markdown 3.3.4
networkx 2.4
neuron-cc 1.6.13.0+9f61b2f75
numpy 1.18.5
opt-einsum 3.3.0
Pillow 8.3.2
pip 21.2.4
protobuf 3.18.0
pycparser 2.20
scipy 1.4.1
setuptools 58.0.4
six 1.16.0
tensorboard 1.15.0
tensorflow 1.15.0
tensorflow-estimator 1.15.1
termcolor 1.1.0
torch 1.8.1
torch-neuron 1.8.1.1.5.21.0
torchvision 0.9.1
typing-extensions 3.10.0.2
Werkzeug 2.0.1
wheel 0.37.0
wrapt 1.12.1
zipp 3.5.0
conda list
# packages in environment at /home/ubuntu/miniconda3/envs/inf_debug:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
absl-py 0.14.0 pypi_0 pypi
astor 0.8.1 pypi_0 pypi
attrs 21.2.0 pypi_0 pypi
ca-certificates 2021.5.30 ha878542_0 conda-forge
cached-property 1.5.2 pypi_0 pypi
certifi 2021.5.30 py37h89c1867_0 conda-forge
cffi 1.14.6 pypi_0 pypi
decorator 5.1.0 pypi_0 pypi
dmlc-nnvm 1.6.2.0+0 pypi_0 pypi
dmlc-topi 1.6.2.0+0 pypi_0 pypi
dmlc-tvm 1.6.2.0+0 pypi_0 pypi
gast 0.2.2 pypi_0 pypi
google-pasta 0.2.0 pypi_0 pypi
grpcio 1.40.0 pypi_0 pypi
h5py 3.4.0 pypi_0 pypi
importlib-metadata 4.8.1 pypi_0 pypi
inferentia-hwm 1.6.0.0+0 pypi_0 pypi
islpy 2018.2+aws2018.x.853.0.bld0 pypi_0 pypi
keras-applications 1.0.8 pypi_0 pypi
keras-preprocessing 1.1.2 pypi_0 pypi
ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge
libffi 3.3 h58526e2_2 conda-forge
libgcc-ng 11.2.0 h1d223b6_8 conda-forge
libgomp 11.2.0 h1d223b6_8 conda-forge
libstdcxx-ng 11.2.0 he4da1e4_8 conda-forge
markdown 3.3.4 pypi_0 pypi
ncurses 6.2 h58526e2_4 conda-forge
networkx 2.4 pypi_0 pypi
neuron-cc 1.6.13.0+9f61b2f75 pypi_0 pypi
numpy 1.18.5 pypi_0 pypi
openssl 1.1.1l h7f98852_0 conda-forge
opt-einsum 3.3.0 pypi_0 pypi
pillow 8.3.2 pypi_0 pypi
pip 21.2.4 pypi_0 pypi
protobuf 3.18.0 pypi_0 pypi
pycparser 2.20 pypi_0 pypi
python 3.7.11 h12debd9_0
python_abi 3.7 2_cp37m conda-forge
readline 8.1 h46c0cb4_0 conda-forge
scipy 1.4.1 pypi_0 pypi
setuptools 58.0.4 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sqlite 3.36.0 h9cd32fc_1 conda-forge
tensorboard 1.15.0 pypi_0 pypi
tensorflow 1.15.0 pypi_0 pypi
tensorflow-estimator 1.15.1 pypi_0 pypi
termcolor 1.1.0 pypi_0 pypi
tk 8.6.11 h27826a3_1 conda-forge
torch 1.8.1 pypi_0 pypi
torch-neuron 1.8.1.1.5.21.0 pypi_0 pypi
torchvision 0.9.1 pypi_0 pypi
typing-extensions 3.10.0.2 pypi_0 pypi
werkzeug 2.0.1 pypi_0 pypi
wheel 0.37.0 pypi_0 pypi
wrapt 1.12.1 pypi_0 pypi
xz 5.2.5 h516909a_1 conda-forge
zipp 3.5.0 pypi_0 pypi
zlib 1.2.11 h516909a_1010 conda-forge
/home/ubuntu/miniconda3/envs/inf_debug/bin/neuron-cc compile /tmp/tmpwrbt_d7n/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpwrbt_d7n/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]}' --verbose 35:
.09/25/2021 12:02:20 AM ERROR [neuron-cc]: ***************************************************************
09/25/2021 12:02:20 AM ERROR [neuron-cc]: An Internal Compiler Error has occurred
09/25/2021 12:02:20 AM ERROR [neuron-cc]: ***************************************************************
09/25/2021 12:02:20 AM ERROR [neuron-cc]:
09/25/2021 12:02:20 AM ERROR [neuron-cc]: Error message: A process in the process pool was terminated abruptly while the future was running or pending.
09/25/2021 12:02:20 AM ERROR [neuron-cc]:
09/25/2021 12:02:20 AM ERROR [neuron-cc]: Error class: BrokenProcessPool
09/25/2021 12:02:20 AM ERROR [neuron-cc]: Error location: pipeline.compile.0
09/25/2021 12:02:20 AM ERROR [neuron-cc]: Command line: /home/ubuntu/miniconda3/envs/inf_debug/bin/neuron-cc compile /tmp/tmpwrbt_d7n/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpwrbt_d7n/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]}' --verbose 35
09/25/2021 12:02:20 AM ERROR [neuron-cc]:
09/25/2021 12:02:20 AM ERROR [neuron-cc]: Internal details:
09/25/2021 12:02:20 AM ERROR [neuron-cc]: File "neuroncc/driver/Job.py", line 244, in neuroncc.driver.Job.runSingleInputFn
09/25/2021 12:02:20 AM ERROR [neuron-cc]: File "neuroncc/driver/Pipeline.py", line 30, in neuroncc.driver.Pipeline.Pipeline.runSingleInput
09/25/2021 12:02:20 AM ERROR [neuron-cc]: File "neuroncc/driver/Job.py", line 286, in neuroncc.driver.Job.SingleInputJob.run
09/25/2021 12:02:20 AM ERROR [neuron-cc]: File "neuroncc/driver/Job.py", line 291, in neuroncc.driver.Job.SingleInputJob.run
09/25/2021 12:02:20 AM ERROR [neuron-cc]: File "/home/ubuntu/miniconda3/envs/inf_debug/lib/python3.7/concurrent/futures/_base.py", line 435, in result
09/25/2021 12:02:20 AM ERROR [neuron-cc]: return self.__get_result()
09/25/2021 12:02:20 AM ERROR [neuron-cc]: File "/home/ubuntu/miniconda3/envs/inf_debug/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
09/25/2021 12:02:20 AM ERROR [neuron-cc]: raise self._exception
09/25/2021 12:02:20 AM ERROR [neuron-cc]:
09/25/2021 12:02:20 AM ERROR [neuron-cc]: Version information:
09/25/2021 12:02:20 AM ERROR [neuron-cc]: Neuron Compiler version 1.6.13.0+9f61b2f75
09/25/2021 12:02:20 AM ERROR [neuron-cc]:
09/25/2021 12:02:20 AM ERROR [neuron-cc]: HWM version 1.6.0.0-0
09/25/2021 12:02:20 AM ERROR [neuron-cc]: NEFF version Dynamic
09/25/2021 12:02:20 AM ERROR [neuron-cc]: TVM version 1.6.2.0+0
09/25/2021 12:02:20 AM ERROR [neuron-cc]: NumPy version 1.18.5
09/25/2021 12:02:20 AM ERROR [neuron-cc]: MXNet not available
09/25/2021 12:02:20 AM ERROR [neuron-cc]: TF not available
09/25/2021 12:02:20 AM ERROR [neuron-cc]: ONNX not available
09/25/2021 12:02:20 AM ERROR [neuron-cc]:
09/25/2021 12:02:20 AM ERROR [neuron-cc]: Artifacts stored in: /home/ubuntu/pythonProject/siamese_inf/notebook
Compiler status ERROR
FYI, I downgraded:
torch-neuron to 1.7.1
pytorch to 1.7.1
torchvision to 0.8.2
Now it works.
Hello @sunbc0120,
It looks like the immediate problem has been resolved. If you are able to share your model then I'd like to investigate and improve our error message for this situation.
Regards, Taylor
+1, if you can share the model that would be helpful. Strangely, this compiled for me with the same configuration (though not using a conda environment) using torch==1.8.1 and torchvision==0.9.1. If we can discover the discrepancy, that may help other torch-neuron users.
Thanks very much,
from torchvision import models
## Load a pretrained ResNet50 model
model = models.resnet50(pretrained=True)
## Tell the model we are using it for evaluation (not training)
model.eval()
(Tried to attach the model here, but GitHub is having an issue with *.zip: https://github.com/github/hub/issues/1479)
I followed the suggestion by @mrnikwaws and tested a pip virtual environment; it works without any error!
Therefore I guess the issue is around conda when upgrading torch-neuron from 1.7.1 to 1.8.1. I totally understand that neuron won't support conda anymore, but the reality is that some other packages are better managed in conda channels and some existing use cases are already locked in to conda. Maybe one solution is to decouple the development environment (using e.g. conda) from the neuron deployment environment (pure pip)?
https://aws.amazon.com/blogs/developer/neuron-conda-packages-eol/
Here is a sample script if you'd like to reproduce the error:
# new python environment
conda update --force conda
conda create -n debug python=3.7 -y
conda activate debug
conda install -c conda-forge gh -y
# fix: downgrade pytorch
# conda install pytorch==1.7.1 torchvision==0.8.2 -c pytorch
# pytorch neuron sdk
# fix:
# pip install "torch-neuron==1.7.*"
pip install torch-neuron
pip install neuron-cc[tensorflow]
pip install torchvision==0.9.1
# torchserve
# pip install torchserve==0.3.0 torch-model-archiver==0.3.0
# verify
which python
python -c "import torch.neuron"
cat << EOF > test.py && python test.py
import torch
import numpy as np
import os
import torch_neuron
from torchvision import models
import logging
## Enable logging so we can see any important warnings
logger = logging.getLogger('Neuron')
logger.setLevel(logging.INFO)
image = torch.zeros([1, 3, 224, 224], dtype=torch.float32)
## Load a pretrained ResNet50 model
model = models.resnet50(pretrained=True)
## Tell the model we are using it for evaluation (not training)
model.eval()
## Analyze the model - this will show operator support and operator count
torch.neuron.analyze_model( model, example_inputs=[image] )
## Now compile the model - with logging set to "info" we will see
## what compiles for Neuron, and if there are any fallbacks
## Note: The "-O2" setting is default in recent releases, but may be needed for DLAMI
## and older installed environments- model_neuron = torch.neuron.trace(model, example_inputs=[image], compiler_args="-O2")
model_neuron = torch.neuron.trace(model, example_inputs=[image])
# The output of this step will have the percentage of operations compiled, example:
#
# INFO:Neuron:The neuron partitioner created 1 sub-graphs
# INFO:Neuron:Neuron successfully compiled 1 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 100.0%
## Export to saved model
model_neuron.save("resnet50_neuron.pt")
print("Compile Args, input tensor: {}, data type:'fp32', 'core': 1 ")
print("Compile success")
EOF
Thanks @sunbc0120 for sharing the script. We're working on re-creating the issue, and will update soon.
Hello @sunbc0120,
We have been unable to reproduce the conda issue using your script.
A combination of conda and pip is known to cause issues for python package interactions. These appear to be dependent on the sequence of installations and the base python environment. Where these occur we strongly recommend creating a fresh python venv, and following the installation instructions (which you have successfully done).
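One thing that can be checked quickly is whether every package is imported from the environment you think is active. In the traceback earlier in this thread, torch_neuron is loaded from /home/ubuntu/miniconda3/lib/python3.8/site-packages while neuron-cc runs from the inf_debug environment with python3.7; that is an observation from the logs, not a confirmed root cause. The sketch below reports modules that would be imported from outside the current interpreter's prefix. The helper names `module_origin` and `outside_prefix` are made up for illustration.

```python
import importlib.util
import os
import sys


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    return spec.origin if spec is not None else None


def outside_prefix(path, prefix=sys.prefix):
    """Return True if `path` does not live under the environment `prefix`."""
    path = os.path.realpath(path)
    prefix = os.path.realpath(prefix)
    try:
        return os.path.commonpath([path, prefix]) != prefix
    except ValueError:
        # Raised for paths on different drives (Windows): clearly not the same tree.
        return True


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("prefix:     ", sys.prefix)
    for name in ("torch", "torch_neuron", "torchvision", "numpy"):
        origin = module_origin(name)
        if origin is None:
            print(f"{name}: not found")
        elif outside_prefix(origin):
            print(f"{name}: loaded from OUTSIDE this environment: {origin}")
        else:
            print(f"{name}: {origin}")
```

Running it with both `python` and the interpreter that actually executes your script (for example a notebook kernel) shows whether the two agree.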
Since we can't find further action to take, we are closing this ticket. Please re-open if you think we can help you further.
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-intro/install-pytorch.html
Environment
Python 3.7.10
:https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia-pytorch-neuron.html
Got error:
https://github.com/tensorflow/tensorflow/issues/48797#issuecomment-892706478
Got the following error:
/home/ubuntu/miniconda3/envs/inf/bin/neuron-cc compile /tmp/tmp0ypcs0gm/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp0ypcs0gm/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 3, 224, 224], "float32"]}, "outputs": ["Add_122:0"]}' --verbose 35 results in the following error: