aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
https://aws.amazon.com/machine-learning/neuron/

Cannot convert BERT with Neuron SDK following Hugging Face tutorial #462

Closed sidyakinian closed 2 years ago

sidyakinian commented 2 years ago

I'm following this Hugging Face tutorial exactly (also posted here), and I'm running into a runtime error: `No operations were successfully partitioned and compiled to neuron for this model - aborting trace!`

Notebook code

[1]:  !pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com

out:  Writing to /home/ec2-user/.config/pip/pip.conf

[2]:  !pip install torch-neuron==1.9.1.* neuron-cc[tensorflow] sagemaker>=2.79.0 transformers==4.12.3 --upgrade

out:  WARNING: neuron-cc 1.0 does not provide the extra 'tensorflow'
      WARNING: You are using pip version 22.0.4; however, version 22.2 is available.
      You should consider upgrading via the '/home/ec2-user/anaconda3/envs/pytorch_p38/bin/python -m pip install --upgrade pip' command.

[3]:  # Install tensorflow because it wasn't installed with neuron-cc
      !pip install tensorflow --upgrade

out:  Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
      Requirement already satisfied: tensorflow in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (2.9.1)
      Requirement already satisfied: termcolor>=1.1.0 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (1.1.0)
      Requirement already satisfied: google-pasta>=0.1.1 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (0.2.0)
      Requirement already satisfied: tensorflow-io-gcs-filesystem>=0.23.1 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (0.26.0)
      Requirement already satisfied: libclang>=13.0.0 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (14.0.6)
      Requirement already satisfied: setuptools in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (59.2.0)
      Requirement already satisfied: six>=1.12.0 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (1.16.0)
      Requirement already satisfied: tensorboard<2.10,>=2.9 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (2.9.1)
      Requirement already satisfied: protobuf<3.20,>=3.9.2 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (3.19.4)
      Requirement already satisfied: grpcio<2.0,>=1.24.3 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (1.47.0)
      Requirement already satisfied: opt-einsum>=2.3.2 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (3.3.0)
      Requirement already satisfied: wrapt>=1.11.0 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (1.13.3)
      Requirement already satisfied: h5py>=2.9.0 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (3.4.0)
      Requirement already satisfied: absl-py>=1.0.0 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (1.2.0)
      Requirement already satisfied: numpy>=1.20 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (1.20.0)
      Requirement already satisfied: typing-extensions>=3.6.6 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (4.0.0)
      Requirement already satisfied: keras-preprocessing>=1.1.1 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (1.1.2)
      Requirement already satisfied: packaging in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (21.3)
      Requirement already satisfied: keras<2.10.0,>=2.9.0rc0 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (2.9.0)
      Requirement already satisfied: gast<=0.4.0,>=0.2.1 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (0.4.0)
      Requirement already satisfied: tensorflow-estimator<2.10.0,>=2.9.0rc0 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (2.9.0)
      Requirement already satisfied: flatbuffers<2,>=1.12 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (1.12)
      Requirement already satisfied: astunparse>=1.6.0 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorflow) (1.6.3)
      Requirement already satisfied: wheel<1.0,>=0.23.0 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from astunparse>=1.6.0->tensorflow) (0.37.0)
      Requirement already satisfied: requests<3,>=2.21.0 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorboard<2.10,>=2.9->tensorflow) (2.26.0)
      Requirement already satisfied: google-auth<3,>=1.6.3 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorboard<2.10,>=2.9->tensorflow) (2.9.1)
      Requirement already satisfied: google-auth-oauthlib<0.5,>=0.4.1 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorboard<2.10,>=2.9->tensorflow) (0.4.6)
      Requirement already satisfied: markdown>=2.6.8 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorboard<2.10,>=2.9->tensorflow) (3.4.1)
      Requirement already satisfied: tensorboard-data-server<0.7.0,>=0.6.0 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorboard<2.10,>=2.9->tensorflow) (0.6.1)
      Requirement already satisfied: tensorboard-plugin-wit>=1.6.0 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorboard<2.10,>=2.9->tensorflow) (1.8.1)
      Requirement already satisfied: werkzeug>=1.0.1 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from tensorboard<2.10,>=2.9->tensorflow) (2.0.3)
      Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from packaging->tensorflow) (3.0.6)
      Requirement already satisfied: cachetools<6.0,>=2.0.0 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.10,>=2.9->tensorflow) (5.2.0)
      Requirement already satisfied: rsa<5,>=3.1.4 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.10,>=2.9->tensorflow) (4.7.2)
      Requirement already satisfied: pyasn1-modules>=0.2.1 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from google-auth<3,>=1.6.3->tensorboard<2.10,>=2.9->tensorflow) (0.2.8)
      Requirement already satisfied: requests-oauthlib>=0.7.0 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.10,>=2.9->tensorflow) (1.3.1)
      Requirement already satisfied: importlib-metadata>=4.4 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from markdown>=2.6.8->tensorboard<2.10,>=2.9->tensorflow) (4.8.2)
      Requirement already satisfied: charset-normalizer~=2.0.0 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard<2.10,>=2.9->tensorflow) (2.0.7)
      Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard<2.10,>=2.9->tensorflow) (1.26.8)
      Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard<2.10,>=2.9->tensorflow) (2021.10.8)
      Requirement already satisfied: idna<4,>=2.5 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from requests<3,>=2.21.0->tensorboard<2.10,>=2.9->tensorflow) (3.1)
      Requirement already satisfied: zipp>=0.5 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from importlib-metadata>=4.4->markdown>=2.6.8->tensorboard<2.10,>=2.9->tensorflow) (3.6.0)
      Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard<2.10,>=2.9->tensorflow) (0.4.8)
      Requirement already satisfied: oauthlib>=3.0.0 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<0.5,>=0.4.1->tensorboard<2.10,>=2.9->tensorflow) (3.2.0)
      WARNING: You are using pip version 22.0.4; however, version 22.2 is available.
      You should consider upgrading via the '/home/ec2-user/anaconda3/envs/pytorch_p38/bin/python -m pip install --upgrade pip' command.

[4]:  model_id = 'distilbert-base-uncased-finetuned-sst-2-english'

[5]:  import os
      import tensorflow  # to workaround a protobuf version conflict issue
      import torch
      import torch.neuron
      from transformers import AutoTokenizer, AutoModelForSequenceClassification

      # load tokenizer and model
      tokenizer = AutoTokenizer.from_pretrained(model_id)
      model = AutoModelForSequenceClassification.from_pretrained(model_id, torchscript=True)

      # create dummy input for max length 128
      dummy_input = "dummy input which will be padded later"
      max_length = 128
      embeddings = tokenizer(dummy_input, max_length=max_length, padding="max_length",return_tensors="pt")
      neuron_inputs = tuple(embeddings.values())

      # compile model with torch.neuron.trace and update config
      model_neuron = torch.neuron.trace(model, neuron_inputs)
      model.config.update({"traced_sequence_length": max_length})

      # save tokenizer, neuron model and config for later use
      save_dir="tmp"
      os.makedirs("tmp",exist_ok=True)
      model_neuron.save(os.path.join(save_dir,"neuron_model.pt"))
      tokenizer.save_pretrained(save_dir)
      model.config.save_pretrained(save_dir)

out:  INFO:Neuron:There are 2 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::embedding, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-pytorch.md)
      INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 249, fused = 242, percent fused = 97.19%
      INFO:Neuron:Number of neuron graph operations 671 did not match traced graph 645 - using heuristic matching of hierarchical information
      WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$274; falling back to native python function call
      ERROR:Neuron:Please check that neuron-cc is installed and working properly. You can install neuron-cc using 'python3 -m pip install neuron-cc[tensorflow] -U --extra-index-url=https://pip.repos.neuron.amazonaws.com'
      Traceback (most recent call last):
        File "/home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/torch_neuron/convert.py", line 381, in op_converter
          neuron_function = self.subgraph_compiler(
        File "/home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/torch_neuron/decorators.py", line 134, in trace
          raise RuntimeError(
      RuntimeError: Please check that neuron-cc is installed and working properly. You can install neuron-cc using 'python3 -m pip install neuron-cc[tensorflow] -U --extra-index-url=https://pip.repos.neuron.amazonaws.com'
      INFO:Neuron:Number of arithmetic operators (post-compilation) before = 249, compiled = 0, percent compiled = 0.0%
      INFO:Neuron:The neuron partitioner created 1 sub-graphs
      INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
      INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
      INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
      INFO:Neuron: => aten::Int: 37 [supported]
      INFO:Neuron: => aten::add: 13 [supported]
      INFO:Neuron: => aten::contiguous: 6 [supported]
      INFO:Neuron: => aten::div: 6 [supported]
      INFO:Neuron: => aten::dropout: 14 [supported]
      INFO:Neuron: => aten::embedding: 2 [not supported]
      INFO:Neuron: => aten::eq: 6 [supported]
      INFO:Neuron: => aten::expand_as: 6 [supported]
      INFO:Neuron: => aten::gelu: 6 [supported]
      INFO:Neuron: => aten::layer_norm: 13 [supported]
      INFO:Neuron: => aten::linear: 38 [supported]
      INFO:Neuron: => aten::masked_fill: 6 [supported]
      INFO:Neuron: => aten::matmul: 12 [supported]
      INFO:Neuron: => aten::relu: 1 [supported]
      INFO:Neuron: => aten::select: 1 [supported]
      INFO:Neuron: => aten::size: 13 [supported]
      INFO:Neuron: => aten::slice: 3 [supported]
      INFO:Neuron: => aten::softmax: 6 [supported]
      INFO:Neuron: => aten::transpose: 30 [supported]
      INFO:Neuron: => aten::view: 30 [supported]
      ---------------------------------------------------------------------------

      RuntimeError                              Traceback (most recent call last)
      /tmp/ipykernel_22480/34065999.py in <cell line: 18>()
           16 
           17 # compile model with torch.neuron.trace and update config
      ---> 18 model_neuron = torch.neuron.trace(model, neuron_inputs)
           19 model.config.update({"traced_sequence_length": max_length})
           20 

      ~/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/torch_neuron/convert.py in trace(func, example_inputs, fallback, op_whitelist, minimum_segment_size, subgraph_builder_function, subgraph_inputs_pruning, skip_compiler, debug_must_trace, allow_no_ops_on_neuron, compiler_workdir, dynamic_batch_size, compiler_timeout, _neuron_trace, compiler_args, optimizations, verbose, **kwargs)
          182         logger.debug("skip_inference_context - trace with fallback at {}".format(get_file_and_line()))
          183         neuron_graph = cu.compile_fused_operators(neuron_graph, **compile_kwargs)
      --> 184     cu.stats_post_compiler(neuron_graph)
          185 
          186     # Wrap the compiled version of the model in a script module. Note that this is

      ~/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/torch_neuron/convert.py in stats_post_compiler(self, neuron_graph)
          490 
          491         if succesful_compilations == 0 and not self.allow_no_ops_on_neuron:
      --> 492             raise RuntimeError(
          493                 "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
          494 

      RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!

Packages

I'm using an Amazon SageMaker ml.t3.medium notebook instance.

Output of `pip list`

```
Package Version
---------------------------------- -----------------------
absl-py 1.2.0
aiobotocore 2.0.1
aiohttp 3.8.1
aioitertools 0.8.0
aiosignal 1.2.0
alabaster 0.7.12
anaconda-client 1.8.0
anaconda-project 0.10.2
anyio 3.4.0
appdirs 1.4.4
argh 0.26.2
argon2-cffi 21.1.0
arrow 1.2.1
asn1crypto 1.4.0
astroid 2.8.6
astropy 5.0
astunparse 1.6.3
async-generator 1.10
async-timeout 4.0.1
atomicwrites 1.4.0
attrs 21.2.0
autopep8 1.5.6
autovizwidget 0.19.1
awscli 1.25.38
Babel 2.9.1
backcall 0.2.0
backports.functools-lru-cache 1.6.4
backports.shutil-get-terminal-size 1.0.0
bcrypt 3.2.2
beautifulsoup4 4.10.0
binaryornot 0.4.4
bitarray 2.3.4
bkcharts 0.2
black 21.11b0
bleach 4.1.0
blis 0.7.6
bokeh 2.4.2
boto 2.49.0
boto3 1.24.38
botocore 1.27.44
Bottleneck 1.3.2
brotlipy 0.7.0
cached-property 1.5.2
cachetools 5.2.0
captum 0.4.1
catalogue 2.0.6
certifi 2021.10.8
cffi 1.15.0
chardet 4.0.0
charset-normalizer 2.0.7
click 8.0.3
cloudpickle 2.0.0
clyent 1.2.2
colorama 0.4.3
conda-pack 0.6.0
contextlib2 21.6.0
cookiecutter 1.7.0
coverage 6.3.2
cryptography 36.0.0
cycler 0.11.0
cymem 2.0.6
Cython 0.29.24
cytoolz 0.11.2
dask 2021.11.2
debugpy 1.5.1
decorator 5.1.0
defusedxml 0.7.1
diff-match-patch 20200713
dill 0.3.4
distributed 2021.11.2
distro 1.7.0
dmlc-nnvm 1.11.0.0+0
dmlc-topi 1.11.0.0+0
dmlc-tvm 1.11.0.0+0
docker 5.0.3
docker-compose 1.29.2
dockerpty 0.4.1
docopt 0.6.2
docutils 0.15.2
dparse 0.5.1
entrypoints 0.3
environment-kernels 1.1.1
et-xmlfile 1.0.1
fastai 2.1.10
fastcache 1.1.0
fastcore 1.3.29
fastprogress 1.0.2
filelock 3.4.0
flake8 3.8.4
Flask 2.0.2
Flask-Cors 3.0.10
flatbuffers 1.12
fonttools 4.28.2
frozenlist 1.2.0
fsspec 2021.11.1
future 0.18.2
gast 0.4.0
gevent 21.8.0
glob2 0.7
gmpy2 2.1.0rc1
google-auth 2.9.1
google-auth-oauthlib 0.4.6
google-pasta 0.2.0
greenlet 1.1.2
grpcio 1.47.0
gssapi 1.7.3
h5py 3.4.0
hdijupyterutils 0.19.1
HeapDict 1.0.1
horovod 0.23.0
html5lib 1.1
huggingface-hub 0.8.1
idna 3.1
imagecodecs 2021.11.20
imageio 2.9.0
imagesize 1.3.0
importlib-metadata 4.8.2
importlib-resources 5.4.0
inferentia-hwm 1.11.0.0+0
inflection 0.5.1
iniconfig 1.1.1
intervaltree 3.0.2
ipykernel 6.5.0
ipyparallel 8.0.0
ipython 7.32.0
ipython-genutils 0.2.0
ipywidgets 7.6.5
islpy 2021.1+aws2021.x.16.0.bld0
isort 5.10.1
itsdangerous 2.0.1
jdcal 1.4.1
jedi 0.17.2
jeepney 0.7.1
Jinja2 3.0.3
jinja2-time 0.2.0
jmespath 0.10.0
joblib 1.1.0
json5 0.9.5
jsonschema 3.2.0
jupyter 1.0.0
jupyter-client 7.1.0
jupyter-console 6.4.0
jupyter-core 4.9.1
jupyter-server 1.12.0
jupyterlab 3.2.4
jupyterlab-pygments 0.1.2
jupyterlab-server 2.8.2
jupyterlab-widgets 1.0.2
keras 2.9.0
Keras-Preprocessing 1.1.2
keyring 23.2.1
kiwisolver 1.3.2
krb5 0.3.0
langcodes 3.3.0
lazy-object-proxy 1.6.0
libarchive-c 3.1
libclang 14.0.6
llvmlite 0.36.0
locket 0.2.0
lxml 4.8.0
Markdown 3.4.1
MarkupSafe 2.0.1
matplotlib 3.5.0
matplotlib-inline 0.1.3
mccabe 0.6.1
mistune 0.8.4
mkl-fft 1.3.1
mkl-random 1.2.2
mkl-service 2.4.0
mock 4.0.3
more-itertools 8.12.0
mpi4py 3.0.3
mpmath 1.2.1
msgpack 1.0.3
multidict 5.2.0
multipledispatch 0.6.0
multiprocess 0.70.12.2
munkres 1.1.4
murmurhash 1.0.6
mypy-extensions 0.4.3
nb-conda 2.2.1
nb-conda-kernels 2.3.1
nbclassic 0.3.4
nbclient 0.5.9
nbconvert 6.3.0
nbformat 5.1.3
nest-asyncio 1.5.1
networkx 2.4
neuron-cc 1.0
nltk 3.6.5
nose 1.3.7
notebook 6.4.6
numba 0.53.1
numexpr 2.7.3
numpy 1.20.0
numpydoc 1.1.0
oauthlib 3.2.0
olefile 0.46
onnx 1.10.2
opencv-python 4.5.1.48
openpyxl 3.0.9
opt-einsum 3.3.0
packaging 21.3
pandas 1.3.4
pandocfilters 1.5.0
paramiko 2.11.0
parso 0.7.0
partd 1.2.0
path 16.2.0
pathlib2 2.3.6
pathos 0.2.8
pathspec 0.9.0
pathtools 0.1.2
pathy 0.6.1
patsy 0.5.2
pep8 1.7.1
pexpect 4.8.0
pickleshare 0.7.5
Pillow 9.0.1
pip 22.0.4
pkginfo 1.8.1
platformdirs 2.3.0
plotly 5.6.0
pluggy 1.0.0
ply 3.11
pooch 1.5.2
pox 0.3.0
poyo 0.5.0
ppft 1.6.6.4
preshed 3.0.6
prometheus-client 0.12.0
prompt-toolkit 3.0.22
protobuf 3.19.4
protobuf3-to-dict 0.1.5
psutil 5.8.0
psycopg2 2.9.2
ptyprocess 0.7.0
py 1.11.0
py4j 0.10.9
pyarrow 7.0.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycodestyle 2.6.0
pycosat 0.6.3
pycparser 2.21
pycurl 7.44.1
pydantic 1.8.2
pydocstyle 6.1.1
pyerfa 2.0.0.1
pyflakes 2.2.0
pyfunctional 1.4.3
pygal 2.4.0
Pygments 2.10.0
pyinstrument 3.4.2
pyinstrument-cext 0.2.4
pykerberos 1.2.1
pylint 2.11.1
pyls-black 0.4.6
pyls-spyder 0.3.2
PyNaCl 1.5.0
pynvml 8.0.4
pyodbc 4.0.32
pyOpenSSL 21.0.0
pyparsing 3.0.6
PyQt5 5.12.3
PyQt5_sip 4.19.18
PyQtChart 5.12
PyQtWebEngine 5.12.1
pyrsistent 0.18.0
PySocks 1.7.1
pyspark 3.0.0
pyspnego 0.5.0
pytest 6.2.5
python-dateutil 2.8.2
python-dotenv 0.20.0
python-jsonrpc-server 0.4.0
python-language-server 0.36.2
pytz 2021.3
PyWavelets 1.2.0
pyxdg 0.27
PyYAML 5.4.1
pyzmq 22.3.0
QDarkStyle 3.0.2
qstylizer 0.2.1
QtAwesome 1.1.0
qtconsole 5.2.1
QtPy 1.11.2
regex 2021.11.10
requests 2.26.0
requests-kerberos 0.14.0
requests-oauthlib 1.3.1
rope 0.22.0
rsa 4.7.2
Rtree 0.9.7
ruamel-yaml-conda 0.15.80
s3fs 0.4.0
s3transfer 0.6.0
sacremoses 0.0.53
safety 1.10.3
sagemaker 2.101.1
sagemaker-pyspark 1.4.2
scikit-image 0.18.3
scikit-learn 1.0.1
scipy 1.4.1
seaborn 0.11.2
SecretStorage 3.3.1
Send2Trash 1.8.0
setuptools 59.2.0
shap 0.40.0
simplegeneric 0.8.1
singledispatch 0.0.0
sip 4.19.25
six 1.16.0
sklearn 0.0
slicer 0.0.7
smart-open 5.2.1
smclarify 0.2
smdebug 1.0.12
smdebug-rulesconfig 1.0.1
sniffio 1.2.0
snowballstemmer 2.2.0
sortedcollections 2.1.0
sortedcontainers 2.4.0
soupsieve 2.3
spacy 3.2.3
spacy-legacy 3.0.9
spacy-loggers 1.0.1
sparkmagic 0.15.0
Sphinx 4.3.0
sphinxcontrib-applehelp 1.0.2
sphinxcontrib-devhelp 1.0.2
sphinxcontrib-htmlhelp 2.0.0
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.3
sphinxcontrib-serializinghtml 1.1.5
sphinxcontrib-websupport 1.2.4
spyder 5.0.5
spyder-kernels 2.0.5
SQLAlchemy 1.4.27
srsly 2.4.2
statsmodels 0.13.1
sympy 1.9
tables 3.6.1
tabulate 0.8.9
tblib 1.7.0
tenacity 8.0.1
tensorboard 2.9.1
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.9.1
tensorflow-estimator 2.9.0
tensorflow-io-gcs-filesystem 0.26.0
termcolor 1.1.0
terminado 0.12.1
testpath 0.5.0
textdistance 4.2.2
texttable 1.6.4
thinc 8.0.15
threadpoolctl 3.0.0
three-merge 0.1.1
tifffile 2021.11.2
tinycss2 1.1.1
tokenizers 0.10.3
toml 0.10.2
tomli 1.2.2
toolz 0.11.2
torch 1.9.1
torch-model-archiver 0.5.0b20211117
torch-neuron 1.9.1.2.3.0.0
torch-workflow-archiver 0.2.0b20211118
torchaudio 0.10.0
torchserve 0.5.0b20211117
torchtext 0.11.0
torchvision 0.11.1
tornado 6.1
tqdm 4.62.3
traitlets 5.1.1
transformers 4.12.3
typed-ast 1.5.0
typer 0.4.0
typing_extensions 4.0.0
ujson 4.2.0
unicodecsv 0.14.1
unicodedata2 13.0.0.post2
urllib3 1.26.8
wasabi 0.9.0
watchdog 2.1.6
wcwidth 0.2.5
webencodings 0.5.1
websocket-client 0.59.0
Werkzeug 2.0.3
wheel 0.37.0
whichcraft 0.6.1
widgetsnbextension 3.5.2
wrapt 1.13.3
wurlitzer 3.0.2
xlrd 2.0.1
XlsxWriter 3.0.2
xlwt 1.3.0
yapf 0.31.0
yarl 1.7.2
zict 2.0.0
zipp 3.6.0
zope.event 4.5.0
zope.interface 5.4.0
```

aj2622 commented 2 years ago

Your environment is not set up properly. Follow the instructions here: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-intro/pytorch-setup/pytorch-quickstart.html#pytorch-quickstart

aws-stdun commented 2 years ago

Hi sidyakinian,

I was able to allocate a ml.t3.medium and open JupyterLab 1. I then used JupyterLab to open a terminal and ran:

source activate python3
pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com/
pip install torch-neuron==1.9.1.* neuron-cc[tensorflow]
pip install sagemaker>=2.79.0 transformers==4.12.3 --upgrade

This worked without the issue you encountered. When I tried to run the commands inside the notebook directly, as in the tutorial, the process was `Killed`, possibly because the instance size is too small. I was not able to use the pytorch_p38 environment, which I believe you may have activated by accident, based on this line in your post:

 ~/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/torch_neuron/convert.py

Please note that this environment is not the one mentioned in the tutorial you linked. The tutorial uses the python3 conda env. When trying to run the commands using the pytorch_p38 env, the process is Killed for me either inside a notebook or a terminal.

My suggestion is to retry using the steps I mentioned above, and if you encounter memory issues please try a larger instance size.
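
One way to confirm which environment a notebook kernel is actually using is to print the interpreter path from inside the notebook. The following is a minimal sketch using only the standard library. The helper names `describe_environment` and `pip_install_argv` are illustrative (not part of any Neuron tooling), and `CONDA_DEFAULT_ENV` may be unset in some kernels, so the interpreter path is the more reliable signal.

```python
import os
import sys


def describe_environment():
    """Return the interpreter path, its prefix, and the active conda env name (if any)."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # May be None: not every Jupyter kernel exports this variable.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


def pip_install_argv(*requirements, upgrade=False):
    """Build a pip command that targets the interpreter running this code.

    Passing requirements as separate list items avoids shell parsing of
    characters such as '>' and '*' in version specifiers.
    """
    argv = [sys.executable, "-m", "pip", "install", *requirements]
    if upgrade:
        argv.append("--upgrade")
    return argv


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
    print(pip_install_argv("torch-neuron==1.9.1.*", "neuron-cc[tensorflow]"))
```

If the printed path contains `envs/pytorch_p38` rather than `envs/python3`, the kernel is not the one the tutorial uses. The argument list can be passed to `subprocess.run` so that packages are installed into the same environment the kernel runs in.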

sidyakinian commented 2 years ago

@aws-stdun Thank you, your suggestion works! Seems like a different conda env was the problem. Getting a TF import error now, but that's a different issue.

sidyakinian commented 2 years ago

@aws-stdun Sorry to say the issue persists. I've done the following:

After all this, the same error appears.

Stack trace (almost identical to before)

```python
INFO:Neuron:There are 2 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::embedding, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-pytorch.md)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 249, fused = 242, percent fused = 97.19%
INFO:Neuron:Number of neuron graph operations 671 did not match traced graph 645 - using heuristic matching of hierarchical information
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$551; falling back to native python function call
ERROR:Neuron:Please check that neuron-cc is installed and working properly. You can install neuron-cc using 'python3 -m pip install neuron-cc[tensorflow] -U --extra-index-url=https://pip.repos.neuron.amazonaws.com'
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages/torch_neuron/convert.py", line 381, in op_converter
    neuron_function = self.subgraph_compiler(
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages/torch_neuron/decorators.py", line 134, in trace
    raise RuntimeError(
RuntimeError: Please check that neuron-cc is installed and working properly. You can install neuron-cc using 'python3 -m pip install neuron-cc[tensorflow] -U --extra-index-url=https://pip.repos.neuron.amazonaws.com'
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 249, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 37 [supported]
INFO:Neuron: => aten::add: 13 [supported]
INFO:Neuron: => aten::contiguous: 6 [supported]
INFO:Neuron: => aten::div: 6 [supported]
INFO:Neuron: => aten::dropout: 14 [supported]
INFO:Neuron: => aten::embedding: 2 [not supported]
INFO:Neuron: => aten::eq: 6 [supported]
INFO:Neuron: => aten::expand_as: 6 [supported]
INFO:Neuron: => aten::gelu: 6 [supported]
INFO:Neuron: => aten::layer_norm: 13 [supported]
INFO:Neuron: => aten::linear: 38 [supported]
INFO:Neuron: => aten::masked_fill: 6 [supported]
INFO:Neuron: => aten::matmul: 12 [supported]
INFO:Neuron: => aten::relu: 1 [supported]
INFO:Neuron: => aten::select: 1 [supported]
INFO:Neuron: => aten::size: 13 [supported]
INFO:Neuron: => aten::slice: 3 [supported]
INFO:Neuron: => aten::softmax: 6 [supported]
INFO:Neuron: => aten::transpose: 30 [supported]
INFO:Neuron: => aten::view: 30 [supported]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_19353/2613137105.py in ()
     10
     11 # compile model with torch.neuron.trace and update config
---> 12 model_neuron = torch.neuron.trace(model, neuron_inputs)
     13 model.config.update({"traced_sequence_length": max_length})
     14

~/anaconda3/envs/python3/lib/python3.8/site-packages/torch_neuron/convert.py in trace(func, example_inputs, fallback, op_whitelist, minimum_segment_size, subgraph_builder_function, subgraph_inputs_pruning, skip_compiler, debug_must_trace, allow_no_ops_on_neuron, compiler_workdir, dynamic_batch_size, compiler_timeout, _neuron_trace, compiler_args, optimizations, verbose, **kwargs)
    182         logger.debug("skip_inference_context - trace with fallback at {}".format(get_file_and_line()))
    183         neuron_graph = cu.compile_fused_operators(neuron_graph, **compile_kwargs)
--> 184     cu.stats_post_compiler(neuron_graph)
    185
    186     # Wrap the compiled version of the model in a script module. Note that this is

~/anaconda3/envs/python3/lib/python3.8/site-packages/torch_neuron/convert.py in stats_post_compiler(self, neuron_graph)
    490
    491         if succesful_compilations == 0 and not self.allow_no_ops_on_neuron:
--> 492             raise RuntimeError(
    493                 "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
    494

RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
```

It looks like the `aten::embedding` operation isn't supported. `import torch.neuron; print(*torch.neuron.get_supported_operations(), sep='\n')` indeed doesn't list `aten::embedding`, though it lists `aten::embedding_renorm_`. I've checked the release notes for torch.neuron supported operations: it seems `aten::embedding` was added in v1.0.763.0 but then promptly removed in v1.0.1001.0 because it didn't meet performance criteria.
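
To make the operator report in the trace log easier to read, the `=> aten::...` lines can be tallied by status. This is a rough sketch, assuming the log keeps the `=> name: count [supported|not supported]` shape shown in the stack traces in this thread; `summarize_ops` is a hypothetical helper, not part of torch-neuron.

```python
import re

# Matches report entries such as "INFO:Neuron: => aten::embedding: 2 [not supported]".
OP_LINE = re.compile(r"=>\s*(aten::\w+):\s*(\d+)\s*\[(supported|not supported)\]")


def summarize_ops(log_text):
    """Split the operator report into supported and unsupported operator counts."""
    supported, unsupported = {}, {}
    for name, count, status in OP_LINE.findall(log_text):
        target = supported if status == "supported" else unsupported
        target[name] = int(count)
    return supported, unsupported


# A few lines copied from the log in this thread.
sample = """
INFO:Neuron: => aten::Int: 37 [supported]
INFO:Neuron: => aten::embedding: 2 [not supported]
INFO:Neuron: => aten::linear: 38 [supported]
"""

supported, unsupported = summarize_ops(sample)
print("supported:", supported)
print("not supported:", unsupported)
```

In the logs in this thread, every operator except `aten::embedding` is marked `[supported]`, yet the post-compilation line reports `compiled = 0`.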

Also, could this be related to #410?

sidyakinian commented 2 years ago

I'm getting the same issue with the Neuron SDK PyTorch tutorial on the conda_python3 kernel and an ml.c5.4xlarge SageMaker notebook instance.

Stack trace

```python
/home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages/transformers/modeling_utils.py:1967: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert all(
INFO:Neuron:There are 3 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::embedding, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-pytorch.md)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 565, fused = 548, percent fused = 96.99%
INFO:Neuron:Number of neuron graph operations 1601 did not match traced graph 1323 - using heuristic matching of hierarchical information
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$662; falling back to native python function call
ERROR:Neuron:Please check that neuron-cc is installed and working properly. You can install neuron-cc using 'python3 -m pip install neuron-cc[tensorflow] -U --extra-index-url=https://pip.repos.neuron.amazonaws.com'
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages/torch_neuron/convert.py", line 381, in op_converter
    neuron_function = self.subgraph_compiler(
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.8/site-packages/torch_neuron/decorators.py", line 134, in trace
    raise RuntimeError(
RuntimeError: Please check that neuron-cc is installed and working properly. You can install neuron-cc using 'python3 -m pip install neuron-cc[tensorflow] -U --extra-index-url=https://pip.repos.neuron.amazonaws.com'
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 565, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 97 [supported]
INFO:Neuron: => aten::add: 39 [supported]
INFO:Neuron: => aten::contiguous: 12 [supported]
INFO:Neuron: => aten::div: 12 [supported]
INFO:Neuron: => aten::dropout: 38 [supported]
INFO:Neuron: => aten::embedding: 3 [not supported]
INFO:Neuron: => aten::gelu: 12 [supported]
INFO:Neuron: => aten::layer_norm: 25 [supported]
INFO:Neuron: => aten::linear: 74 [supported]
INFO:Neuron: => aten::matmul: 24 [supported]
INFO:Neuron: => aten::mul: 1 [supported]
INFO:Neuron: => aten::permute: 48 [supported]
INFO:Neuron: => aten::rsub: 1 [supported]
INFO:Neuron: => aten::select: 1 [supported]
INFO:Neuron: => aten::size: 97 [supported]
INFO:Neuron: => aten::slice: 5 [supported]
INFO:Neuron: => aten::softmax: 12 [supported]
INFO:Neuron: => aten::tanh: 1 [supported]
INFO:Neuron: => aten::to: 1 [supported]
INFO:Neuron: => aten::transpose: 12 [supported]
INFO:Neuron: => aten::unsqueeze: 2 [supported]
INFO:Neuron: => aten::view: 48 [supported]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
 in

~/anaconda3/envs/python3/lib/python3.8/site-packages/torch_neuron/convert.py in trace(func, example_inputs, fallback, op_whitelist, minimum_segment_size, subgraph_builder_function, subgraph_inputs_pruning, skip_compiler, debug_must_trace, allow_no_ops_on_neuron, compiler_workdir, dynamic_batch_size, compiler_timeout, _neuron_trace, compiler_args, optimizations, verbose, **kwargs)
    182         logger.debug("skip_inference_context - trace with fallback at {}".format(get_file_and_line()))
    183         neuron_graph = cu.compile_fused_operators(neuron_graph, **compile_kwargs)
--> 184     cu.stats_post_compiler(neuron_graph)
    185
    186     # Wrap the compiled version of the model in a script module. Note that this is

~/anaconda3/envs/python3/lib/python3.8/site-packages/torch_neuron/convert.py in stats_post_compiler(self, neuron_graph)
    490
    491         if succesful_compilations == 0 and not self.allow_no_ops_on_neuron:
--> 492             raise RuntimeError(
    493                 "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
    494

RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
```

I've tried reinstalling neuron-cc, as advised by the error message "Please check that neuron-cc is installed and working properly."

sidyakinian commented 2 years ago

Downgrading to Python 3.7.13 and compiling on Google Colab doesn't work either.

aws-stdun commented 2 years ago

Hi @sidyakinian,

Based on your error messages above, it doesn't look like you've installed the compiler (neuron-cc).

The environment setup instructions from the tutorial you shared include this line, which is required: `!pip install torch-neuron==1.9.1.* neuron-cc[tensorflow] sagemaker>=2.79.0 transformers==4.12.3 --upgrade`

If you believe your environment is set up correctly and you are still receiving this error, can you share the output of `!pip list | grep neuron`?

With respect to your earlier comment about aten::embedding not being supported: unsupported operators are placed on the CPU. In this model, torch-neuron will partition the embedding onto the CPU, which is expected here and will not significantly impact the performance of Inferentia on this model.
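To see at a glance which operators the compiler log reports as unsupported (and would therefore be left on the CPU), the operator summary lines can be filtered with a short script. This is only a sketch: `unsupported_ops` and `OP_LINE` are hypothetical helpers, not part of torch-neuron, and the regular expression assumes the `INFO:Neuron: => aten::name: count [status]` line format shown in the stack trace in this thread.

```python
import re

# Matches operator summary entries such as
# "INFO:Neuron: => aten::embedding: 3 [not supported]".
# Assumption: the format is the one printed in this thread's logs.
OP_LINE = re.compile(r"=>\s+(aten::\w+):\s+(\d+)\s+\[(supported|not supported)\]")


def unsupported_ops(log_text):
    """Return {operator: count} for operators the log marks as not supported."""
    return {
        name: int(count)
        for name, count, status in OP_LINE.findall(log_text)
        if status == "not supported"
    }


sample = """INFO:Neuron: => aten::dropout: 38 [supported]
INFO:Neuron: => aten::embedding: 3 [not supported]
INFO:Neuron: => aten::gelu: 12 [supported]"""

print(unsupported_ops(sample))
```

If the only entry returned is `aten::embedding`, the unsupported operator is not the cause of the failed trace, which matches the explanation above.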

sidyakinian commented 2 years ago

@aws-stdun Output of `!pip list | grep neuron`:

neuron-cc                          1.0
torch-neuron                       1.9.1.2.3.0.0

aws-taylor commented 2 years ago

Hello @sidyakinian ,

It appears that you may have accidentally downloaded neuron-cc from PyPI (https://pypi.org/project/neuron-cc/) rather than from our repository (https://pip.repos.neuron.amazonaws.com). Can you reinstall with `python -m pip install --extra-index-url https://pip.repos.neuron.amazonaws.com --force-reinstall neuron-cc` and try again?
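A quick way to check which neuron-cc build is installed, without reading through `pip list`, is to query the package metadata from Python. This is a minimal sketch: `installed_version` and `looks_like_pypi_package` are hypothetical helpers, and treating version `1.0` as the PyPI package is an assumption based solely on the versions reported in this thread (1.0 before the reinstall, 1.11.7.0+aec18907e after).

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def looks_like_pypi_package(version):
    # Assumption from this thread only: the PyPI neuron-cc reported 1.0,
    # while the build from the Neuron repository reported 1.11.7.0+aec18907e.
    return version is not None and version.split("+")[0] == "1.0"


version = installed_version("neuron-cc")
if version is None:
    print("neuron-cc is not installed")
elif looks_like_pypi_package(version):
    print(f"neuron-cc {version} looks like the PyPI package; reinstall from the Neuron repository")
else:
    print(f"neuron-cc {version}")
```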

sidyakinian commented 2 years ago

@aws-taylor Thank you, I do seem to have had the wrong version! New versions of neuron-cc and torch-neuron:

neuron-cc                          1.11.7.0+aec18907e
torch-neuron                       1.9.1.2.3.0.0

Unfortunately, that still doesn't solve the issue.

Stack trace

```
Downloading: 0%| | 0.00/48.0 [00:00 aten::Int: 37 [supported]
INFO:Neuron: => aten::add: 13 [supported]
INFO:Neuron: => aten::contiguous: 6 [supported]
INFO:Neuron: => aten::div: 6 [supported]
INFO:Neuron: => aten::dropout: 14 [supported]
INFO:Neuron: => aten::embedding: 2 [not supported]
INFO:Neuron: => aten::eq: 6 [supported]
INFO:Neuron: => aten::expand_as: 6 [supported]
INFO:Neuron: => aten::gelu: 6 [supported]
INFO:Neuron: => aten::layer_norm: 13 [supported]
INFO:Neuron: => aten::linear: 38 [supported]
INFO:Neuron: => aten::masked_fill: 6 [supported]
INFO:Neuron: => aten::matmul: 12 [supported]
INFO:Neuron: => aten::relu: 1 [supported]
INFO:Neuron: => aten::select: 1 [supported]
INFO:Neuron: => aten::size: 13 [supported]
INFO:Neuron: => aten::slice: 3 [supported]
INFO:Neuron: => aten::softmax: 6 [supported]
INFO:Neuron: => aten::transpose: 30 [supported]
INFO:Neuron: => aten::view: 30 [supported]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_8009/2613137105.py in ()
     10
     11 # compile model with torch.neuron.trace and update config
---> 12 model_neuron = torch.neuron.trace(model, neuron_inputs)
     13 model.config.update({"traced_sequence_length": max_length})
     14

~/anaconda3/envs/python3/lib/python3.8/site-packages/torch_neuron/convert.py in trace(func, example_inputs, fallback, op_whitelist, minimum_segment_size, subgraph_builder_function, subgraph_inputs_pruning, skip_compiler, debug_must_trace, allow_no_ops_on_neuron, compiler_workdir, dynamic_batch_size, compiler_timeout, _neuron_trace, compiler_args, optimizations, verbose, **kwargs)
    182     logger.debug("skip_inference_context - trace with fallback at {}".format(get_file_and_line()))
    183     neuron_graph = cu.compile_fused_operators(neuron_graph, **compile_kwargs)
--> 184     cu.stats_post_compiler(neuron_graph)
    185
    186     # Wrap the compiled version of the model in a script module. Note that this is

~/anaconda3/envs/python3/lib/python3.8/site-packages/torch_neuron/convert.py in stats_post_compiler(self, neuron_graph)
    490
    491     if succesful_compilations == 0 and not self.allow_no_ops_on_neuron:
--> 492         raise RuntimeError(
    493             "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
    494

RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!
```

I've also tried updating to torch-neuron 1.11.0.2.3.0.0 but still get the same error and stack trace.

aws-stdun commented 2 years ago

Hi @sidyakinian,

It looks like you may have installed TensorFlow 2 into your environment, which is not compatible with torch-neuron.

Can you try `pip install tensorflow==1.15` and rerun compilation?
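Before rerunning the trace, it may help to confirm which TensorFlow major version is actually installed in the active environment. The snippet below is a sketch under stated assumptions: `major_version` and `tensorflow_major_version` are hypothetical helpers, and the lookup assumes the distribution is installed under the name `tensorflow` (variants with other distribution names would not be found).

```python
from importlib import metadata


def major_version(version):
    """Return the leading integer of a version string such as '2.9.1'."""
    return int(version.split(".")[0])


def tensorflow_major_version():
    """Return the installed TensorFlow major version, or None if not found."""
    try:
        return major_version(metadata.version("tensorflow"))
    except metadata.PackageNotFoundError:
        return None


major = tensorflow_major_version()
if major is None:
    print("tensorflow is not installed in this environment")
elif major >= 2:
    print(f"TensorFlow {major}.x found; this thread suggests installing tensorflow==1.15 instead")
else:
    print(f"TensorFlow {major}.x found")
```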

jeffhataws commented 2 years ago

Hi @sidyakinian, please reopen this ticket if you need further assistance. Thanks!