aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
https://aws.amazon.com/machine-learning/neuron/

Not able to convert Hugging Face fine-tuned BERT model into AWS Neuron #439

Closed vinayak-shanawad closed 2 years ago

vinayak-shanawad commented 2 years ago

Hi Team,

I have a fine-tuned BERT model that was trained using the following libraries: torch == 1.8.1+cu111, transformers == 4.19.4

I am not able to convert this fine-tuned BERT model to AWS Neuron and am getting the following compilation errors. Could you please help me resolve this issue?

Note: I am trying to compile the BERT model on a SageMaker notebook instance with the "conda_python3" conda environment.

Installation:

Set Pip repository to point to the Neuron repository

!pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com

Install Neuron PyTorch (note: I tried both options below):

#!pip install torch-neuron==1.8.1.* neuron-cc[tensorflow] "protobuf<4" torchvision sagemaker>=2.79.0 transformers==4.17.0 --upgrade
!pip install --upgrade torch-neuron neuron-cc[tensorflow] "protobuf<4" torchvision


Model compilation:

import os
import tensorflow  # to workaround a protobuf version conflict issue
import torch
import torch.neuron
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_path = 'model/' # Model artifacts are stored in 'model/' directory

# load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForSequenceClassification.from_pretrained(model_path, torchscript=True)

# create dummy input for max length 128
dummy_input = "dummy input which will be padded later"
max_length = 128
embeddings = tokenizer(dummy_input, max_length=max_length, padding="max_length", truncation=True, return_tensors="pt")
neuron_inputs = tuple(embeddings.values())

# compile model with torch.neuron.trace and update config
model_neuron = torch.neuron.trace(model, neuron_inputs)
model.config.update({"traced_sequence_length": max_length})

# save tokenizer, neuron model and config for later use
save_dir="tmpd"
os.makedirs("tmpd",exist_ok=True)
model_neuron.save(os.path.join(save_dir,"neuron_model.pt"))
tokenizer.save_pretrained(save_dir)
model.config.save_pretrained(save_dir)
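
One detail in the snippet above that may be worth checking: a BERT tokenizer typically returns its fields in the order input_ids, token_type_ids, attention_mask, while BertForSequenceClassification.forward takes input_ids, attention_mask, token_type_ids positionally. So tuple(embeddings.values()) may hand the tensors to the model in a different order than intended. Whether this matters for the model in this issue is not confirmed anywhere in the thread. The sketch below only illustrates the ordering with plain dicts, so it runs without the tokenizer; `ordered_inputs` and `BERT_FORWARD_ORDER` are hypothetical names.

```python
# Positional order of the first three arguments of BertForSequenceClassification.forward.
BERT_FORWARD_ORDER = ("input_ids", "attention_mask", "token_type_ids")


def ordered_inputs(encoded, order=BERT_FORWARD_ORDER):
    """Return the encoded fields as a tuple in an explicit, named order."""
    missing = [name for name in order if name not in encoded]
    if missing:
        raise KeyError(f"tokenizer output is missing fields: {missing}")
    return tuple(encoded[name] for name in order)


# Stand-in for the tokenizer output (a BERT tokenizer returns the keys in this order).
encoded = {
    "input_ids": [101, 2023, 102, 0],
    "token_type_ids": [0, 0, 0, 0],
    "attention_mask": [1, 1, 1, 0],
}

by_dict_order = tuple(encoded.values())  # what tuple(embeddings.values()) yields
by_name = ordered_inputs(encoded)        # explicit order matching forward()

print(by_dict_order[1] == encoded["attention_mask"])  # False
print(by_name[1] == encoded["attention_mask"])        # True
```

With the real tokenizer, `neuron_inputs = ordered_inputs(embeddings)` would take the place of `tuple(embeddings.values())`, and the same order would then have to be used when calling the compiled model.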

Model artifacts: These model artifacts come from a multi-label topic classification model.



Error logs:

INFO:Neuron:There are 3 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::embedding, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-pytorch.md)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 565, fused = 548, percent fused = 96.99%
INFO:Neuron:Number of neuron graph operations 1601 did not match traced graph 1323 - using heuristic matching of hierarchical information
WARNING:tensorflow:From /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch_neuron/ops/aten.py:2022: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
INFO:Neuron:Compiling function _NeuronGraph$698 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ec2-user/anaconda3/envs/python3/bin/neuron-cc compile /tmp/tmpv4gg13ze/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpv4gg13ze/graph_def.neff --io-config {"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]} --verbose 35'
INFO:Neuron:Compile command returned: -9
WARNING:Neuron:torch.neuron.trace failed on _NeuronGraph$698; falling back to native python function call
ERROR:Neuron:neuron-cc failed with the following command line call:
/home/ec2-user/anaconda3/envs/python3/bin/neuron-cc compile /tmp/tmpv4gg13ze/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpv4gg13ze/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]}' --verbose 35
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch_neuron/convert.py", line 382, in op_converter
    item, inputs, compiler_workdir=sg_workdir, **kwargs)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch_neuron/decorators.py", line 220, in trace
    'neuron-cc failed with the following command line call:\n{}'.format(command))
subprocess.SubprocessError: neuron-cc failed with the following command line call:
/home/ec2-user/anaconda3/envs/python3/bin/neuron-cc compile /tmp/tmpv4gg13ze/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpv4gg13ze/graph_def.neff --io-config '{"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]}' --verbose 35
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 565, compiled = 0, percent compiled = 0.0%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 0 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 0.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 97 [supported]
INFO:Neuron: => aten::add: 39 [supported]
INFO:Neuron: => aten::contiguous: 12 [supported]
INFO:Neuron: => aten::div: 12 [supported]
INFO:Neuron: => aten::dropout: 38 [supported]
INFO:Neuron: => aten::embedding: 3 [not supported]
INFO:Neuron: => aten::gelu: 12 [supported]
INFO:Neuron: => aten::layer_norm: 25 [supported]
INFO:Neuron: => aten::linear: 74 [supported]
INFO:Neuron: => aten::matmul: 24 [supported]
INFO:Neuron: => aten::mul: 1 [supported]
INFO:Neuron: => aten::permute: 48 [supported]
INFO:Neuron: => aten::rsub: 1 [supported]
INFO:Neuron: => aten::select: 1 [supported]
INFO:Neuron: => aten::size: 97 [supported]
INFO:Neuron: => aten::slice: 5 [supported]
INFO:Neuron: => aten::softmax: 12 [supported]
INFO:Neuron: => aten::tanh: 1 [supported]
INFO:Neuron: => aten::to: 1 [supported]
INFO:Neuron: => aten::transpose: 12 [supported]
INFO:Neuron: => aten::unsqueeze: 2 [supported]
INFO:Neuron: => aten::view: 48 [supported]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-97bba321d013> in <module>
     18 
     19 # compile model with torch.neuron.trace and update config
---> 20 model_neuron = torch.neuron.trace(model, neuron_inputs)
     21 model.config.update({"traced_sequence_length": max_length})
     22 

~/anaconda3/envs/python3/lib/python3.6/site-packages/torch_neuron/convert.py in trace(func, example_inputs, fallback, op_whitelist, minimum_segment_size, subgraph_builder_function, subgraph_inputs_pruning, skip_compiler, debug_must_trace, allow_no_ops_on_neuron, compiler_workdir, dynamic_batch_size, compiler_timeout, _neuron_trace, compiler_args, optimizations, verbose, **kwargs)
    182         logger.debug("skip_inference_context - trace with fallback at {}".format(get_file_and_line()))
    183         neuron_graph = cu.compile_fused_operators(neuron_graph, **compile_kwargs)
--> 184     cu.stats_post_compiler(neuron_graph)
    185 
    186     # Wrap the compiled version of the model in a script module. Note that this is

~/anaconda3/envs/python3/lib/python3.6/site-packages/torch_neuron/convert.py in stats_post_compiler(self, neuron_graph)
    491         if succesful_compilations == 0 and not self.allow_no_ops_on_neuron:
    492             raise RuntimeError(
--> 493                 "No operations were successfully partitioned and compiled to neuron for this model - aborting trace!")
    494 
    495         if percent_operations_compiled < 50.0:

RuntimeError: No operations were successfully partitioned and compiled to neuron for this model - aborting trace!

Thanks a lot.

hannanjgaws commented 2 years ago

Hi @Vinayaks117, the error message in your log shows "Compile command returned: -9". This typically indicates that the compiler process was killed, usually by the Linux OOM (out-of-memory) killer after the process exhausted the available memory. The most recent version of torch-neuron should provide an updated message for -9 errors that reflects the typical cause of this failure mode.
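
To illustrate where a return code of -9 comes from: on POSIX systems, Python's subprocess module reports a child terminated by signal N as return code -N, and SIGKILL (the signal the OOM killer sends) is signal 9 on Linux. A minimal sketch, in which the child kills itself to stand in for the OOM killer; `describe_returncode` is a hypothetical helper, not part of torch-neuron.

```python
import signal
import subprocess
import sys


def describe_returncode(returncode):
    """Map a subprocess return code to a readable description."""
    if returncode < 0:
        # On POSIX, a negative return code -N means the child was terminated by signal N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


# Child process that sends SIGKILL to itself, mimicking what the OOM killer does.
child = subprocess.run(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
)
print(child.returncode)                       # -9 on Linux
print(describe_returncode(child.returncode))  # killed by SIGKILL
```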

We recommend you try compiling on an instance with more memory, such as an inf1.6xlarge. Note: you only need the larger instance for compilation; you can still use a smaller instance (such as an inf1.xlarge) to run inference.
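
Before starting a long compilation, it can help to check how much memory the instance has. The sketch below reads total physical memory through POSIX sysconf, which works on Linux. The threshold `MIN_COMPILE_MEMORY_GIB` is a made-up placeholder, not a documented requirement of neuron-cc; the memory actually needed depends on the model.

```python
import os


def total_memory_gib():
    """Total physical memory in GiB, read via POSIX sysconf (Linux)."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    page_count = os.sysconf("SC_PHYS_PAGES")
    return page_size * page_count / 2**30


# Hypothetical threshold: tune it to what your model actually needs to compile.
MIN_COMPILE_MEMORY_GIB = 16

memory = total_memory_gib()
if memory < MIN_COMPILE_MEMORY_GIB:
    print(f"Only {memory:.1f} GiB of RAM; compilation may be killed by the OOM killer")
else:
    print(f"{memory:.1f} GiB of RAM available on this instance")
```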

Please let us know if compiling on a larger instance resolved the error you’re seeing.

vinayak-shanawad commented 2 years ago

Thanks for the advice, @hannanjgaws. It works.

But I observed a lot of misclassifications from the Neuron model compared to the fine-tuned BERT model, so we can't put the Neuron model into production.

Any idea why there is a difference in accuracy? I believe there might be an issue in converting the fine-tuned BERT model to an AWS Neuron model.

Please check if you can help on this issue.
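
One way to quantify the gap is to run the same inputs through both models and compare the logits directly, instead of only counting misclassifications. The sketch below is a pure-Python illustration for a multi-label classifier with a 0.5 threshold, which is an assumption about how the labels are decided. `compare_logits` is a hypothetical helper and the numbers are made up; with real models you would pass one row of each model's logits (for example via `.tolist()`), assuming the first element of each output is the logits tensor.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def compare_logits(reference, candidate, threshold=0.5):
    """Compare two logit vectors for one example.

    Returns the largest absolute logit difference and the indices of labels
    whose thresholded (multi-label) decision differs.
    """
    if len(reference) != len(candidate):
        raise ValueError("logit vectors must have the same length")
    max_abs_diff = max(abs(r - c) for r, c in zip(reference, candidate))
    flipped = [
        i
        for i, (r, c) in enumerate(zip(reference, candidate))
        if (sigmoid(r) >= threshold) != (sigmoid(c) >= threshold)
    ]
    return max_abs_diff, flipped


# Made-up logits for illustration only; not taken from the model in this issue.
cpu_logits = [2.0, -1.5, 0.25]
neuron_logits = [1.5, -1.0, -0.25]

max_diff, flipped = compare_logits(cpu_logits, neuron_logits)
print(max_diff)  # 0.5
print(flipped)   # [2]
```

Small differences that never flip a label usually point to reduced-precision arithmetic, while large differences on every example suggest the two models are not being fed the same inputs.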

Conversion logs for your reference:

INFO:Neuron:There are 3 ops of 1 different types in the TorchScript that are not compiled by neuron-cc: aten::embedding, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-pytorch.md)
INFO:Neuron:Number of arithmetic operators (pre-compilation) before = 565, fused = 548, percent fused = 96.99%
INFO:Neuron:Number of neuron graph operations 1601 did not match traced graph 1323 - using heuristic matching of hierarchical information
WARNING:tensorflow:From /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch_neuron/ops/aten.py:2022: where (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
INFO:Neuron:Compiling function _NeuronGraph$698 with neuron-cc
INFO:Neuron:Compiling with command line: '/home/ec2-user/anaconda3/envs/python3/bin/neuron-cc compile /tmp/tmp3o4t_86z/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmp3o4t_86z/graph_def.neff --io-config {"inputs": {"0:0": [[1, 128, 768], "float32"], "1:0": [[1, 1, 1, 128], "float32"]}, "outputs": ["Linear_5/aten_linear/Add:0"]} --verbose 35'
INFO:Neuron:skip_inference_context for tensorboard symbols at /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch_neuron/tensorboard.py:305 tb_parse
INFO:Neuron:Number of neuron graph operations 1601 did not match traced graph 1323 - using heuristic matching of hierarchical information
INFO:Neuron:Number of arithmetic operators (post-compilation) before = 565, compiled = 548, percent compiled = 96.99%
INFO:Neuron:The neuron partitioner created 1 sub-graphs
INFO:Neuron:Neuron successfully compiled 1 sub-graphs, Total fused subgraphs = 1, Percent of model sub-graphs successfully compiled = 100.0%
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 96
INFO:Neuron: => aten::add: 36
INFO:Neuron: => aten::contiguous: 12
INFO:Neuron: => aten::div: 12
INFO:Neuron: => aten::dropout: 38
INFO:Neuron: => aten::gelu: 12
INFO:Neuron: => aten::layer_norm: 25
INFO:Neuron: => aten::linear: 74
INFO:Neuron: => aten::matmul: 24
INFO:Neuron: => aten::permute: 48
INFO:Neuron: => aten::select: 1
INFO:Neuron: => aten::size: 96
INFO:Neuron: => aten::slice: 1
INFO:Neuron: => aten::softmax: 12
INFO:Neuron: => aten::tanh: 1
INFO:Neuron: => aten::transpose: 12
INFO:Neuron: => aten::view: 48
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::Int: 1 [supported]
INFO:Neuron: => aten::add: 3 [supported]
INFO:Neuron: => aten::embedding: 3 [not supported]
INFO:Neuron: => aten::mul: 1 [supported]
INFO:Neuron: => aten::rsub: 1 [supported]
INFO:Neuron: => aten::size: 1 [supported]
INFO:Neuron: => aten::slice: 4 [supported]
INFO:Neuron: => aten::to: 1 [supported]
INFO:Neuron: => aten::unsqueeze: 2 [supported]
INFO:Neuron:skip_inference_context for tensorboard symbols at /home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/torch_neuron/tensorboard.py:305 tb_parse
INFO:Neuron:Number of neuron graph operations 61 did not match traced graph 105 - using heuristic matching of hierarchical information
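
The operator summary in logs like the ones above can be pulled out programmatically, which makes it easier to compare runs. A minimal sketch, assuming the log format shown in this thread; `uncompiled_ops` and `OP_LINE` are hypothetical names, and the sample text is an abbreviated copy of the log lines above.

```python
import re

# Matches lines such as "INFO:Neuron: => aten::embedding: 3 [not supported]".
OP_LINE = re.compile(
    r"INFO:Neuron: => (aten::\w+): (\d+)(?: \[(supported|not supported)\])?"
)


def uncompiled_ops(log_text):
    """Collect operators listed after the 'Not compiled operators' header."""
    ops = {}
    in_section = False
    for line in log_text.splitlines():
        if "Not compiled operators" in line:
            in_section = True
            continue
        match = OP_LINE.search(line)
        if in_section and match:
            name, count, status = match.groups()
            ops[name] = (int(count), status)
        elif in_section:
            in_section = False  # any other line ends the section
    return ops


sample = """\
INFO:Neuron:Compiled these operators (and operator counts) to Neuron:
INFO:Neuron: => aten::linear: 74
INFO:Neuron:Not compiled operators (and operator counts) to Neuron:
INFO:Neuron: => aten::embedding: 3 [not supported]
INFO:Neuron: => aten::slice: 4 [supported]
INFO:Neuron:skip_inference_context for tensorboard symbols
"""

print(uncompiled_ops(sample))
# {'aten::embedding': (3, 'not supported'), 'aten::slice': (4, 'supported')}
```
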
hannanjgaws commented 2 years ago

Hi @Vinayaks117, to maximize numerical accuracy you can try using the --fast-math none compiler flag. If you find that this achieves your accuracy goals, you can tune the compilation options according to the documentation here: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/appnotes/perf/mixed-precision.html.
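
Compiler flags such as this one can be forwarded through the compiler_args parameter of torch.neuron.trace, which appears in the trace signature in the traceback earlier in this thread. The sketch below only builds the keyword arguments so that it runs without torch-neuron installed; `build_trace_kwargs` is a hypothetical helper, and the actual trace call is shown in comments.

```python
def build_trace_kwargs(fast_math="none", compiler_workdir=None):
    """Keyword arguments for torch.neuron.trace that forward flags to neuron-cc."""
    kwargs = {"compiler_args": ["--fast-math", fast_math]}
    if compiler_workdir is not None:
        # Keeps compiler artifacts in a known directory for later inspection.
        kwargs["compiler_workdir"] = compiler_workdir
    return kwargs


trace_kwargs = build_trace_kwargs()
print(trace_kwargs)  # {'compiler_args': ['--fast-math', 'none']}

# On an instance with torch-neuron installed, and with `model` and
# `neuron_inputs` defined as in the original post:
#   import torch, torch.neuron
#   model_neuron = torch.neuron.trace(model, neuron_inputs, **trace_kwargs)
```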

If using these compiler flags doesn’t help, would it be possible for you to share your model with us, so that we can recreate the issue and debug it with you? (Feel free to share directly to aws-neuron-support@amazon.com, if that’s easier than posting here).

If sharing your model is not a possibility, can you point us to an open source model with a similar architecture to your model?

vinayak-shanawad commented 2 years ago

Hello @hannanjgaws

We tried using the "--fast-math none" compiler flag but there are still a lot of misclassification errors.

As requested I have shared the model artifacts via email. Please have a look. Thanks

hannanjgaws commented 2 years ago

Thank you for sending your model artifacts. We will take a look at reproducing the accuracy issues and will provide updates on this ticket.

vinayak-shanawad commented 2 years ago

Hi @hannanjgaws

Any updates please? Thanks

aws-taylor commented 2 years ago

Hello @Vinayaks117,

It appears that you are using conda but installing packages via pip. This is known to cause version mismatch issues in some cases. I was able to successfully compile the model you sent us using the 'conda_mxnet_p37' kernel (ignore the "conda" in the name; it is unused) in a SageMaker notebook. Below is the associated .ipynb file:


{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ae2702c8",
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fca94bab",
   "metadata": {},
   "outputs": [],
   "source": [
    "!{sys.executable} -m pip install \"torch-neuron==1.8.1.*\" \"neuron-cc[tensorflow]\" \"protobuf<4\" torchvision \"sagemaker>=2.79.0\" \"transformers==4.17.0\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8b2be64b",
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import tensorflow  # to workaround a protobuf version conflict issue\n",
    "import torch\n",
    "import torch.neuron\n",
    "from transformers import AutoTokenizer, AutoModelForSequenceClassification\n",
    "\n",
    "model_path = 'model/' # Model artifacts are stored in 'model/' directory\n",
    "\n",
    "# load tokenizer and model\n",
    "tokenizer = AutoTokenizer.from_pretrained(model_path)\n",
    "model = AutoModelForSequenceClassification.from_pretrained(model_path, torchscript=True)\n",
    "\n",
    "# create dummy input for max length 128\n",
    "dummy_input = \"dummy input which will be padded later\"\n",
    "max_length = 128\n",
    "embeddings = tokenizer(dummy_input, max_length=max_length, padding=\"max_length\", truncation=True, return_tensors=\"pt\")\n",
    "neuron_inputs = tuple(embeddings.values())\n",
    "\n",
    "# compile model with torch.neuron.trace and update config\n",
    "model_neuron = torch.neuron.trace(model, neuron_inputs, compiler_workdir='.')\n",
    "model.config.update({\"traced_sequence_length\": max_length})\n",
    "\n",
    "# save tokenizer, neuron model and config for later use\n",
    "save_dir=\"tmpd\"\n",
    "os.makedirs(\"tmpd\",exist_ok=True)\n",
    "model_neuron.save(os.path.join(save_dir,\"neuron_model.pt\"))\n",
    "tokenizer.save_pretrained(save_dir)\n",
    "model.config.save_pretrained(save_dir)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e1645c70",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "conda_mxnet_p37",
   "language": "python",
   "name": "conda_mxnet_p37"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
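
The notebook above installs packages with `{sys.executable} -m pip`, which targets the interpreter the kernel is actually running. To confirm which versions that interpreter sees, something like the following sketch can be run in a notebook cell. It assumes Python 3.8 or newer for importlib.metadata (older kernels would need the importlib_metadata backport), and the package list is only an illustrative choice.

```python
import sys
from importlib import metadata  # Python 3.8+

PACKAGES = ("torch", "torch-neuron", "neuron-cc", "transformers", "protobuf")


def installed_versions(packages=PACKAGES):
    """Return {package: version or None} as seen by the running interpreter."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("interpreter:", sys.executable)
for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

If a package shows as not installed here but `pip list` in a terminal reports it, pip and the kernel are most likely using different environments.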