aws-neuron / aws-neuron-sdk

Powering AWS purpose-built machine learning chips. Blazing fast and cost effective, natively integrated into PyTorch and TensorFlow and integrated with your favorite AWS services
https://aws.amazon.com/machine-learning/neuron/

Compiling BiLSTM model not placing operations on Neuron runtime #161

Closed jhu8 closed 3 years ago

jhu8 commented 4 years ago

We are attempting to use the TensorFlow-Neuron compilation API to compile a 2-layer BiLSTM model created with the TF Keras framework.

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_1'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 512)
        name: input_1_1:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['dense'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 5)
        name: dense_1/Sigmoid:0
  Method name is: tensorflow/serving/predict


* We launched a DLAMI EC2 instance and activated the conda environment `aws_neuron_tensorflow_p36`.
* We used the script below, following the example given here: https://github.com/aws/aws-neuron-sdk/blob/master/docs/tensorflow-neuron/api-compilation-python-api.md

import shutil
import tensorflow.neuron as tfn

saved_model_path = "./model"
compiled_saved_model_path = "./model_compiled"
shutil.rmtree(compiled_saved_model_path, ignore_errors=True)
tfn.saved_model.compile(saved_model_path, compiled_saved_model_path, compiler_args=['--num-neuroncores', '4'])

* When running the script, we receive the following:

$ time python compile_model.py model
2020-09-02 19:29:54.402371: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2020-09-02 19:29:54.421722: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2999995000 Hz
2020-09-02 19:29:54.421922: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55577701aea0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-02 19:29:54.421941: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-09-02 19:29:56.846324: I bazel-out/k8-opt/genfiles/tensorflow/neuron/convert/segment.cc:460] There are 144 ops of 17 different types in the graph that are not compiled by neuron-cc: TensorArrayGatherV3, TensorArrayScatterV3, TensorArrayReadV3, Range, Switch, NextIteration, TensorArrayV3, Placeholder, GatherV2, TensorArraySizeV3, NoOp, TensorArrayWriteV3, Enter, Merge, LoopCond, Exit, Identity, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.md).
WARNING:tensorflow:subgraph neuron_op_b284cfcca2a9ea1e, tensor bidirectional_lstm_1_1/forward_lstm_1/TensorArrayStack/TensorArrayGatherV30/_0:0: invalid shape (512, ?, 128)
WARNING:tensorflow:Not fusing subgraph neuron_op_b284cfcca2a9ea1e: --io-config error
WARNING:tensorflow:subgraph neuron_op_67cd9574e993f8d2, tensor bidirectional_lstm_1_1/backward_lstm_1/while/Switch_21/_4:0: invalid shape (?, 128)
WARNING:tensorflow:Not fusing subgraph neuron_op_67cd9574e993f8d2: --io-config error
WARNING:tensorflow:subgraph neuron_op_18c29f19d10a2911, tensor bidirectional_lstm_1_1/backward_lstm_1/while/mul0/_9:0: invalid shape (?, 128)
WARNING:tensorflow:Not fusing subgraph neuron_op_18c29f19d10a2911: --io-config error
WARNING:tensorflow:subgraph neuron_op_5cec332b4aff291a, tensor bidirectional_lstm_1_1/forward_lstm_1/while/Switch_21/_13:0: invalid shape (?, 128)
WARNING:tensorflow:Not fusing subgraph neuron_op_5cec332b4aff291a: --io-config error
WARNING:tensorflow:subgraph neuron_op_72cc1d6c3ad4dd82, tensor bidirectional_lstm_1_1/forward_lstm_1/while/mul0/_18:0: invalid shape (?, 128)
WARNING:tensorflow:Not fusing subgraph neuron_op_72cc1d6c3ad4dd82: --io-config error
WARNING:tensorflow:subgraph neuron_op_d626eff4f7d875e8, tensor bidirectional_lstm_2_1/forward_lstm_1_1/TensorArrayStack/TensorArrayGatherV30/_20:0: invalid shape (512, ?, 64)
WARNING:tensorflow:Not fusing subgraph neuron_op_d626eff4f7d875e8: --io-config error
WARNING:tensorflow:subgraph neuron_op_c3c05816e420cdff, tensor bidirectional_lstm_2_1/backward_lstm_1_1/while/Switch_21/_22:0: invalid shape (?, 64)
WARNING:tensorflow:Not fusing subgraph neuron_op_c3c05816e420cdff: --io-config error
WARNING:tensorflow:subgraph neuron_op_b76907057161855, tensor bidirectional_lstm_2_1/backward_lstm_1_1/while/mul0/_27:0: invalid shape (?, 64)
WARNING:tensorflow:Not fusing subgraph neuron_op_b76907057161855: --io-config error
WARNING:tensorflow:subgraph neuron_op_ee11a0edca0df4bf, tensor bidirectional_lstm_2_1/forward_lstm_1_1/while/Switch_21/_29:0: invalid shape (?, 64)
WARNING:tensorflow:Not fusing subgraph neuron_op_ee11a0edca0df4bf: --io-config error
WARNING:tensorflow:subgraph neuron_op_6b216a627be3bd43, tensor bidirectional_lstm_2_1/forward_lstm_1_1/while/mul0/_34:0: invalid shape (?, 64)
WARNING:tensorflow:Not fusing subgraph neuron_op_6b216a627be3bd43: --io-config error
INFO:tensorflow:fusing subgraph neuron_op_1d749170886285fa with neuron-cc
WARNING:tensorflow:Failed to fuse subgraph neuron_op_1d749170886285fa with '/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow_p36/bin/neuron-cc compile /tmp/tmpvhfknvh5/neuron_op_1d749170886285fa/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpvhfknvh5/neuron_op_1d749170886285fa/graph_def.neff --io-config "{\"inputs\": {\"glove_embedding_1/embedding_lookup0/_2:0\": [[1, 512, 100], \"float32\"]}, \"outputs\": [\"bidirectional_lstm_1_1/backward_lstm_1/strided_slice_1:0\", \"bidirectional_lstm_1_1/backward_lstm_1/ReverseV2:0\", \"bidirectional_lstm_1_1/backward_lstm_1/TensorArrayUnstack/strided_slice:0\"]}"'
INFO:tensorflow:fusing subgraph neuron_op_769df76c0cc6e189 with neuron-cc
WARNING:tensorflow:Failed to fuse subgraph neuron_op_769df76c0cc6e189 with '/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow_p36/bin/neuron-cc compile /tmp/tmpvhfknvh5/neuron_op_769df76c0cc6e189/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpvhfknvh5/neuron_op_769df76c0cc6e189/graph_def.neff --io-config "{\"inputs\": {\"glove_embedding_1/embedding_lookup0/_3:0\": [[1, 512, 100], \"float32\"]}, \"outputs\": [\"bidirectional_lstm_1_1/backward_lstm_1/zeros:0\", \"bidirectional_lstm_1_1/backward_lstm_1/zeros_1:0\"]}"'
INFO:tensorflow:fusing subgraph neuron_op_cda3de985787ea8f with neuron-cc
WARNING:tensorflow:Failed to fuse subgraph neuron_op_cda3de985787ea8f with '/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow_p36/bin/neuron-cc compile /tmp/tmpvhfknvh5/neuron_op_cda3de985787ea8f/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpvhfknvh5/neuron_op_cda3de985787ea8f/graph_def.neff --io-config "{\"inputs\": {\"glove_embedding_1/embedding_lookup0/_11:0\": [[1, 512, 100], \"float32\"]}, \"outputs\": [\"bidirectional_lstm_1_1/forward_lstm_1/transpose:0\", \"bidirectional_lstm_1_1/forward_lstm_1/strided_slice_1:0\", \"bidirectional_lstm_1_1/forward_lstm_1/TensorArrayUnstack/strided_slice:0\"]}"'
INFO:tensorflow:fusing subgraph neuron_op_8da53c43384dad35 with neuron-cc
WARNING:tensorflow:Failed to fuse subgraph neuron_op_8da53c43384dad35 with '/home/ec2-user/anaconda3/envs/aws_neuron_tensorflow_p36/bin/neuron-cc compile /tmp/tmpvhfknvh5/neuron_op_8da53c43384dad35/graph_def.pb --framework TENSORFLOW --pipeline compile SaveTemps --output /tmp/tmpvhfknvh5/neuron_op_8da53c43384dad35/graph_def.neff --io-config "{\"inputs\": {\"glove_embedding_1/embedding_lookup0/_12:0\": [[1, 512, 100], \"float32\"]}, \"outputs\": [\"bidirectional_lstm_1_1/forward_lstm_1/zeros:0\", \"bidirectional_lstm_1_1/forward_lstm_1/zeros_1:0\"]}"'
INFO:tensorflow:Number of operations in TensorFlow session: 4920
INFO:tensorflow:Number of operations after tf.neuron optimizations: 376
INFO:tensorflow:Number of operations placed on Neuron runtime: 0
WARNING:tensorflow:Converted ./model to ./model_compiled but no operator will be running on AWS machine learning accelerators. This is probably not what you want. Please refer to https://github.com/aws/aws-neuron-sdk for current limitations of the AWS Neuron SDK. We are actively improving (and hiring)!

real	0m21.427s
user	0m18.269s
sys	0m3.710s



Is there anything we're doing wrong here, or is this an issue with the Neuron SDK?
jeffhataws commented 4 years ago

Thank you for the detailed instructions. We are taking a look.

jeffhataws commented 4 years ago

I have reproduced the issue and am researching a solution.

jeffhataws commented 4 years ago

To enable compilation of a Keras Bidirectional LSTM for Inferentia, first ensure that the unroll option of the LSTM layers is set to True (this removes flow-control operations) and that implementation=2 is used, as in the example below (save the script to a file "extract.py" and run "python extract.py"):

import shutil
import tensorflow as tf
import tensorflow.keras as keras
from tensorflow.keras import layers

max_features = 290133
embed_dims = 100
hidden_size = 128
maxlen = 128

inputs = keras.Input(shape=(maxlen,), dtype='int32')
x = layers.Embedding(max_features, embed_dims)(inputs)
x = layers.Bidirectional(layers.LSTM(hidden_size, return_sequences=True, unroll=True, implementation=2))(x)
x = layers.Bidirectional(layers.LSTM(hidden_size//2, unroll=True, implementation=2))(x)
outputs = layers.Dense(1, activation="sigmoid")(x)
model = keras.Model(inputs, outputs)
model.summary()

(x_train, y_train), (x_val, y_val) = keras.datasets.imdb.load_data(num_words=max_features)
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=maxlen)
x_val = keras.preprocessing.sequence.pad_sequences(x_val, maxlen=maxlen)
model.compile("adam", "binary_crossentropy", metrics=['accuracy'])
#model.fit(x_train, y_train, batch_size=1, epochs=1, validation_data=(x_val, y_val), steps_per_epoch=1)

from tensorflow.tools.graph_transforms import TransformGraph
from tensorflow.compat.v1.graph_util import convert_variables_to_constants
from tensorflow.compat.v1.graph_util import remove_training_nodes

saved_model_dir = './bilstm_saved_model'
shutil.rmtree(saved_model_dir, ignore_errors=True)
with keras.backend.get_session() as sess:
    # optimize for inference
    input_name = model.input.name.replace(':0', '')
    output_name = model.output.name.replace(':0', '')
    graph_def = convert_variables_to_constants(sess, sess.graph.as_graph_def(), [output_name])
    graph_def = remove_training_nodes(graph_def, protected_nodes=[output_name])
    graph_def = TransformGraph(graph_def, [input_name], [output_name],
         ['add_default_attributes', 'remove_nodes(op=Identity, op=CheckNumerics)', 
          'fold_constants(ignore_errors=true)', 'strip_unused_nodes'])
    # save as saved model
    tf.import_graph_def(graph_def, name='')
    inputs = {model.input.name: sess.graph.get_tensor_by_name(model.input.name)}
    outputs = {model.output.name: sess.graph.get_tensor_by_name(model.output.name)}
    tf.saved_model.simple_save(sess, saved_model_dir, inputs, outputs)

The example above also shows how to optimize the graph for inference, which makes it easier to compile.

The compilation code below is based on what you have, except that num-neuroncores is set to 1 and a fixed shape is passed via model_shape_feed_dict (save the script to a file "compile.py" and run "python compile.py"):

import shutil
import tensorflow.neuron as tfn

saved_model_path = "./bilstm_saved_model"
compiled_saved_model_path = "./bilstm_compiled_saved_model"
shutil.rmtree(compiled_saved_model_path, ignore_errors=True)
tfn.saved_model.compile(saved_model_path, compiled_saved_model_path, 
               compiler_args = ['--num-neuroncores', '1'],
               model_shape_feed_dict={'input_1:0' : [1, 128]},
               compiler_workdir='compiler_workdir')

I have checked that extraction and compilation for sequence length 128 (100 embedding dims, 128 hidden size) are successful on a p3.2xlarge. Extraction takes about 1 minute and compilation takes about 20 minutes. I am running experiments to check higher sequence lengths.
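For reference, once the compiled SavedModel is produced it can be loaded for inference on an Inf1 instance. A minimal sketch, assuming the tensorflow-neuron conda environment, the compiled model path from compile.py above, and that the serving signature's input key is the Keras tensor name 'input_1:0' as written by extract.py (the zero-filled batch is only illustrative):

import numpy as np
import tensorflow as tf

# Load the compiled SavedModel (path assumed from compile.py above).
predictor = tf.contrib.predictor.from_saved_model('./bilstm_compiled_saved_model')

# Dummy batch of token ids at the fixed sequence length the model was compiled with (128 here).
dummy_batch = np.zeros((1, 128), dtype='int32')

# The feed key must match the signature key written by extract.py
# (assumed to be 'input_1:0'; inspect predictor.feed_tensors if unsure).
result = predictor({'input_1:0': dummy_batch})
print(result)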

wjzhan commented 4 years ago

Thanks a lot. We'll test out rebuilding the model with unroll=True. It seems the default implementation is already 2, so we'll leave that as is.

jhu8 commented 4 years ago

We tried compilation with an updated model that uses unroll=True, saved as a SavedModel in the same way as the extract.py script. It compiled successfully on a p3.2xlarge using the provided compile.py script with compiler_args = ['--num-neuroncores', '4'], printing out the compilation result. Compilation took about 50 minutes. However, it appears that very few operations get placed on the Neuron runtime.
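A minimal sketch of the compile call described above (assumed; the paths, the input key, and the fixed shape for our sequence length of 512 are illustrative):

import shutil
import tensorflow.neuron as tfn

saved_model_path = "./bilstm_saved_model"
compiled_saved_model_path = "./bilstm_compiled_saved_model_4_neuroncores_1"
shutil.rmtree(compiled_saved_model_path, ignore_errors=True)

# Compile with 4 NeuronCores and a fixed input shape; the returned dict reports
# the fraction of operations placed on the Neuron runtime.
result = tfn.saved_model.compile(saved_model_path, compiled_saved_model_path,
                                 compiler_args=['--num-neuroncores', '4'],
                                 model_shape_feed_dict={'input_1:0': [1, 512]},
                                 compiler_workdir='compiler_workdir')
print(result)  # e.g. {'OnNeuronRatio': ...}

The tail of the output was: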


 \"bidirectional_lstm_1/forward_lstm_1/unstack428/_1003:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/forward_lstm_1/unstack474/_1004:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/backward_lstm_1/unstack344/_1005:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/backward_lstm_1/unstack433/_1006:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/backward_lstm_1/unstack282/_1007:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/backward_lstm_1/unstack511/_1008:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/forward_lstm_1/unstack89/_1009:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/forward_lstm_1/unstack429/_1010:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/backward_lstm_1/unstack2/_1011:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/backward_lstm_1/unstack343/_1012:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/backward_lstm_1/unstack432/_1013:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/backward_lstm_1/unstack127/_1014:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/backward_lstm_1/unstack388/_1015:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/backward_lstm_1/unstack431/_1016:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/forward_lstm_1/unstack15/_1017:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/backward_lstm_1/unstack126/_1018:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/forward_lstm_1/unstack146/_1019:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/forward_lstm_1/unstack255/_1020:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/backward_lstm_1/unstack387/_1021:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/forward_lstm_1/unstack321/_1022:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/forward_lstm_1/unstack407/_1023:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/backward_lstm_1/unstack171/_1024:0\": [[1, 100], \"float32\"], \"bidirectional_lstm_1/forward_lstm_1/unstack16/_1025:0\": [[1, 100], \"float32\"]}, \"outputs\": [\"bidirectional_lstm_1/concat:0\", \"bidirectional_lstm_2/forward_lstm_1_1/transpose:0\", \"bidirectional_lstm_2/backward_lstm_1_1/transpose:0\"]}" --num-neuroncores 4'
INFO:tensorflow:Number of operations in TensorFlow session: 180366
INFO:tensorflow:Number of operations after tf.neuron optimizations: 30817
INFO:tensorflow:Number of operations placed on Neuron runtime: 4
INFO:tensorflow:No assets to save.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: ./bilstm_compiled_saved_model_4_neuroncores_1/saved_model.pb
WARNING:tensorflow:Converted ./bilstm_saved_model to ./bilstm_compiled_saved_model_4_neuroncores_1 but only a small portion of operators will be running on AWS machine learning accelerators. This is probably not what you want (well, unless there are too many training operators in your SavedModel). Please refer to https://github.com/aws/aws-neuron-sdk for current limitations of the AWS Neuron SDK. We are actively improving (and hiring)!
{'OnNeuronRatio': 0.00012979848784761659}
aws-renfu commented 4 years ago

Thank you for the feedback. We are actively looking at this issue.

jhu8 commented 4 years ago

Hi, are there any updates on this? Thanks.

aws-diamant commented 4 years ago

Hi jhu8, Thanks for checking in.

We identified a couple of issues in this case.

The first issue, as you reported, is the long compile time. Using implementation=2 [1] as @jeffhataws suggested above cuts compile time by ~2x, and moving to the latest (9/21) release also helps somewhat. We're looking into improving it further.

The second issue, which you also reported, is that very few operators are placed on the Neuron device. This is caused by the choice of seqlen=512, and we see significantly better placement for seqlen<=192. We are working on extending LSTM seqlen support.

The third issue is the number of NeuronCores. I suggest compiling for a single NeuronCore only (rather than 4; see @jeffhataws's example), and then using data-parallel mode during inference. This should also cut compile time and enable better placement. A minimal sketch of this pattern follows below.
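A minimal sketch of the data-parallel suggestion, assuming tensorflow-neuron 1.x on an Inf1 instance, a model compiled for a single NeuronCore, and illustrative paths, feed key, and sequence length:

import os
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import tensorflow as tf

# Ask the Neuron runtime for four groups of one NeuronCore each
# (must be set before any predictor/session is created).
os.environ['NEURONCORE_GROUP_SIZES'] = '1,1,1,1'

# Load one predictor per NeuronCore group from the single-core compiled model.
predictors = [tf.contrib.predictor.from_saved_model('./bilstm_compiled_saved_model')
              for _ in range(4)]

def infer(i, batch):
    # Feed key assumed to match the SavedModel signature ('input_1:0' here).
    return predictors[i]({'input_1:0': batch})

# Run the four replicas in parallel on dummy batches at the compiled sequence length.
batch = np.zeros((1, 128), dtype='int32')
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda i: infer(i, batch), range(4)))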

Finally, we also identified that the Neuron compiler optimizes away one of the inputs, which causes a dimension mismatch error during inference. We're fixing this.

We are working on fixing these for an upcoming release, and will keep you updated.

Thanks, Ron

[1] Note: Keras LSTM actually uses implementation=1 as the default, as opposed to LSTM_v2, which uses implementation=2 as the default (see: https://github.com/tensorflow/tensorflow/blob/v2.3.0/tensorflow/python/keras/layers/recurrent.py).

jhu8 commented 4 years ago

Thank you for the update!

aws-taylor commented 4 years ago

Hello @jhu8,

We are actively debugging and fixing these issues; we will continue to keep you updated.

Regards, Taylor

jhu8 commented 4 years ago

Hello, just checking in on this: have there been any updates? Thanks!

awsrjh commented 4 years ago

Hi jhu8 - we are still working on this issue. We will let you know when it is fixed in an upcoming release. Thanks for your patience.

mrnikwaws commented 3 years ago

With the v1.10.0 release, the bidirectional LSTM model can be compiled up to sequence length 256. We are still working on sequence length 512 and will update this ticket when it is supported.

aws-joshim commented 3 years ago

Hi jhu8 - The bidirectional LSTM model can be compiled up to sequence length 256 with the latest v1.11.0 Neuron SDK. We continue to work on compilation for larger sequence lengths and will keep this ticket updated with progress on BiLSTM model improvements.

jhu8 commented 3 years ago

Closing this issue, we no longer need the increased sequence length support. Thank you for all your help.