Samsung / ONE

On-device Neural Engine

[CFE] Unroll UnidirectionalSequenceLSTM for Trix #9940

Open seanshpark opened 2 years ago

seanshpark commented 2 years ago

Let's provide unrolling of the UnidirectionalSequenceLSTM Op for compilation in Trix.


To-do items to support unroll in one-import-tf / one-import-onnx

seanshpark commented 2 years ago

related #9895


one-cmds test material

UnidirSeqLSTM.zip

seanshpark commented 2 years ago

one-import-tf or one-import-tflite can pass this flag to circle2circle.

How?

@mhs4670go, any comments on how to do this?

mhs4670go commented 2 years ago

For others' information, I summarize the offline talk with @seanshpark.

one-import-onnx has the following workflow when it processes an LSTM operator.

.onnx -> (onnx_legalizer) -> .onnx -> (onnx_tf) -> ..

The current legalizer only supports .onnx inputs, so only one-import-onnx has an option that unrolls LSTM operators.

This issue now aims to give one-import-tf* the same functionality.

We could introduce an LSTM-unrolling option into circle2circle, which means the option would live in one-optimize. But for a unified interface with the existing one-import-onnx, it seems unnatural for one-import-tf* to lack a legalizing option.

So, here is the thing.

  1. Introduce the option into one-import-tf*, but, strictly speaking, that option does nothing. It exists only to align the interface with one-import-onnx.
  2. Introduce the option into circle2circle (one-optimize) and turn it on by default, so it always unrolls the LSTM operator. Then both workflows would produce the same result: an unrolled LSTM op.
    • one-import-onnx -> one-optimize
    • one-import-tf* -> one-optimize
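The plan above can be sketched as a toy model (the function names and string-based "graphs" here are purely illustrative stand-ins, not the actual one-cmds implementation): because one-optimize unrolls by default and unrolling is idempotent, both import paths converge on the same unrolled result.

```python
def unroll(model: str) -> str:
    # Unrolling is idempotent: an already-unrolled graph is left alone.
    return model if model.startswith("unrolled ") else "unrolled " + model

def one_import_onnx(model: str, unroll_lstm: bool = False) -> str:
    # For onnx, unrolling can already happen during import (onnx_legalizer).
    return unroll(model) if unroll_lstm else model

def one_import_tf(model: str, unroll_lstm: bool = False) -> str:
    # Per item 1: the option is accepted but does nothing here;
    # it only aligns the interface with one-import-onnx.
    return model

def one_optimize(model: str, unroll_unidirseqlstm: bool = True) -> str:
    # Per item 2: one-optimize unrolls any remaining LSTM by default.
    return unroll(model) if unroll_unidirseqlstm else model

via_onnx = one_optimize(one_import_onnx("LSTM graph", unroll_lstm=True))
via_tf = one_optimize(one_import_tf("LSTM graph", unroll_lstm=True))
```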
seanshpark commented 2 years ago

How to validate?

seanshpark commented 2 years ago

Unroll pattern?

(1)–(4) [images: candidate unroll patterns]

seanshpark commented 2 years ago

The last dimension value 2 comes from the 2 given in the first argument, units=2.

seanshpark commented 2 years ago

Smallest I/O shape and simplest form of the model:

import tensorflow as tf
from tensorflow import keras

model = keras.Sequential()
shape = (1, 1)

model.add(keras.layers.InputLayer(input_shape=shape, batch_size=1))
model.add(keras.layers.LSTM(1, return_sequences=True))

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.allow_custom_ops = True
converter.experimental_new_converter = True
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
converter._experimental_lower_tensor_list_ops = False

tflite_model = converter.convert()
with open("lstm_1x1x1.tflite", "wb") as f:
    f.write(tflite_model)

And the unrolled variant (only `unroll=True` differs):

model = keras.Sequential()
shape = (1, 1)

model.add(keras.layers.InputLayer(input_shape=shape, batch_size=1))
model.add(keras.layers.LSTM(1, return_sequences=True, unroll=True))

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.allow_custom_ops = True
converter.experimental_new_converter = True
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
converter._experimental_lower_tensor_list_ops = False

tflite_model = converter.convert()
with open("lstm_unroll_1x1x1.tflite", "wb") as f:
    f.write(tflite_model)
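For reference, the math that each unrolled time step has to reproduce is the standard LSTM cell update. A minimal NumPy sketch (the packed `[4*units, ...]` weight layout and the i, f, g, o gate order here are illustrative assumptions, not the exact tflite tensor layout):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h, c, W, U, b):
    # W: [4*units, input_dim], U: [4*units, units], b: [4*units].
    z = W @ x + U @ h + b          # FC over input + FC over recurrent state
    i, f, g, o = np.split(z, 4)    # input, forget, cell, output gates
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

# Unrolling means applying the step once per time step in the graph,
# instead of looping inside a single LSTM op.
units, input_dim, time_steps = 1, 1, 1
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * units, input_dim)).astype(np.float32)
U = rng.standard_normal((4 * units, units)).astype(np.float32)
b = np.zeros(4 * units, dtype=np.float32)
h = np.zeros(units, dtype=np.float32)
c = np.zeros(units, dtype=np.float32)
for t in range(time_steps):
    x_t = rng.standard_normal(input_dim).astype(np.float32)
    h, c = lstm_step(x_t, h, c, W, U, b)
```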
seanshpark commented 2 years ago

Things to solve/do

Working draft and test prepared

seanshpark commented 2 years ago

How to map?

From the simplest model:

Operator Codes: [order] OpCodeName (OpCode Enum)
[0] UNIDIRECTIONAL_SEQUENCE_LSTM (code: 44, dep_code: 44, version: 1)

Buffers: B(index) (length) values, if any
B(0) (0)
B(1) (0)
B(2) (4) 00 00 00 00
B(3) (4) 0x6c 0xe4 0x75 0xbf
B(4) (4) 0x45 0x24 0x3b 0xbe
B(5) (4) 0x10 0x42 0xeb 0x3c
B(6) (4) 0x20 0xcb 0x54 0x3e
B(7) (4) 00 00 00 00
B(8) (4) 00 00 0x80 0x3f
B(9) (4) 0xa4 0xba 0xf3 0xbe
B(10) (4) 0xf8 0x95 0x1c 0x3e
B(11) (4) 0xf8 0xb9 0x9d 0x3e
B(12) (4) 0x70 0x7b 0xa8 0x3e
B(13) (4) 00 00 00 00
B(14) (0)
B(15) (16) 0x31 0x2e 0x31 0x33 0x2e 0x31 00 00 00 00 00 00 00 00 00 00 ...
B(16) (88) 0xc 00 00 00 0x8 00 0xc 00 0x8 00 0x4 00 0x8 00 00 00 ...

T(0:0) FLOAT32 (1, 1, 1) B(1) serving_default_input_16:0
T(0:1) FLOAT32 (1, 1) B(0) (variable) sequential_15/lstm_15/zeros
T(0:2) FLOAT32 (1, 1) B(3) arith.constant
T(0:3) FLOAT32 (1, 1) B(4) arith.constant1
T(0:4) FLOAT32 (1, 1) B(5) arith.constant2
T(0:5) FLOAT32 (1, 1) B(6) arith.constant3
T(0:6) FLOAT32 (1) B(7) arith.constant4
T(0:7) FLOAT32 (1) B(8) arith.constant5
T(0:8) FLOAT32 (1, 1) B(9) arith.constant6
T(0:9) FLOAT32 (1, 1) B(10) arith.constant7
T(0:10) FLOAT32 (1, 1) B(11) arith.constant8
T(0:11) FLOAT32 (1, 1) B(12) arith.constant9
T(0:12) FLOAT32 (1, 1) B(0) (variable) sequential_15/lstm_15/zeros1
T(0:13) FLOAT32 (1, 1, 1) B(14) StatefulPartitionedCall:0

O(0:0) UNIDIRECTIONAL_SEQUENCE_LSTM
    Activation(TANH) cell_clip(10) proj_clip(0) time_major(0) asymmetric_quantize_inputs(0)
    I T(0:0) serving_default_input_16:0
    I T(0:11) arith.constant9
    I T(0:10) arith.constant8
    I T(0:9) arith.constant7
    I T(0:8) arith.constant6
    I T(0:5) arith.constant3
    I T(0:4) arith.constant2
    I T(0:3) arith.constant1
    I T(0:2) arith.constant
    I T(0:-1)
    I T(0:-1)
    I T(0:-1)
    I T(0:6) arith.constant4
    I T(0:7) arith.constant5
    I T(0:6) arith.constant4
    I T(0:6) arith.constant4
    I T(0:-1)
    I T(0:-1)
    I T(0:1) sequential_15/lstm_15/zeros
    I T(0:12) sequential_15/lstm_15/zeros1
    I T(0:-1)
    I T(0:-1)
    I T(0:-1)
    I T(0:-1)
    O T(0:13) StatefulPartitionedCall:0
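For orientation when reading the 24-entry input list above, here is the UNIDIRECTIONAL_SEQUENCE_LSTM input-tensor layout as I read the TFLite schema (an assumption to verify against schema.fbs; the `T(0:-1)` entries in the dump are absent optional inputs, e.g. peephole, projection, and layer-norm tensors):

```python
# Input-tensor index layout of UNIDIRECTIONAL_SEQUENCE_LSTM (per my reading
# of the TFLite schema); -1 entries in the dump are absent optional inputs.
LSTM_INPUTS = [
    "input",                                                     # 0
    "input_to_input_weights", "input_to_forget_weights",         # 1, 2
    "input_to_cell_weights", "input_to_output_weights",          # 3, 4
    "recurrent_to_input_weights", "recurrent_to_forget_weights", # 5, 6
    "recurrent_to_cell_weights", "recurrent_to_output_weights",  # 7, 8
    "cell_to_input_weights", "cell_to_forget_weights",           # 9, 10 (peephole)
    "cell_to_output_weights",                                    # 11 (peephole)
    "input_gate_bias", "forget_gate_bias",                       # 12, 13
    "cell_gate_bias", "output_gate_bias",                        # 14, 15
    "projection_weights", "projection_bias",                     # 16, 17
    "output_state", "cell_state",                                # 18, 19
    "input_layer_norm_coefficients",                             # 20
    "forget_layer_norm_coefficients",                            # 21
    "cell_layer_norm_coefficients",                              # 22
    "output_layer_norm_coefficients",                            # 23
]
```

This matches the dump: indices 18 and 19 are the `zeros`/`zeros1` state tensors, and indices 9-11, 16-17, and 20-23 are `-1`.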
seanshpark commented 2 years ago

FC(2) weight is

[ input_input_weights, input_forget_weights, input_cell_weights, input_output_weights ]

where 4 x [1, 4] becomes [4, 4], and its bias is

[ input_gate_bias, forget_gate_bias, cell_gate_bias, output_gate_bias ]

where 4 x [1] becomes [4].

FC(3) left is

[ recurrent_input_weights, recurrent_forget_weights, recurrent_cell_weights, recurrent_output_weights ]

where 4 x [1, 1] becomes [4, 1], without bias.

FC(3) right is the same as FC(2) but without bias; the bias for this sub-network goes to the second Add.
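The packing above can be sketched in NumPy (shapes taken from the 3x2x4 model with units=1; the fill values are dummies, and the stacking order follows the gate lists above):

```python
import numpy as np

units, input_dim = 1, 4

# Four per-gate input weights, each [units, input_dim] = [1, 4] ...
gate_weights = [np.full((units, input_dim), k, dtype=np.float32) for k in range(4)]
# ... stacked into the FC(2) weight [4*units, input_dim] = [4, 4].
fc2_weight = np.vstack(gate_weights)

# Four per-gate biases, each [units] = [1], concatenated into [4*units] = [4].
gate_biases = [np.full((units,), k, dtype=np.float32) for k in range(4)]
fc2_bias = np.concatenate(gate_biases)

# FC(3) left: recurrent weights, each [units, units] = [1, 1] -> [4, 1], no bias.
rec_weights = [np.full((units, units), k, dtype=np.float32) for k in range(4)]
fc3_left_weight = np.vstack(rec_weights)
```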

seanshpark commented 2 years ago

How was the mapping above derived?

import tensorflow as tf
from tensorflow import keras

shape = (2, 4)
model = keras.Sequential()
model.add(keras.layers.InputLayer(input_shape=shape, batch_size=3))
model.add(keras.layers.LSTM(1, input_shape=shape, return_sequences=True))
model.save("out/lstm_3x2x4.h5")
model.save_weights("out/lstm_3x2x4_w.h5")

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.allow_custom_ops = True
converter.experimental_new_converter = True
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
converter._experimental_lower_tensor_list_ops = False
tflite_model = converter.convert()
with open("out/lstm_3x2x4.tflite", "wb") as f:
    f.write(tflite_model)

shape = (2, 4)
model = keras.Sequential()
model.add(keras.layers.InputLayer(input_shape=shape, batch_size=3))
model.add(keras.layers.LSTM(1, input_shape=shape, return_sequences=True, unroll=True))
model.load_weights('out/lstm_3x2x4_w.h5')
model.save('out/lstm_3x2x4_unroll.h5')

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.allow_custom_ops = True
converter.experimental_new_converter = True
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
converter._experimental_lower_tensor_list_ops = False
tflite_model = converter.convert()
with open('out/lstm_3x2x4_unroll.tflite', 'wb') as f:
    f.write(tflite_model)
seanshpark commented 1 year ago

To pass the unroll_unidirseqlstm option from the one-cmds tf importers through tflite2circle to circle2circle ... there is no existing path for this.