Samsung / ONE

On-device Neural Engine

[CFE] Unroll UnidirectionalSequenceLSTM for Trix #9940

Open seanshpark opened 2 years ago

seanshpark commented 2 years ago

Let's provide unrolling of the UnidirectionalSequenceLSTM Op for compilation in Trix.


To-do items to support unroll in one-import-tf / one-import-onnx

seanshpark commented 2 years ago

related #9895


one-cmds test material

UnidirSeqLSTM.zip

seanshpark commented 2 years ago

one-import-tf or one-import-tflite can pass this flag to circle2circle.

How?

@mhs4670go, any comments on how to do this?

mhs4670go commented 2 years ago

For others' information, I summarize the offline talk with @seanshpark.

one-import-onnx has the following workflow when it processes an LSTM operator.

.onnx -> (onnx_legalizer) -> .onnx -> (onnx_tf) -> ..

The current legalizer only supports .onnx inputs, so only one-import-onnx has an option that unrolls LSTM operators.

This issue now aims to give one-import-tf* the same functionality.

We could introduce an LSTM-unrolling option into circle2circle, which means the option would live in one-optimize. But for a unified interface with the existing one-import-onnx, it seems unnatural for one-import-tf* to lack a legalizing option.

So, here is the thing.

  1. Introduce the option into one-import-tf*, but, strictly speaking, that option does nothing. It exists only to align the interface with one-import-onnx.
  2. Introduce the option into circle2circle (one-optimize) and turn it on by default, so it always unrolls the LSTM operator. Then both workflows would produce the same result: an unrolled LSTM op.
    • one-import-onnx -> one-optimize
    • one-import-tf* -> one-optimize
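The plan above can be sketched as a toy model (the function names and string-based "graphs" here are purely illustrative stand-ins, not the actual one-cmds implementation): because one-optimize unrolls by default and unrolling is idempotent, both import paths converge on the same unrolled result.

```python
def unroll(model: str) -> str:
    # Unrolling is idempotent: an already-unrolled graph is left alone.
    return model if model.startswith("unrolled ") else "unrolled " + model

def one_import_onnx(model: str, unroll_lstm: bool = False) -> str:
    # For onnx, unrolling can already happen during import (onnx_legalizer).
    return unroll(model) if unroll_lstm else model

def one_import_tf(model: str, unroll_lstm: bool = False) -> str:
    # Per item 1: the option is accepted but does nothing here;
    # it only aligns the interface with one-import-onnx.
    return model

def one_optimize(model: str, unroll_unidirseqlstm: bool = True) -> str:
    # Per item 2: one-optimize unrolls any remaining LSTM by default.
    return unroll(model) if unroll_unidirseqlstm else model

via_onnx = one_optimize(one_import_onnx("LSTM graph", unroll_lstm=True))
via_tf = one_optimize(one_import_tf("LSTM graph", unroll_lstm=True))
```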
seanshpark commented 2 years ago

How to validate?

seanshpark commented 2 years ago

Unroll pattern?

(1)–(4) [images: candidate unroll patterns]

seanshpark commented 2 years ago

The last dimension value 2 comes from the 2 given in the first argument, units=2.

seanshpark commented 2 years ago

Smallest I/O shape and simplest form of the model:

import tensorflow as tf
from tensorflow import keras

model = keras.Sequential()
shape = (1, 1)

model.add(keras.layers.InputLayer(input_shape=shape, batch_size=1))
model.add(keras.layers.LSTM(1, return_sequences=True))

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.allow_custom_ops = True
converter.experimental_new_converter = True
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
converter._experimental_lower_tensor_list_ops = False

tflite_model = converter.convert()
with open("lstm_1x1x1.tflite", "wb") as f:
    f.write(tflite_model)

And the unrolled variant (only `unroll=True` differs):

model = keras.Sequential()
shape = (1, 1)

model.add(keras.layers.InputLayer(input_shape=shape, batch_size=1))
model.add(keras.layers.LSTM(1, return_sequences=True, unroll=True))

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.allow_custom_ops = True
converter.experimental_new_converter = True
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
converter._experimental_lower_tensor_list_ops = False

tflite_model = converter.convert()
with open("lstm_unroll_1x1x1.tflite", "wb") as f:
    f.write(tflite_model)
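For reference, the math that each unrolled time step has to reproduce is the standard LSTM cell update. A minimal NumPy sketch (the packed `[4*units, ...]` weight layout and the i, f, g, o gate order here are illustrative assumptions, not the exact tflite tensor layout):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h, c, W, U, b):
    # W: [4*units, input_dim], U: [4*units, units], b: [4*units].
    z = W @ x + U @ h + b          # FC over input + FC over recurrent state
    i, f, g, o = np.split(z, 4)    # input, forget, cell, output gates
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

# Unrolling means applying the step once per time step in the graph,
# instead of looping inside a single LSTM op.
units, input_dim, time_steps = 1, 1, 1
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * units, input_dim)).astype(np.float32)
U = rng.standard_normal((4 * units, units)).astype(np.float32)
b = np.zeros(4 * units, dtype=np.float32)
h = np.zeros(units, dtype=np.float32)
c = np.zeros(units, dtype=np.float32)
for t in range(time_steps):
    x_t = rng.standard_normal(input_dim).astype(np.float32)
    h, c = lstm_step(x_t, h, c, W, U, b)
```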
seanshpark commented 2 years ago

Things to solve/do

Working draft and test prepared

seanshpark commented 2 years ago

How to map?

From the simplest model:

Operator Codes: [order] OpCodeName (OpCode Enum)
[0] UNIDIRECTIONAL_SEQUENCE_LSTM (code: 44, dep_code: 44, version: 1)

Buffers: B(index) (length) values, if any
B(0) (0)
B(1) (0)
B(2) (4) 00 00 00 00
B(3) (4) 0x6c 0xe4 0x75 0xbf
B(4) (4) 0x45 0x24 0x3b 0xbe
B(5) (4) 0x10 0x42 0xeb 0x3c
B(6) (4) 0x20 0xcb 0x54 0x3e
B(7) (4) 00 00 00 00
B(8) (4) 00 00 0x80 0x3f
B(9) (4) 0xa4 0xba 0xf3 0xbe
B(10) (4) 0xf8 0x95 0x1c 0x3e
B(11) (4) 0xf8 0xb9 0x9d 0x3e
B(12) (4) 0x70 0x7b 0xa8 0x3e
B(13) (4) 00 00 00 00
B(14) (0)
B(15) (16) 0x31 0x2e 0x31 0x33 0x2e 0x31 00 00 00 00 00 00 00 00 00 00 ...
B(16) (88) 0xc 00 00 00 0x8 00 0xc 00 0x8 00 0x4 00 0x8 00 00 00 ...

T(0:0) FLOAT32 (1, 1, 1) B(1) serving_default_input_16:0
T(0:1) FLOAT32 (1, 1) B(0) (variable) sequential_15/lstm_15/zeros
T(0:2) FLOAT32 (1, 1) B(3) arith.constant
T(0:3) FLOAT32 (1, 1) B(4) arith.constant1
T(0:4) FLOAT32 (1, 1) B(5) arith.constant2
T(0:5) FLOAT32 (1, 1) B(6) arith.constant3
T(0:6) FLOAT32 (1) B(7) arith.constant4
T(0:7) FLOAT32 (1) B(8) arith.constant5
T(0:8) FLOAT32 (1, 1) B(9) arith.constant6
T(0:9) FLOAT32 (1, 1) B(10) arith.constant7
T(0:10) FLOAT32 (1, 1) B(11) arith.constant8
T(0:11) FLOAT32 (1, 1) B(12) arith.constant9
T(0:12) FLOAT32 (1, 1) B(0) (variable) sequential_15/lstm_15/zeros1
T(0:13) FLOAT32 (1, 1, 1) B(14) StatefulPartitionedCall:0

O(0:0) UNIDIRECTIONAL_SEQUENCE_LSTM
    Activation(TANH) cell_clip(10) proj_clip(0) time_major(0) asymmetric_quantize_inputs(0)
    I T(0:0) serving_default_input_16:0
    I T(0:11) arith.constant9
    I T(0:10) arith.constant8
    I T(0:9) arith.constant7
    I T(0:8) arith.constant6
    I T(0:5) arith.constant3
    I T(0:4) arith.constant2
    I T(0:3) arith.constant1
    I T(0:2) arith.constant
    I T(0:-1)
    I T(0:-1)
    I T(0:-1)
    I T(0:6) arith.constant4
    I T(0:7) arith.constant5
    I T(0:6) arith.constant4
    I T(0:6) arith.constant4
    I T(0:-1)
    I T(0:-1)
    I T(0:1) sequential_15/lstm_15/zeros
    I T(0:12) sequential_15/lstm_15/zeros1
    I T(0:-1)
    I T(0:-1)
    I T(0:-1)
    I T(0:-1)
    O T(0:13) StatefulPartitionedCall:0
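For orientation when reading the 24-entry input list above, here is the UNIDIRECTIONAL_SEQUENCE_LSTM input-tensor layout as I read the TFLite schema (an assumption to verify against schema.fbs; the `T(0:-1)` entries in the dump are absent optional inputs, e.g. peephole, projection, and layer-norm tensors):

```python
# Input-tensor index layout of UNIDIRECTIONAL_SEQUENCE_LSTM (per my reading
# of the TFLite schema); -1 entries in the dump are absent optional inputs.
LSTM_INPUTS = [
    "input",                                                     # 0
    "input_to_input_weights", "input_to_forget_weights",         # 1, 2
    "input_to_cell_weights", "input_to_output_weights",          # 3, 4
    "recurrent_to_input_weights", "recurrent_to_forget_weights", # 5, 6
    "recurrent_to_cell_weights", "recurrent_to_output_weights",  # 7, 8
    "cell_to_input_weights", "cell_to_forget_weights",           # 9, 10 (peephole)
    "cell_to_output_weights",                                    # 11 (peephole)
    "input_gate_bias", "forget_gate_bias",                       # 12, 13
    "cell_gate_bias", "output_gate_bias",                        # 14, 15
    "projection_weights", "projection_bias",                     # 16, 17
    "output_state", "cell_state",                                # 18, 19
    "input_layer_norm_coefficients",                             # 20
    "forget_layer_norm_coefficients",                            # 21
    "cell_layer_norm_coefficients",                              # 22
    "output_layer_norm_coefficients",                            # 23
]
```

This matches the dump: indices 18 and 19 are the `zeros`/`zeros1` state tensors, and indices 9-11, 16-17, and 20-23 are `-1`.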
seanshpark commented 2 years ago

FC(2) weight is

[ input_input_weights, input_forget_weights, input_cell_weights, input_output_weights ]

where 4 x [1, 4] becomes [4, 4], and its bias is

[ input_gate_bias, forget_gate_bias, cell_gate_bias, output_gate_bias ]

where 4 x [1] becomes [4].

FC(3) left is

[ recurrent_input_weights, recurrent_forget_weights, recurrent_cell_weights, recurrent_output_weights ]

where 4 x [1, 1] becomes [4, 1], without bias.

FC(3) right is the same as FC(2) but without bias; the bias for this sub-network goes to the second Add.
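The packing above can be sketched in NumPy (shapes taken from the 3x2x4 model with units=1; the fill values are dummies, and the stacking order follows the gate lists above):

```python
import numpy as np

units, input_dim = 1, 4

# Four per-gate input weights, each [units, input_dim] = [1, 4] ...
gate_weights = [np.full((units, input_dim), k, dtype=np.float32) for k in range(4)]
# ... stacked into the FC(2) weight [4*units, input_dim] = [4, 4].
fc2_weight = np.vstack(gate_weights)

# Four per-gate biases, each [units] = [1], concatenated into [4*units] = [4].
gate_biases = [np.full((units,), k, dtype=np.float32) for k in range(4)]
fc2_bias = np.concatenate(gate_biases)

# FC(3) left: recurrent weights, each [units, units] = [1, 1] -> [4, 1], no bias.
rec_weights = [np.full((units, units), k, dtype=np.float32) for k in range(4)]
fc3_left_weight = np.vstack(rec_weights)
```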

seanshpark commented 2 years ago

How was the mapping above derived?

import tensorflow as tf
from tensorflow import keras

shape = (2, 4)
model = keras.Sequential()
model.add(keras.layers.InputLayer(input_shape=shape, batch_size=3))
model.add(keras.layers.LSTM(1, input_shape=shape, return_sequences=True))
model.save("out/lstm_3x2x4.h5")
model.save_weights("out/lstm_3x2x4_w.h5")

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.allow_custom_ops = True
converter.experimental_new_converter = True
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
converter._experimental_lower_tensor_list_ops = False
tflite_model = converter.convert()
with open("out/lstm_3x2x4.tflite", "wb") as f:
    f.write(tflite_model)

shape = (2, 4)
model = keras.Sequential()
model.add(keras.layers.InputLayer(input_shape=shape, batch_size=3))
model.add(keras.layers.LSTM(1, input_shape=shape, return_sequences=True, unroll=True))
model.load_weights('out/lstm_3x2x4_w.h5')
model.save('out/lstm_3x2x4_unroll.h5')

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.allow_custom_ops = True
converter.experimental_new_converter = True
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
converter._experimental_lower_tensor_list_ops = False
tflite_model = converter.convert()
with open('out/lstm_3x2x4_unroll.tflite', 'wb') as f:
    f.write(tflite_model)
seanshpark commented 1 year ago

To pass the unroll_unidirseqlstm option from the one-cmds tf importers through tflite2circle to circle2circle ... there is no existing path for this.