Samsung / ONE

On-device Neural Engine

[luci-micro] Speedup GRU/LSTM operations #9225

Open binarman opened 2 years ago

binarman commented 2 years ago

What

We need to implement optimized kernels for the GRU and LSTM operations in the MCU interpreter.

Why

These operations are currently not optimized: they run slowly and consume a lot of memory.
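For context, one GRU step is mostly matrix-vector products plus elementwise gate math; a minimal pure-Python sketch (Cho et al. formulation; note that Keras swaps the roles of z and 1-z) of what a fused kernel has to compute:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(w, x):
    # Dense matrix-vector product: the hot loop an optimized MCU kernel
    # would quantize and block to save cycles and memory.
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def vadd(a, b, c):
    return [ai + bi + ci for ai, bi, ci in zip(a, b, c)]

def gru_cell(x, h, W, U, b):
    """One GRU step. W/U/b hold the (update, reset, candidate) triples."""
    z = [sigmoid(v) for v in vadd(matvec(W[0], x), matvec(U[0], h), b[0])]
    r = [sigmoid(v) for v in vadd(matvec(W[1], x), matvec(U[1], h), b[1])]
    rh = [ri * hi for ri, hi in zip(r, h)]
    hh = [math.tanh(v) for v in vadd(matvec(W[2], x), matvec(U[2], rh), b[2])]
    # h' = (1 - z) * h + z * hh
    return [(1 - zi) * hi + zi * hhi for zi, hi, hhi in zip(z, h, hh)]

# Toy check with all-zero parameters: z = r = 0.5, hh = 0, so h' = 0.5 * h.
W = U = [[[0.0]], [[0.0]], [[0.0]]]
b = [[0.0], [0.0], [0.0]]
print(gru_cell([1.0], [1.0], W, U, b))  # [0.5]
```

An unrolled model repeats this cell once per timestep; a fused kernel runs the loop internally, which is where the speed and memory wins come from.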

binarman commented 2 years ago

+cc @SlavikMIPT, @ai-moiseev

binarman commented 2 years ago

@ai-moiseev Could you schedule the model provided by @SlavikMIPT using https://github.com/Samsung/ONE/tree/master/compiler/circle-execution-plan and the VS Code visualizer?

BalyshevArtem commented 2 years ago

@SlavikMIPT made a GRU model: model_gru.circle.zip

I made a model with UnidirectionalSequenceLSTM: model_lstm.circle.zip

binarman commented 2 years ago

My models with the related Python scripts: rnn_examples.zip

Based on this colab from this tutorial.

binarman commented 2 years ago

Current support status

Int8

| keras2tflite conversion | GRU / LSTM status |
| --- | --- |
| unroll | #9253 |
| fused operation | @SlavikMIPT: kernel implemented and tested; cannot save GRU in circle\* |

\* The tflite and circle schemas do not have a separate GRU opcode. Ways to support GRU:

  1. Use Flex operations: tf documentation
    • need to generate a model with a Flex operation
  2. Implement a separate opcode in circle and add a fusing pass
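For option 1, the converter side is a single flag; a hedged sketch (the toy model shape is invented for illustration):

```python
import tensorflow as tf

# Invented toy model; any Keras model with a fused GRU works the same way.
model = tf.keras.Sequential([
    tf.keras.Input((5, 3)),
    tf.keras.layers.GRU(4),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Allow ops that have no TFLite builtin to be kept as Flex (select TF) ops;
# the resulting model then needs the Flex delegate at runtime.
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
flex_model = converter.convert()
```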

float32

| keras2tflite conversion | GRU / LSTM status |
| --- | --- |
| unroll | @BalyshevArtem in progress - #9253 |
| fused operation | @BalyshevArtem in progress - #9263 (+hybrid support) |

binarman commented 2 years ago

An update to https://github.com/Samsung/ONE/issues/9225#issuecomment-1148900583:

I've patched TensorFlow (r2.9.0) and got a "Flex" operation: gru_cell.tar.zip

This operation could be run using TFLite Flex delegate.
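A sketch of running such a model from Python: the full TensorFlow pip package bundles the Flex delegate into `tf.lite.Interpreter`, so no extra wiring is needed. A toy model is converted inline here as a stand-in, since gru_cell.tar.zip is an attachment:

```python
import numpy as np
import tensorflow as tf

# Convert a toy GRU model with select-TF-ops enabled (stand-in for the
# patched-TF model from gru_cell.tar.zip).
model = tf.keras.Sequential([
    tf.keras.Input((5, 3)),
    tf.keras.layers.GRU(4),
])
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()

# tf.lite.Interpreter from the full TF package resolves any Flex ops
# through the bundled Flex delegate automatically.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=np.float32))
interpreter.invoke()
out = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
print(out.shape)  # (1, 4)
```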

What I did with TF is described in this instruction.

(attached screenshot)

seanshpark commented 2 years ago

@chunseoklee , PTAL at the above GRU model for ONERT.

chunseoklee commented 2 years ago

@binarman IMHO, we need to make sure that our target model will be generated the way we have done it here. Have you heard anything about this?

chunseoklee commented 2 years ago

> @chunseoklee , PTAL at above GRU model for OneRT.

If this op is passed as a custom op, ONERT can process it by implementing it (not that hard). But at a glance, this Flex operation is not a custom op. I will take a look.

binarman commented 2 years ago

@chunseoklee

> IMHO, We need to make sure that our target model will be generated in the way we have done. Have you heard anything about this ?

Not yet; maybe in the near future I'll get some information related to this topic...

binarman commented 2 years ago

> But, at a glance, this flex operation is not custom op

This Flex op is in fact a custom op, but a "special" kind of custom op that is supported by the Flex delegate in TFLite: https://www.tensorflow.org/lite/guide/ops_select