Open StuartIanNaylor opened 2 years ago
@StuartIanNaylor
In my case, full-integer quantization is not available for this double-stacked LSTM model. When I applied dynamic-range quantization to the model, the calculations inside it still remained float32.
According to the official TensorFlow documentation, full-integer (static) quantization is not available for LSTM.
See also this issue: https://github.com/tensorflow/tensorflow/issues/25563
I think research on fully quantizing LSTM models is still in progress.
Hope this helps : )
That is a massive help, and many thanks for the bad news, as it will save a lot of wasted time.
Damn! :( There are a lot of targets, from ArmNN to NPUs, that cannot run it then, unless it runs on the CPU. I will leave this open so people can see your great info. Many thanks.
@StuartIanNaylor Glad my little bit of help was useful. It's too bad that it cannot be fully quantized for ArmNN or microcontrollers. But in my opinion it already achieves real-time performance on CPU (maybe even in the worst case?), so there is no need for a fully quantized model if your hardware has a CPU. It's an amazing achievement by breizhn.
It's no criticism of what breizhn produced, just the realisation of what even further optimisation could achieve, while also dropping Python for a more performant DSP environment in C/Rust. There are so many devices now with Mali GPUs that could perhaps have run it with ArmNN quantization, and the same goes for embedded NPUs. This lies with the ML frameworks, especially TensorFlow or maybe ONNX. Why recurrent networks such as LSTM or GRU are so problematic is beyond my knowledge level, but I can appreciate the limitations.
A few months ago I managed to quantize this LSTM model and run it on a Coral Edge TPU https://colab.research.google.com/github/google-coral/tutorials/blob/master/train_lstm_timeseries_ptq_tf2.ipynb
The example has been broken since TF 2.7...
Yeah, it's confusing, as post-training quantization of recurrent layers does seem to be broken, dunno.
If I may ask, how do you plan to convert this model to integer-only? 1) The mask produced by the model ranges from 0 to 1. Is it possible to train the integer-only model to produce a mask ranging from 0 to 255? 2) If the states are changed to integer-only, it would affect the LSTM's/RNN's performance. How do you keep the difference minimal?
You map 0 to -128 and 1 to 127, and all the intermediate values are then quantized into that interval.
If the network is tightly fitted, quantization can destroy its performance, and you need to use alternative methods that only a few experimental/research frameworks support...
In other cases, the network will just produce a similar output with some extra error.
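To make the 0 to -128 and 1 to 127 mapping concrete, here is a minimal NumPy sketch of affine int8 quantization. The function names are my own for illustration and are not taken from DTLN or TensorFlow; it only shows the arithmetic and the rounding error you should expect on a mask in [0, 1].

```python
import numpy as np

def quantize_params(real_min, real_max, qmin=-128, qmax=127):
    """Derive scale and zero point so real_min -> qmin and real_max -> qmax."""
    scale = (real_max - real_min) / (qmax - qmin)
    zero_point = int(round(qmin - real_min / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Map float values to int8, clipping anything outside the range."""
    q = np.round(np.asarray(x) / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Map int8 values back to approximate float values."""
    return (q.astype(np.float32) - zero_point) * scale

# A mask in [0, 1], as produced by the model discussed in this thread.
scale, zero_point = quantize_params(0.0, 1.0)
mask = np.array([0.0, 0.25, 0.8, 1.0], dtype=np.float32)
q_mask = quantize(mask, scale, zero_point)
restored = dequantize(q_mask, scale, zero_point)
max_error = np.max(np.abs(restored - mask))  # at most about scale / 2
```

With this range the scale is 1/255, so the "extra error" per value is bounded by roughly 0.002. Values outside the calibrated range get clipped, which is why states that keep growing are harder to handle than a bounded mask.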
I think that Quantization Aware Training with RNN is still in an experimental phase, but you can check out QKeras, a QAT library that partially supports this type of training for RNN.
I noticed that the values of the states keep going up as the model processes the audio, so how should I quantize them within the int8 limits?
@st4 Use the attached DTLN quantized TFLite model: https://github.com/nyadla-sys/whisper.tflite/blob/main/models/dtln_quantized.tflite
https://github.com/heisenberg-kim/lstm_in_the_unet
Quantized model from DTLN.
Nils, is it possible to create an integer-only model so this could run on accelerators or frameworks such as ArmNN? https://www.tensorflow.org/lite/performance/post_training_quantization#full_integer_quantization
I always get confused about how to implement representative_dataset().
Has anyone done this and got an example, or even better, the TFLite models?
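For reference, a rough sketch of what a representative_dataset() can look like. The converter expects a callable that yields one list of float32 input arrays per calibration step, in the model's input order. The helper names, the `frames` array and its (N, 1, 257) shape are assumptions for illustration, not the actual DTLN inputs; a model with several inputs (for example features plus states) would yield one array per input. As discussed in this thread, the conversion may still fail or keep float ops for LSTM layers.

```python
import numpy as np

def make_representative_dataset(frames, num_samples=100):
    """Build the generator function the TFLite converter calls for calibration.

    frames: array of calibration inputs, one model input per row
            (hypothetical shape (N, 1, 257); use real audio features).
    """
    def representative_dataset():
        for frame in frames[:num_samples]:
            # Add the batch dimension and cast to float32; the converter
            # observes these values to pick the int8 ranges.
            yield [frame[np.newaxis, ...].astype(np.float32)]
    return representative_dataset

def convert_full_integer(saved_model_dir, representative_dataset):
    """Attempt full-integer post-training quantization of a SavedModel."""
    import tensorflow as tf  # imported here so the generator is usable without TF

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Require int8 kernels only; conversion raises if an op has no int8 version.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()

# Placeholder calibration data only; random values give poor ranges in practice.
calibration_frames = np.random.rand(10, 1, 257)
dataset_fn = make_representative_dataset(calibration_frames, num_samples=5)
```

You would then call something like `convert_full_integer("path/to/saved_model", dataset_fn)` and write the returned bytes to a `.tflite` file. A few hundred frames of real audio usually make a better calibration set than random data.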