Creating integer only models

breizhn / DTLN

Tensorflow 2.x implementation of the DTLN real time speech denoising model. With TF-lite, ONNX and real-time audio processing support.

MIT License


Creating integer only models #66

Open StuartIanNaylor opened 2 years ago

StuartIanNaylor commented 2 years ago

Nils, is it possible to create integer-only models so this could run on accelerators or frameworks such as ArmNN? https://www.tensorflow.org/lite/performance/post_training_quantization#full_integer_quantization

I always get confused about how to implement representative_dataset().

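import tensorflow as tf

# saved_model_dir and representative_dataset must be defined before this runs.
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Calibration data used to estimate activation ranges.
converter.representative_dataset = representative_dataset
# Restrict conversion to int8 builtin ops; conversion fails on ops that cannot be quantized.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8  # or tf.uint8
converter.inference_output_type = tf.int8  # or tf.uint8
tflite_quant_model = converter.convert()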

Has anyone done this and got an example, or even better, the tflite models?
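
On the representative_dataset() question: as far as I understand the converter, it expects a callable that takes no arguments and yields lists of float32 NumPy arrays, one array per model input, shaped like the real inputs. A minimal sketch follows, assuming NumPy is available. The input shapes and the random data are placeholders for illustration and are not taken from this thread; real calibration should use features computed from representative audio.

```python
import numpy as np

# Hypothetical input shapes for illustration only; replace them with the
# shapes reported by your own model.
INPUT_SHAPES = [(1, 1, 257), (1, 2, 128, 2)]
NUM_CALIBRATION_SAMPLES = 100


def representative_dataset():
    """Yield one calibration sample per step, one float32 array per input."""
    rng = np.random.default_rng(seed=0)
    for _ in range(NUM_CALIBRATION_SAMPLES):
        # Random placeholders; real calibration should use features computed
        # from representative audio so the recorded ranges are realistic.
        yield [rng.standard_normal(shape).astype(np.float32)
               for shape in INPUT_SHAPES]


# Assumed hookup, matching the converter snippet above:
#   converter.representative_dataset = representative_dataset
samples = list(representative_dataset())
print(len(samples), [a.shape for a in samples[0]])
```

The generator only supplies data for range estimation; whether the conversion then succeeds still depends on every op in the model having an int8 kernel.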

jeungmin717 commented 2 years ago

@StuartIanNaylor

In my case, full-integer quantization was not available for this double-stacked LSTM model. When I applied dynamic-range quantization, the calculations inside the model still remained float32.

According to the official TensorFlow documentation, full-integer (static) quantization is not available for LSTM.

See also this issue: https://github.com/tensorflow/tensorflow/issues/25563

I think research on fully quantizing LSTM models is still in progress.
Hope this helps : )

StuartIanNaylor commented 2 years ago

That is a massive help, and many thanks for the bad news, as it will save much wasted time.

Damn! :( There are a lot of frameworks, from ArmNN to NPUs, that cannot run it then, except on the CPU. I will leave this open so people can see your great info. Many thanks.

jeungmin717 commented 2 years ago

@StuartIanNaylor Glad I could help a little. It's too bad that it cannot be fully quantized for ArmNN or microcontrollers. But in my opinion it already achieves real-time performance on CPU (maybe even in the worst case?), so there is no need for a fully quantized model if your hardware has a CPU. An amazing achievement by breizhn.

StuartIanNaylor commented 2 years ago

It's no criticism of what breizhn produced, just the realisation of what even further optimisation could achieve, along with dropping Python for a more performant C/Rust DSP environment. There are so many devices now with Mali GPUs that could perhaps have run it with ArmNN quantization, and the same goes for embedded NPUs. This lies with the ML frameworks, especially TensorFlow or maybe ONNX. Why recurrent networks such as LSTM or GRU are so problematic is beyond my knowledge level, but I can appreciate the limitations.

JorgeRuizDev commented 2 years ago

A few months ago I managed to quantize this LSTM model and run it on a Coral Edge TPU https://colab.research.google.com/github/google-coral/tutorials/blob/master/train_lstm_timeseries_ptq_tf2.ipynb

The example has been broken since TF 2.7...

StuartIanNaylor commented 2 years ago

Yeah, it's confusing, as post-training quantization of recurrent layers does seem to be broken. Dunno.

WaterBoiledPizza commented 1 year ago

If I may ask, how do you plan to convert this model to integer-only? 1) The mask produced by the model ranges from 0 to 1. Is it possible to train the integer-only model to produce a mask ranging from 0 to 255? 2) If the states are changed to integer-only, it would affect the LSTM's/RNN's performance. So how do you keep the difference minimal?

JorgeRuizDev commented 1 year ago

You map 0 to -128 and 1 to 127, and all the intermediate values are then quantized into that interval.
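
As a rough illustration of that mapping, here is a minimal pure-Python sketch of affine int8 quantization for a value known to lie in [0, 1]. The function and constant names are made up for this example and are not part of any library; TFLite derives its own scale and zero point from the calibration data.

```python
# Affine (asymmetric) int8 quantization of a value known to lie in [0, 1].
QMIN, QMAX = -128, 127
RMIN, RMAX = 0.0, 1.0

SCALE = (RMAX - RMIN) / (QMAX - QMIN)      # real units per integer step
ZERO_POINT = QMIN - round(RMIN / SCALE)    # integer code that represents 0.0


def quantize(x: float) -> int:
    """Map a real value to its int8 code, clamping to the int8 range."""
    q = round(x / SCALE) + ZERO_POINT
    return max(QMIN, min(QMAX, q))


def dequantize(q: int) -> float:
    """Map an int8 code back to the real value it represents."""
    return (q - ZERO_POINT) * SCALE


mask = [0.0, 0.25, 0.8, 1.0]               # example mask values
codes = [quantize(m) for m in mask]
restored = [dequantize(q) for q in codes]
print(codes)
print(restored)
```

Each restored value differs from the original by at most half a step (SCALE / 2), which is the "extra error" a well-behaved network has to tolerate.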

If the network is tightly fitted, quantization can destroy the network's performance, and you need to use alternative methods that only a few experimental/research frameworks support...

In other cases, the network will just produce a similar output with some extra error.

I think that quantization-aware training (QAT) with RNNs is still in an experimental phase, but you can check out QKeras, a QAT library that partially supports this type of training for RNNs.

WaterBoiledPizza commented 1 year ago

I noticed that the values of the states keep going up as the model processes the audio, so how should I quantize them within the int8 limit?
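
The difficulty described here can be reproduced in a few lines. With a fixed scale chosen from an assumed calibration range, any value that later exceeds that range saturates at the int8 limit. The sketch below uses made-up numbers (a calibration range of ±4 and a state that grows by 0.5 per step) purely to illustrate the effect; it is not a measurement of DTLN.

```python
QMIN, QMAX = -128, 127
CALIBRATED_MAX = 4.0                 # assumed range seen during calibration
SCALE = CALIBRATED_MAX / QMAX        # symmetric quantization, zero point 0


def quantize(x: float) -> int:
    """Quantize with a fixed scale, clamping to the int8 range."""
    return max(QMIN, min(QMAX, round(x / SCALE)))


def dequantize(q: int) -> float:
    return q * SCALE


# A hypothetical state value that keeps growing over time: 0.0, 0.5, ..., 6.0
states = [0.5 * step for step in range(13)]
errors = [abs(dequantize(quantize(s)) - s) for s in states]

for s, e in zip(states, errors):
    print(f"state={s:4.1f}  error={e:.3f}")
```

Inside the calibrated range the error stays below one step. Beyond it, the error grows with the state. Widening the range avoids the clipping but makes each integer step coarser, so the calibration data would need to include sequences long enough to cover the values the states actually reach.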

nyadla-sys commented 1 year ago

@st4 use the attached dtln quantized tflite model https://github.com/nyadla-sys/whisper.tflite/blob/main/models/dtln_quantized.tflite

heisenberg-kim commented 12 months ago

https://github.com/heisenberg-kim/lstm_in_the_unet

A quantized model derived from DTLN.