Closed — seorim0 closed this issue 2 years ago
I am a bit of a noob, so maybe not the best person to answer question 2, but after converting to tflite and quantising, probably the biggest problem in a mobile environment is Python itself, as it's really atrocious at iterating through chunked audio like that. SaneBow did a great job of making Python performant with https://github.com/SaneBow/PiDTLN, but with Python that's about as good as you can get. https://github.com/avcodecs/DTLNtfliteC and https://github.com/Turing311/Realtime_AudioDenoise_EchoCancellation are probably far better suited to a mobile environment, but they need converting from a file-based audio interface to a streaming audio interface. You can normally do quantisation-aware training, but I think the Keras sub-classing here makes that impossible, so post-training quantisation is the best you can do model-wise (though I could be wrong on that one).
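For reference, post-training quantisation with the standard TFLite converter looks roughly like this (the SavedModel path is just a placeholder):

```python
import tensorflow as tf

# Path is a placeholder; point it at the exported DTLN SavedModel.
converter = tf.lite.TFLiteConverter.from_saved_model("dtln_saved_model")

# Dynamic-range post-training quantisation: weights become int8,
# activations stay float, so no representative dataset is needed.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()
with open("dtln_quant.tflite", "wb") as f:
    f.write(tflite_model)
```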
Thanks for your kind reply! Can I get an answer to the first question as well?
I didn't really understand the question, as DTLN processes audio from memory block by block; the default block shift is 128 samples, which at a 16 kHz sampling rate corresponds to 8 ms. So what previous value do you want to save? Probably give https://github.com/SaneBow/PiDTLN a go and follow the instructions there.
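For a rough idea of that block-shift processing, here's a minimal Python sketch; the `enhance` call stands in for the actual model inference and is an assumption on my part:

```python
import numpy as np

BLOCK_LEN = 512    # analysis window: 512 samples = 32 ms at 16 kHz
BLOCK_SHIFT = 128  # hop size: 128 samples = 8 ms at 16 kHz

in_buffer = np.zeros(BLOCK_LEN, dtype=np.float32)
out_buffer = np.zeros(BLOCK_LEN, dtype=np.float32)

def process_chunk(chunk, enhance):
    """Push one 128-sample chunk through a block-based enhancer.
    `enhance` maps a BLOCK_LEN window to a BLOCK_LEN output block."""
    global in_buffer, out_buffer
    # Slide the input window left and append the new samples.
    in_buffer = np.roll(in_buffer, -BLOCK_SHIFT)
    in_buffer[-BLOCK_SHIFT:] = chunk
    # Run the model on the whole window.
    out_block = enhance(in_buffer)
    # Overlap-add the result into the output buffer.
    out_buffer = np.roll(out_buffer, -BLOCK_SHIFT)
    out_buffer[-BLOCK_SHIFT:] = 0.0
    out_buffer += out_block
    # The first BLOCK_SHIFT samples are now final and can be played out.
    return out_buffer[:BLOCK_SHIFT].copy()
```

Each call consumes one 128-sample chunk and returns 128 enhanced samples via overlap-add.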
It was not a question directly related to DTLN. It was about using convolutional layers with TensorFlow Lite, and I was asking whether you were familiar with that.
No, someone else may answer that.
First of all, thank you for providing such a wonderful project.
This is my first time using TensorFlow Lite, so there are some things I don't understand. Is it okay to ask a few questions?
When using a convolutional layer in TensorFlow Lite with a kernel size of 2 or more, how should the code be implemented? I mean, in the case of real-time speech enhancement, the input is not provided all at once because it has to be processed frame by frame. How do I save and retrieve the previous values? Something like the code below.
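For example, this rough sketch is what I have in mind; the frame size, kernel size, and `conv1d` helper are just placeholders, not from DTLN:

```python
import numpy as np

KERNEL_SIZE = 3  # example kernel size > 1
N_CH = 64        # feature channels per frame (placeholder)

# Cache of the last KERNEL_SIZE - 1 input frames.
conv_state = np.zeros((KERNEL_SIZE - 1, N_CH), dtype=np.float32)

def streaming_conv_step(frame, conv1d):
    """Apply a Conv1D with kernel size KERNEL_SIZE to one new frame.
    `conv1d` is a placeholder for a stateless 'valid' convolution that
    maps a (KERNEL_SIZE, N_CH) window to a single output frame."""
    global conv_state
    # Prepend the cached frames so the kernel sees its full receptive field.
    window = np.concatenate([conv_state, frame[None, :]], axis=0)
    out = conv1d(window)
    # Shift the cache: drop the oldest frame, keep the newest ones.
    conv_state = window[1:]
    return out
```

My guess is the cached frames would have to be exposed as extra model inputs/outputs in TFLite, similar to recurrent states, but I'm not sure if that's the right approach.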
What should I do if I want to implement DTLN in a mobile environment?