breizhn / DTLN

TensorFlow 2.x implementation of the DTLN real-time speech denoising model, with TFLite, ONNX and real-time audio processing support.
MIT License

Question about real-time implementation using tensorflow lite #57

Closed: seorim0 closed this issue 2 years ago

seorim0 commented 2 years ago

First of all, thank you for providing such a wonderful project.

This is my first time using TensorFlow Lite, so there are some things I don't understand. Is it okay to ask a few questions?

  1. When using a convolutional layer in TensorFlow Lite with a kernel size of 2 or more, how should the code be implemented? I mean, in real-time speech enhancement the input is not available all at once because it has to be processed frame by frame. How do I save and retrieve the previous values, as in the code below (see also the sketch after this list)?

    # If we have an encoder composed of convolutional layers (kernel size 2),
    # cache each layer's previous output so the next frame has its left context.
    out = current_input_frame
    for idx, encoder_layer in enumerate(encoder):
        out = encoder_layer(out)
        tmp = encoder_out[idx + 1]            # previous frame's output of this layer
        encoder_out[idx + 1] = out            # update the cache for the next frame
        out = torch.cat([tmp, out], dim=-1)   # concatenate along the time axis
  2. What should I do if I want to implement DTLN in a mobile environment?
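
For question 1, a common pattern with TensorFlow Lite is to keep the convolutional (or recurrent) context outside the model and pass it in as an explicit input tensor, returning the updated context as an extra output, similar in spirit to how the stateful DTLN TFLite models handle their LSTM states. Below is a minimal sketch of that idea for a single kernel-size-2 Conv1D; the layer sizes, tensor names and the assumption that the interpreter's input/output order matches the Keras definition are all illustrative, not taken from this repository.

    import numpy as np
    import tensorflow as tf

    frame_len, channels = 128, 1   # illustrative sizes, not the DTLN ones

    # Current frame and the cached previous frame are both explicit model inputs.
    cur = tf.keras.Input(shape=(frame_len, channels), name="current_frame")
    prev = tf.keras.Input(shape=(frame_len, channels), name="previous_frame")
    x = tf.keras.layers.Concatenate(axis=1)([prev, cur])     # concat on the time axis
    y = tf.keras.layers.Conv1D(16, kernel_size=2)(x)         # needs one frame of left context
    new_state = tf.keras.layers.Lambda(lambda t: t, name="new_state")(cur)
    model = tf.keras.Model([cur, prev], [y, new_state])      # return cur as the new cache

    # Convert; the cache is just another tensor, so the TFLite model stays stateless.
    tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()
    interpreter = tf.lite.Interpreter(model_content=tflite_model)
    interpreter.allocate_tensors()
    inputs = interpreter.get_input_details()
    outputs = interpreter.get_output_details()

    state = np.zeros((1, frame_len, channels), dtype=np.float32)
    for _ in range(5):                                        # stand-in for the audio loop
        frame = np.random.randn(1, frame_len, channels).astype(np.float32)
        interpreter.set_tensor(inputs[0]["index"], frame)     # the input/output order is an
        interpreter.set_tensor(inputs[1]["index"], state)     # assumption; check the names
        interpreter.invoke()
        enc = interpreter.get_tensor(outputs[0]["index"])
        state = interpreter.get_tensor(outputs[1]["index"])   # carry to the next frame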

StuartIanNaylor commented 2 years ago

I am a bit of a noob so maybe not the best answer for 2, but after converting to TFLite and quantising, probably the biggest problem in a mobile environment is Python, as it is really atrocious at iterating through chunked audio like that. SaneBow did a great job of making Python as performant as possible with https://github.com/SaneBow/PiDTLN, but with Python that is about as good as it gets. https://github.com/avcodecs/DTLNtfliteC and https://github.com/Turing311/Realtime_AudioDenoise_EchoCancellation are probably far better suited to a mobile environment, but they need converting from a file-based audio interface to a streaming audio interface. You can do quantisation-aware training, but I think the Keras sub-classing makes that impossible here, so post-training quantisation is the best you can do model-wise (I could be wrong on that one).
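
For reference, post-training (dynamic-range) quantisation with the TFLite converter looks roughly like the sketch below; the SavedModel path and output filename are placeholders, not paths from this repo.

    import tensorflow as tf

    # Post-training dynamic-range quantisation sketch.
    # "path/to/dtln_saved_model" is a placeholder, not a path from this repo.
    converter = tf.lite.TFLiteConverter.from_saved_model("path/to/dtln_saved_model")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    tflite_model = converter.convert()
    with open("dtln_quantised.tflite", "wb") as f:
        f.write(tflite_model)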

seorim0 commented 2 years ago

Thanks for your kind reply! Can I get an answer to the first question as well?

StuartIanNaylor commented 2 years ago

I didn't really understand the question. From memory, DTLN processes audio block by block with a block shift that defaults to 128 samples, which at a 16 kHz sampling rate is 8 ms. So what previous value do you want to save? Probably give https://github.com/SaneBow/PiDTLN a go and follow the instructions there.
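
As an illustration of that block-shift processing, the streaming loop looks roughly like the sketch below, with DTLN's defaults of a 512-sample block and a 128-sample shift at 16 kHz; `audio` and `run_model` are placeholders for the input signal and the TFLite inference call, not names from this repo.

    import numpy as np

    block_len, block_shift = 512, 128        # 32 ms block, 8 ms shift at 16 kHz

    in_buffer = np.zeros(block_len, dtype=np.float32)
    out_buffer = np.zeros(block_len, dtype=np.float32)
    output = np.zeros_like(audio)            # 'audio': 16 kHz float32 signal (placeholder)

    for i in range(audio.shape[0] // block_shift):
        chunk = audio[i * block_shift:(i + 1) * block_shift]
        # shift 128 new samples into the input buffer
        in_buffer = np.concatenate([in_buffer[block_shift:], chunk])
        # 'run_model' stands in for the model inference on one block
        enhanced = run_model(in_buffer)
        # overlap-add into the output buffer and emit the oldest 128 samples
        out_buffer = np.concatenate([out_buffer[block_shift:], np.zeros(block_shift, np.float32)])
        out_buffer += enhanced
        output[i * block_shift:(i + 1) * block_shift] = out_buffer[:block_shift]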

seorim0 commented 2 years ago

It was not a question directly related to DTLN. It was a question about using convolutional layers with TensorFlow Lite, and I was asking whether you knew how to do that.

StuartIanNaylor commented 2 years ago

No, someone else may be able to answer that.