breizhn / DTLN

Tensorflow 2.x implementation of the DTLN real time speech denoising model. With TF-lite, ONNX and real-time audio processing support.
MIT License

Would work with audio streaming? #8

Closed emepetres closed 4 years ago

emepetres commented 4 years ago

Hi, first of all great work!

I'm wondering whether this method could be used for audio streaming, given the block shift it uses.

Would it be possible? If so, should I make some modifications, for example not using the block shift?

Thanks!

breizhn commented 4 years ago

Hi,

Thanks a lot!

Yes, audio streaming would work, but you would introduce a processing delay of 32 ms. Unless you want to build a hearing aid, this shouldn't be a problem. For denoising video calls, the delay would be acceptable. Actually, the goal of this work was to be able to perform real-time stream processing.

If you look at the callback in real_time_dtln_audio.py, you can see audio stream processing being performed. One block the size of the block shift (128 samples) comes from the sound card and is written to a buffer:

# write to buffer
in_buffer[:-block_shift] = in_buffer[block_shift:]
in_buffer[-block_shift:] = np.squeeze(indata)

Here, indata is the block from the sound card.

And at the end, one block of the same size (128 samples) is written to the sound card:

# output to soundcard
outdata[:] = np.expand_dims(out_buffer[:block_shift], axis=-1)

where outdata is the block sent to the sound card. This procedure can be used for any kind of audio stream.
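To make the buffer handling concrete, here is a minimal offline sketch of the same shift-in / shift-out pattern, run over a NumPy array instead of a sound card. It is an illustration only, not the code from real_time_dtln_audio.py: `process_block` is a hypothetical identity placeholder standing in for the model, `stream` is a made-up helper name, and `block_len = 512` assumes a 32 ms block at a 16 kHz sampling rate (the thread does not state the rate). The overlap-add step on the output buffer is also an assumption about how overlapping output blocks are combined.

```python
import numpy as np

block_len = 512     # assumed: 32 ms block at an assumed 16 kHz sampling rate
block_shift = 128   # block size exchanged with the sound card (from the thread)


def process_block(block):
    """Hypothetical placeholder for the denoising model (identity here)."""
    return block


def stream(signal):
    """Feed `signal` through the buffers one block_shift-sized chunk at a time."""
    in_buffer = np.zeros(block_len, dtype=np.float32)
    out_buffer = np.zeros(block_len, dtype=np.float32)
    out_chunks = []
    for start in range(0, len(signal) - block_shift + 1, block_shift):
        indata = signal[start:start + block_shift]
        # write to buffer: drop the oldest chunk, append the newest one
        in_buffer[:-block_shift] = in_buffer[block_shift:]
        in_buffer[-block_shift:] = indata
        out_block = process_block(in_buffer)
        # shift the output buffer, clear its tail, then overlap-add
        out_buffer[:-block_shift] = out_buffer[block_shift:]
        out_buffer[-block_shift:] = 0.0
        out_buffer += out_block
        # output: the oldest block_shift samples are complete and can leave
        out_chunks.append(out_buffer[:block_shift].copy())
    return np.concatenate(out_chunks)
```

With the identity placeholder and no windowing, every output sample is the sum of four overlapping copies of the input, so the result is the input scaled by 4 and delayed by `block_len - block_shift` samples. A real model would replace `process_block` and produce its own block output; only the buffer bookkeeping carries over.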

If you train a model without block shift, you lose some performance. You would still have the same processing delay if your block length is 32 ms.
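As a rough check of the numbers, the duration of a block follows directly from its length and the sampling rate. The sketch below assumes a 16 kHz sampling rate and a 512-sample block, neither of which is stated in this thread; only the 128-sample shift and the 32 ms figure come from the discussion. `block_duration_ms` is a helper name made up for this example.

```python
def block_duration_ms(num_samples, sample_rate):
    """Duration in milliseconds of `num_samples` samples at `sample_rate` Hz."""
    return 1000.0 * num_samples / sample_rate


sample_rate = 16000  # assumed sampling rate in Hz, not stated in the thread
block_len = 512      # assumed block length that gives 32 ms at 16 kHz
block_shift = 128    # block shift in samples, from the thread

print(block_duration_ms(block_len, sample_rate))    # 32.0
print(block_duration_ms(block_shift, sample_rate))  # 8.0
```

Under these assumptions, the block shift only sets how often a new chunk is exchanged with the sound card (every 8 ms), while the 32 ms figure is tied to the block length, which is why dropping the block shift would not shorten the delay.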

Does this answer your question?

breizhn commented 4 years ago

@emepetres, If you don’t have any additional questions, I will close this issue?

emepetres commented 4 years ago

OK, I understand. Thank you very much @breizhn for your quick and detailed explanation!