emepetres closed this issue 4 years ago
Hi,
Thanks a lot!
Yes, audio streaming would work, but you would introduce a processing delay of 32 ms. Unless you want to build a hearing aid, this shouldn't be a problem; for denoising video calls the delay would be acceptable. Actually, the goal of this work was to enable real-time stream processing.
If you look at the callback in real_time_dtln_audio.py, you can see how the audio stream processing is performed.
One block of the size of the shift (128 samples) comes from the soundcard and is written to a buffer:
# write to buffer
in_buffer[:-block_shift] = in_buffer[block_shift:]
in_buffer[-block_shift:] = np.squeeze(indata)
Here, indata is the block coming from the soundcard.
And at the end, one block of the same size (128 samples) is written to the soundcard:
# output to soundcard
outdata[:] = np.expand_dims(out_buffer[:block_shift], axis=-1)
where outdata is the block to the soundcard. This procedure can be used for any kind of audio stream.
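To make the pattern concrete, here is a minimal, self-contained sketch of that callback logic using only NumPy. The `process_block` function is just a stand-in for the model inference, and the division by 4 is a crude overlap-add normalization that only makes sense for this identity stand-in; the real script adds the model's synthesis output directly:

```python
import numpy as np

block_len = 512     # 32 ms at 16 kHz
block_shift = 128   # 8 ms at 16 kHz

in_buffer = np.zeros(block_len)
out_buffer = np.zeros(block_len)

def process_block(block):
    # stand-in for the DTLN model inference; identity here
    return block

def callback(indata, outdata):
    # shift the input buffer left by block_shift and append the newest samples
    in_buffer[:-block_shift] = in_buffer[block_shift:]
    in_buffer[-block_shift:] = np.squeeze(indata)
    # process the full 32 ms block
    estimated_block = process_block(in_buffer)
    # shift the output buffer and overlap-add the processed block
    out_buffer[:-block_shift] = out_buffer[block_shift:]
    out_buffer[-block_shift:] = 0.0
    # divide by the overlap factor (4) so the identity stand-in is unity gain
    out_buffer[:] += estimated_block / (block_len // block_shift)
    # hand one shift-sized chunk back to the soundcard
    outdata[:] = np.expand_dims(out_buffer[:block_shift], axis=-1)
```

After four calls (one full block of context), the output reaches steady state; before that, the buffers are still filling up, which is where the 32 ms delay comes from.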
If you train a model without block shift, you lose some performance. You would still have the same processing delay if your block length is 32 ms.
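The point about the delay can be verified with a quick calculation: the algorithmic delay is set by the block length (the model needs a full block of samples before it can produce output), not by the shift. Assuming the 16 kHz sampling rate and 512-sample blocks used above:

```python
fs = 16000          # sampling rate (assumed, as in the examples above)
block_len = 512     # one full 32 ms block
block_shift = 128   # shift size does not enter the delay

# delay in milliseconds: samples needed before the first output / sample rate
delay_ms = 1000 * block_len / fs
print(delay_ms)  # 32.0
```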
Does this answer your question?
@emepetres, if you don't have any additional questions, I will close this issue.
Ok I understand, thank you very much @breizhn for your quick and detailed explanation!
Hi, first of all great work!
I'm wondering if it would be possible to use this method on an audio stream, given the block shift that is used.
Would it be possible? If so, should I make some modifications, for example not using the block shift?
Thanks!