breizhn / DTLN

TensorFlow 2.x implementation of the DTLN real-time speech denoising model, with TF Lite, ONNX and real-time audio processing support.
MIT License

The length of the audio after noise reduction is shortened #71

Open 1003657663 opened 1 year ago

1003657663 commented 1 year ago

Hello, I am looking for a fast noise reduction model, yours just meets my requirements, thank you for your efforts.

I'm using this model as a preprocessing step for my speech recognition model. My pipeline receives chunks of audio over a websocket, denoises each chunk, runs VAD on it, and then splices the segments back into a full utterance for recognition. So each chunk I denoise is only part of a complete sentence.

My requirement is that multiple denoised segments can be spliced together seamlessly, but after I process the sound with real_time_processing_tf_lite.py, the spliced audio contains blank parts that make the speech stutter.

In the figure below, the top track is the audio before processing and the bottom track is the audio after processing. The processed audio has the same total length as the input, but the part that contains the waveform is shorter, so the processed segments cannot be stitched together directly.

Can multiple segments of noise-reduced audio be spliced together seamlessly? I'm new to coding and not very familiar with this. Could you help me achieve it?

[screenshot: waveforms before (top) and after (bottom) processing]

WaterBoiledPizza commented 1 year ago

Assuming you haven't changed the parameters in the code, the block length is still 512 and the block shift is still 128 (75% overlap):

The enhanced output does not begin at the start of the input audio: the first roughly three block shifts (3 × 128 = 384 samples) are zeros, because the input buffer starts filled with zeros and the input audio only shifts into and out of it 128 samples at a time. Hence the extra silence at the start:

[screenshot: leading silence in the enhanced output]
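This warm-up delay can be seen in a toy simulation of the streaming buffers. The sketch below mimics the shift-in/overlap-add structure of the repo's TF-Lite script, but replaces the model calls with an identity pass-through, so variable names other than block_len/block_shift are illustrative:

```python
import numpy as np

# Toy simulation of the DTLN streaming buffers (block_len=512, block_shift=128).
# The first output samples come mostly from the zeros the input buffer starts
# with, which is why the enhanced audio appears delayed by ~384 samples.
block_len, block_shift = 512, 128
audio = np.ones(1024, dtype=np.float32)  # stand-in for real samples

in_buffer = np.zeros(block_len, dtype=np.float32)
out_buffer = np.zeros(block_len, dtype=np.float32)
out = np.zeros_like(audio)

num_blocks = audio.shape[0] // block_shift
for idx in range(num_blocks):
    # shift new samples into the input buffer
    in_buffer[:-block_shift] = in_buffer[block_shift:]
    in_buffer[-block_shift:] = audio[idx * block_shift:(idx + 1) * block_shift]
    # a real run would invoke the two TF-Lite interpreters here
    enhanced_block = in_buffer.copy()
    # overlap-add into the output buffer
    out_buffer[:-block_shift] = out_buffer[block_shift:]
    out_buffer[-block_shift:] = 0.0
    out_buffer += enhanced_block * (block_shift / block_len)
    out[idx * block_shift:(idx + 1) * block_shift] = out_buffer[:block_shift]

print(out[:block_shift * 3].max())  # prints 0.0 — the first three shifts are silence
```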

If you slice your audio without any overlap, the reconstructed audio will have a blank gap for each sliced segment. So try overlapping your slices, which reduces the effect of the blank parts.
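One way to apply this advice is to prepend some context to every chunk before denoising and then trim it off when stitching. This is a hedged sketch, not code from the repository: `denoise` is a stand-in for real_time_processing_tf_lite-style processing, and the 384-sample context matches the warm-up delay described above:

```python
import numpy as np

OVERLAP = 384  # samples of warm-up context prepended to each chunk

def denoise(chunk: np.ndarray) -> np.ndarray:
    return chunk  # placeholder: identity instead of the TF-Lite model

def process_with_overlap(audio: np.ndarray, chunk_len: int = 4096) -> np.ndarray:
    """Denoise `audio` in chunks, giving each chunk OVERLAP samples of
    left context and trimming that context before concatenation."""
    pieces = []
    for start in range(0, len(audio), chunk_len):
        ctx_start = max(0, start - OVERLAP)
        out = denoise(audio[ctx_start:start + chunk_len])
        pieces.append(out[start - ctx_start:])  # drop the warm-up context
    return np.concatenate(pieces)

x = np.random.default_rng(0).standard_normal(10_000).astype(np.float32)
y = process_with_overlap(x)
assert len(y) == len(x)  # output lines up sample-for-sample with the input
```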

1003657663 commented 1 year ago

Here comes another problem. I tried cutting a complete wav into multiple overlapping segments and compared the result with feeding the entire wav file in directly.

If I run real_time_processing_tf_lite.py for each segment, generating new input_details_1, input_details_2, output_details_1 and output_details_2 every time, then my output waveform always differs from the one produced from the complete wav. If instead I keep input_details_1, input_details_2, output_details_1 and output_details_2 as global variables and reuse them while continuously capturing sound from the microphone, then after enough samples the final waveform seems to contain a lot of noise.

I understand that real_time_processing_tf_lite.py is meant to process one long wav file. If I want to read the audio stream from the microphone and denoise it continuously, how should I modify the script so that the denoising result for each short chunk is consistent with the result of denoising the whole long audio?
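For chunked processing to match one-shot processing, all streaming state (the input/output buffers and, in the real script, the LSTM states carried between the two TF-Lite interpreters) has to survive across chunks in one persistent object, rather than being recreated per segment. A minimal sketch under that assumption, with `model_step` as a placeholder for the interpreter calls:

```python
import numpy as np

class StreamingDenoiser:
    """Keeps the DTLN-style streaming buffers alive across process() calls,
    so feeding many short chunks equals feeding one long signal."""

    def __init__(self, block_len=512, block_shift=128):
        self.block_len = block_len
        self.block_shift = block_shift
        self.in_buffer = np.zeros(block_len, dtype=np.float32)
        self.out_buffer = np.zeros(block_len, dtype=np.float32)
        self.residual = np.zeros(0, dtype=np.float32)  # samples short of a full shift

    def model_step(self, block):
        # placeholder: the real script would run both TF-Lite interpreters
        # here and also carry their LSTM states between calls
        return block

    def process(self, chunk: np.ndarray) -> np.ndarray:
        data = np.concatenate([self.residual, chunk])
        n_shifts = len(data) // self.block_shift
        out = np.zeros(n_shifts * self.block_shift, dtype=np.float32)
        for i in range(n_shifts):
            s = i * self.block_shift
            self.in_buffer[:-self.block_shift] = self.in_buffer[self.block_shift:]
            self.in_buffer[-self.block_shift:] = data[s:s + self.block_shift]
            block = self.model_step(self.in_buffer.copy())
            self.out_buffer[:-self.block_shift] = self.out_buffer[self.block_shift:]
            self.out_buffer[-self.block_shift:] = 0.0
            self.out_buffer += block * (self.block_shift / self.block_len)
            out[s:s + self.block_shift] = self.out_buffer[:self.block_shift]
        self.residual = data[n_shifts * self.block_shift:]
        return out

# chunked and one-shot processing now produce identical output
x = np.random.default_rng(1).standard_normal(2048).astype(np.float32)
a = StreamingDenoiser().process(x)
d = StreamingDenoiser()
b = np.concatenate([d.process(c) for c in np.split(x, 4)])
assert np.allclose(a, b)
```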

zqlsnr commented 1 year ago

Implement it with PyAudio or sounddevice: still feed the model 512 samples each time, and just replace 256 of those 512 samples with new ones on every call.
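A sounddevice sketch of this idea, using the repo's default 128-sample shift (the same structure works with a 256-sample shift as suggested). `enhance_block` and `run` are illustrative names, not from the repository, and the model call is a placeholder:

```python
import numpy as np

block_len, block_shift = 512, 128
in_buffer = np.zeros(block_len, dtype=np.float32)
out_buffer = np.zeros(block_len, dtype=np.float32)

def enhance_block(block):
    return block  # placeholder for the two DTLN TF-Lite interpreter calls

def callback(indata, outdata, frames, time, status):
    """Per-callback: shift block_shift new mic samples into the persistent
    512-sample buffer, enhance, and overlap-add into the output."""
    global in_buffer, out_buffer
    in_buffer = np.roll(in_buffer, -block_shift)
    in_buffer[-block_shift:] = indata[:, 0]
    enhanced = enhance_block(in_buffer.copy())
    out_buffer = np.roll(out_buffer, -block_shift)
    out_buffer[-block_shift:] = 0.0
    out_buffer += enhanced * (block_shift / block_len)
    outdata[:, 0] = out_buffer[:block_shift]

def run(seconds=10):
    import sounddevice as sd  # imported lazily; needs a working audio device
    with sd.Stream(samplerate=16_000, blocksize=block_shift, channels=1,
                   dtype="float32", callback=callback):
        sd.sleep(seconds * 1000)
```

Calling `run()` streams the microphone through the (placeholder) model continuously; because `in_buffer`, `out_buffer` and the model state persist across callbacks, the output matches offline processing of the same audio.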