Closed zuowanbushiwo closed 2 years ago
Hi @zuowanbushiwo,
This can indeed be changed. It was done that way because those are the default values for the DNS Challenge dataset that I used to train the model. To change this behavior you need to do two things:
- Change line 101 to match your desired length: `frames=sample_rate * desired_duration`.
- Then tell the model to expect audio of a different length. Both DTLN and CRUSE have a `sample_duration` parameter to control this.

If you change this value (either to a constant, or dynamically to adjust to your desired audio length on the fly), you should not need to retrain the model.
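To make the arithmetic above concrete, here is a minimal sketch of the two conversions involved. The helper names `frames_for_duration` and `duration_for_frames` are hypothetical (they are not part of the repo), and the 16 kHz sample rate is only an assumption for illustration; use whatever rate your audio and model actually run at.

```python
def frames_for_duration(sample_rate: int, duration_s: float) -> int:
    """Return how many frames (samples per channel) cover duration_s seconds."""
    if sample_rate <= 0 or duration_s <= 0:
        raise ValueError("sample_rate and duration_s must be positive")
    return int(round(sample_rate * duration_s))


def duration_for_frames(num_frames: int, sample_rate: int) -> float:
    """Return the duration in seconds of a clip that has num_frames frames."""
    if sample_rate <= 0 or num_frames < 0:
        raise ValueError("sample_rate must be positive and num_frames non-negative")
    return num_frames / sample_rate


# Constant length: read 30 s instead of the default (16 kHz is an assumption).
frames = frames_for_duration(sample_rate=16_000, duration_s=30.0)

# Dynamic length: derive the duration from the length of the file itself,
# e.g. to set sample_duration per file before running inference.
sample_duration = duration_for_frames(num_frames=73_500, sample_rate=16_000)
```

The first value is what would go into the `frames=...` expression on line 101, and the second is the kind of value you could hand to `sample_duration` if you want the model to follow each file's real length.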
Hi @eagomez2, thanks a lot for the guidance, I now know how to modify it. By the way, is it possible to add a chunk-by-chunk real-time inference feature (chunk size equal to hop_size)? Thanks! Best wishes
Hi @zuowanbushiwo,
It is possible. You could do it, for example, using sounddevice to receive the audio in real time, frame by frame. The model as is cannot be plugged in directly to process audio this way, but with the following changes you should be able to reuse the trained weights:
- Carry the model's internal (recurrent) state from time step `n-1` to time step `n`.
- Use an input block of `hop_size` and an internal buffer of `fft_size`, so that each time you produce `hop_size` valid output samples while providing `fft_size` samples for the inference. This implies doing the overlap-add procedure manually before sending the audio to the output.

The repo of the original DTLN has this implemented for TensorFlow; you can check it out in more detail there. A similar procedure can be done for CRUSE, although the necessary FFT/iFFT configuration may have to be slightly different.
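The buffering described above can be sketched roughly as follows. This is only an illustration of the sliding input buffer and the manual overlap-add, not the code from either repo: `StreamingProcessor` and `process_frame` are hypothetical names, and the per-frame model call is replaced by a plain callable. A real model would also keep its recurrent state between calls inside that callable. The sketch assumes `fft_size` is a multiple of `hop_size`.

```python
from typing import Callable, List

Frame = List[float]


class StreamingProcessor:
    """Take hop_size samples in, return hop_size samples out (overlap-add)."""

    def __init__(self, fft_size: int, hop_size: int,
                 process_frame: Callable[[Frame], Frame]) -> None:
        if fft_size % hop_size != 0:
            raise ValueError("fft_size must be a multiple of hop_size")
        self.fft_size = fft_size
        self.hop_size = hop_size
        # Hypothetical stand-in for the model: fft_size samples in and out.
        self.process_frame = process_frame
        self.in_buffer = [0.0] * fft_size
        self.out_buffer = [0.0] * fft_size

    def push(self, block: Frame) -> Frame:
        if len(block) != self.hop_size:
            raise ValueError("block must contain exactly hop_size samples")
        # Slide the input buffer left by hop_size and append the new block.
        self.in_buffer = self.in_buffer[self.hop_size:] + list(block)
        # Process the most recent fft_size samples.
        frame_out = self.process_frame(self.in_buffer)
        if len(frame_out) != self.fft_size:
            raise ValueError("process_frame must return fft_size samples")
        # Slide the output buffer, zero its tail, then overlap-add the frame.
        self.out_buffer = (self.out_buffer[self.hop_size:]
                           + [0.0] * self.hop_size)
        self.out_buffer = [a + b for a, b in zip(self.out_buffer, frame_out)]
        # The first hop_size samples have received all overlapping frames.
        return self.out_buffer[:self.hop_size]


# Usage with a dummy "model" that only scales each frame by hop/fft, so the
# overlapping frames sum back to the input signal.
fft_size, hop_size = 8, 2
processor = StreamingProcessor(
    fft_size, hop_size,
    process_frame=lambda frame: [s * hop_size / fft_size for s in frame],
)
signal = [float(i) for i in range(1, 13)]
output: List[float] = []
for start in range(0, len(signal), hop_size):
    output += processor.push(signal[start:start + hop_size])
```

With the dummy callable, `output` is the input delayed by `fft_size - hop_size` samples, which is the latency this scheme introduces. In a real-time setup each `push` call would sit inside the audio callback, with one block of `hop_size` samples per callback.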
Hi @eagomez2, that's very kind of you, you really did me a great favor! Thanks!
Hi @eagomez2, thanks for your open source work, it is very helpful to me. When I use the following command to run inference on my own data, the audio files do not all have the same length.
The result is that only the first 10 s of the data is processed.
When I try to modify this code, it always gives an error: https://github.com/eagomez2/upf-smc-speech-enhancement-thesis/blob/f03395fecef5e8834247499f4dc5820200d727f4/src/predict.py#L100-L101
Is there any way to fix it? Do I need to retrain?
Looking forward to your reply. All the best