Closed: Liu-tj closed this issue 1 year ago
Hi, thanks for your interest in DeepFilterNet.
a. Yes, it removes data at the end of the signal, depending on the lookahead, essentially rotating the input data. If you are using a lookahead of e.g. 2 frames, it will zero-pad 2 frames on the right side and truncate 2 frames on the left side. So apart from the border, the whole signal will be delayed by the specified lookahead.
b. The whole model is causal and will introduce an algorithmic delay of max(conv_lookahead, df_lookahead) frames.
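To make the shift in (a) concrete, here is a minimal sketch using plain Python lists rather than the repository's padding modules. The function name `shift_for_lookahead` is hypothetical, and each list element stands in for one whole frame.

```python
def shift_for_lookahead(frames, lookahead, pad_value=0.0):
    """Truncate `lookahead` frames on the left and pad the same number on the right.

    Hypothetical helper for illustration only; the sequence length is preserved.
    """
    if lookahead < 0:
        raise ValueError("lookahead must be non-negative")
    kept = list(frames[lookahead:])
    n_pad = len(frames) - len(kept)  # equals lookahead unless the input is shorter
    return kept + [pad_value] * n_pad


frames = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
shifted = shift_for_lookahead(frames, lookahead=2)
print(shifted)  # [12.0, 13.0, 14.0, 15.0, 0.0, 0.0]
```

Position t of the shifted sequence holds input frame t + 2, so a layer that only looks at the present and past of the shifted sequence effectively sees 2 frames into the future. Only the last 2 positions (the border) contain padding.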
@Rikorose Hello, I previously implemented true frame-in, frame-out (real-time) processing, but with conv_lookahead and df_lookahead set to 0. Could you train and release a pre-trained model with these two parameters set to 0?
I might eventually, but I will not give an ETA. If you need it sooner rather than later, please train a model yourself.
OK, thank you! Looking forward to your new pre-trained model.
@andyweiqiu Is this proprietary code? If not, would it be possible to publish it?
Sorry, I trained the model with conv_lookahead and df_lookahead set to 0 and then implemented inference in pure C++ on iOS. You can implement a streaming conv and GRU in Python. Setting conv_lookahead and df_lookahead to 0 is needed for the model to run inference in real time; otherwise it is difficult to implement.
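As a rough illustration of the streaming idea (not the C++ code mentioned above, which was not published), a causal temporal convolution can be run one frame at a time by keeping a small history buffer. The class and function names are hypothetical, and scalar frames stand in for feature vectors. A streaming GRU would similarly carry its hidden state from one call to the next.

```python
from collections import deque


class StreamingCausalConv1d:
    """Causal convolution over time, fed one (scalar) frame per call."""

    def __init__(self, weights):
        self.weights = list(weights)  # weights[-1] multiplies the current frame
        k = len(self.weights)
        # Zero-initialised history plays the role of left zero padding.
        self.history = deque([0.0] * k, maxlen=k)

    def step(self, frame):
        self.history.append(frame)  # the oldest frame drops out automatically
        return sum(w * x for w, x in zip(self.weights, self.history))


def offline_causal_conv1d(signal, weights):
    """Reference: the same convolution computed over the whole signal at once."""
    k = len(weights)
    padded = [0.0] * (k - 1) + list(signal)
    return [sum(w * padded[t + i] for i, w in enumerate(weights))
            for t in range(len(signal))]


weights = [0.25, 0.5, 1.0]
signal = [1.0, 2.0, 3.0, 4.0]
conv = StreamingCausalConv1d(weights)
streamed = [conv.step(x) for x in signal]
print(streamed)  # [1.0, 2.5, 4.25, 6.0]
print(streamed == offline_causal_conv1d(signal, weights))  # True
```

With zero lookahead every output depends only on frames already received, which is why the frame-by-frame result matches the offline one without any added delay.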
Hi @Rikorose, I think the algorithmic delay should be conv_lookahead (2 frames) + df_lookahead (2 frames) = 4 frames, because df_op is applied to the result of the first stage. Is that right? For the final output of a frequency bin in the current frame, the df_coefs (1x5) need a lookahead of 2 frames, so the stage 1 enhanced spectrogram they are applied to consists of the past two frames, the current frame, and the next two frames. Note that the last of these frames (the second frame after the current one) in turn needs the information of its own next two frames for the 3x3 convolution in the first convolution layer.
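The argument can be checked by propagating frame dependencies. The following is a sketch under stated assumptions: the function name is hypothetical, `conv_past` is an illustrative amount of past context that does not affect the result, and stage 1 is assumed to read noisy frames up to `conv_lookahead` frames ahead.

```python
def needed_input_frames(t, conv_lookahead, df_order, df_lookahead, conv_past=2):
    """Noisy input frames needed for output frame t when DF runs on stage 1 output."""
    # Stage 1 frames read by the deep filter for output frame t.
    df_first = t - (df_order - 1 - df_lookahead)
    df_last = t + df_lookahead
    needed = set()
    for s in range(df_first, df_last + 1):
        # Assumption: stage 1 frame s reads noisy frames s - conv_past .. s + conv_lookahead.
        needed.update(range(s - conv_past, s + conv_lookahead + 1))
    return needed


t = 10
frames = needed_input_frames(t, conv_lookahead=2, df_order=5, df_lookahead=2)
total_lookahead = max(frames) - t
print(total_lookahead)  # 4
```

Because the two stages are cascaded, their lookaheads add up (2 + 2 = 4) instead of combining as a maximum.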
Hi ZengBinky, you are right, the model mistakenly needs two more frames of lookahead. I have some new models coming up though:
- A model without any conv_lookahead and df_lookahead resulting in a total algorithmic latency including STFT of 20 ms.
- A slightly modified architecture where DF is applied on the noisy spectrum and does not depend on stage 1.
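For a rough sense of what these options mean in milliseconds, here is a small sketch. The 20 ms window matches the zero-lookahead figure quoted above; the 10 ms hop size and both function names are assumptions made for illustration. The `max` branch assumes that, when DF is applied to the noisy spectrum, the two lookaheads act in parallel rather than in sequence.

```python
def total_lookahead_frames(conv_lookahead, df_lookahead, df_on_stage1):
    """Lookahead in frames: additive if DF consumes stage 1 output, else the maximum."""
    if df_on_stage1:
        return conv_lookahead + df_lookahead
    return max(conv_lookahead, df_lookahead)


def algorithmic_latency_ms(lookahead_frames, window_ms=20.0, hop_ms=10.0):
    """STFT window length plus one hop per frame of lookahead (assumed sizes)."""
    return window_ms + lookahead_frames * hop_ms


print(algorithmic_latency_ms(total_lookahead_frames(0, 0, True)))   # 20.0
print(algorithmic_latency_ms(total_lookahead_frames(2, 2, True)))   # 60.0
print(algorithmic_latency_ms(total_lookahead_frames(2, 2, False)))  # 40.0
```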
I'm also curious: if the df_coefs network does not depend on stage 1, as you said, how does the model choose between the enhanced spectra of stage 1 and stage 2 for the lower 96 frequency bins?
In this case, it does not use the enhanced spectrum from stage 1, but applies DF to the noisy spectrum.
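A minimal sketch of deep filtering applied directly to a noisy spectrum, for a single frequency bin, may help here. It assumes each output frame is a complex-weighted sum over a window of `order` input frames, with zero padding at the borders; the function name and argument layout are hypothetical, not the repository's `df_op` signature.

```python
def deep_filter_bin(spec, coefs, df_lookahead):
    """Apply per-frame complex filter taps to one frequency bin of a noisy spectrum.

    spec:  list of complex values, one per frame.
    coefs: list (one entry per frame) of `order` complex taps.
    """
    order = len(coefs[0])
    left = order - 1 - df_lookahead  # number of past frames in the window
    if left < 0:
        raise ValueError("df_lookahead must be smaller than the filter order")
    padded = [0j] * left + list(spec) + [0j] * df_lookahead
    # Tap i of frame t multiplies input frame t - left + i.
    return [sum(c * padded[t + i] for i, c in enumerate(coefs[t]))
            for t in range(len(spec))]


spec = [1 + 1j, 2 + 0j, 0 - 1j, 3 + 2j]
center_tap = [[0, 0, 1, 0, 0]] * len(spec)  # passes the current frame through
last_tap = [[0, 0, 0, 0, 1]] * len(spec)    # picks the frame 2 steps ahead
print(deep_filter_bin(spec, center_tap, df_lookahead=2) == spec)  # True
print(deep_filter_bin(spec, last_tap, df_lookahead=2))
```

With an order of 5 and a lookahead of 2, the first output frame is computed from two zero frames plus input frames 0, 1 and 2; the last two output frames read zero padding on the right.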
Thanks for your awesome work! I am confused about pad_feat/pad_spec and the df_op function, so I opened this issue to check. I tried to test your trained model and looked at the class DfNet() in deepfilternet2.py,
lines 430-432 and 444-445.
My questions are:
a. In nn.ConstantPad2d/3d, a padding of -p.df_lookahead (with df_lookahead=2) removes data, so are 2 frames of information missing during training?
b. Is self.df_op causal or non-causal? For example, is the first frame computed from 0, 0, 0, 0 and 3 frames?
Thanks!