Closed aaronhsueh0506 closed 2 years ago
I was under the impression that DeepFilterNet can't be used for real-time inference, i.e. for streaming use cases (20 ms chunks as input). I was surprised to see your comment "Now I can do the real-time inference process with buffer size=1".
Was I under a misconception?
Hi,
I think you can look at the DeepFilterNet2 paper, and at other streaming methods from "google-research" on GitHub. I can now do frame-in, frame-out inference with DeepFilterNet, but if you want to deploy to an embedded device, you need to quantize the GRU yourself (maybe using C/C++).
Thank you so much aaronhsueh, this is news to me! My use case is just deploying on a web-app server. Do I need to do any preprocessing, such as overlap-add of the frames, before feeding input to DeepFilterNet, or is chunking the full-length audio signal into 20 ms frames sufficient?
following
@aaronhsueh0506 can you share the implementation of your real-time loop? Specifically the fix for https://github.com/Rikorose/DeepFilterNet/issues/76
Hi,
I have tried using a smaller learning rate to address the typing noise, so I am closing this issue.
How did you manage to get rid of border effects in real-time mode? Did you modify convolutional layers?
No, I did not modify the CNN, because the kernel is 3x3 for the first layer and 1x3 for the others in DeepFilterNet2. So you can queue a buffer of 3 frames, and you get one time step of the output mask and the DF coefficients.
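The 3-frame buffering idea above can be sketched as follows. This is a hypothetical toy, not DeepFilterNet's actual code: `StreamingConv` and its scalar per-tap weights are illustrative stand-ins for a real conv layer, which also has frequency and channel taps.

```python
import numpy as np
from collections import deque

class StreamingConv:
    """Illustrative sketch: keep a FIFO of the last 3 frames so a layer
    with a time-kernel of 3 can emit one output frame per incoming frame."""
    def __init__(self, n_freq, kernel=3):
        # causal zero padding: pre-fill kernel-1 empty frames
        self.buf = deque([np.zeros(n_freq)] * (kernel - 1), maxlen=kernel)
        # hypothetical weights: one tap per time step
        self.w = np.array([0.25, 0.5, 0.25])

    def push(self, frame):
        self.buf.append(frame)
        stacked = np.stack(self.buf)             # (kernel, n_freq)
        return np.tensordot(self.w, stacked, 1)  # weighted sum over the time axis

conv = StreamingConv(n_freq=4)
first = conv.push(np.ones(4))  # buffer = [0, 0, x] -> only the newest tap fires
```

With this buffering, latency is bounded by the kernel's time extent instead of the full signal length.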
@aaronhsueh0506 thanks for the reply! Can I ask you, what frame size did you use for the streaming implementation, and did you use overlapped frames?
20ms frame size and 50% overlap
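For context, a minimal overlap-add sketch of this framing. The sample rate and the periodic-Hann window are assumptions for illustration, not taken from the repository:

```python
import numpy as np

sr = 48000                        # assumed sample rate (DeepFilterNet targets 48 kHz)
frame = int(0.02 * sr)            # 20 ms -> 960 samples
hop = frame // 2                  # 50% overlap -> 480-sample hop
win = np.hanning(frame + 1)[:-1]  # periodic Hann: satisfies COLA at 50% overlap

def stream_frames(x):
    """Yield windowed, 50%-overlapping analysis frames from a 1-D signal."""
    for start in range(0, len(x) - frame + 1, hop):
        yield x[start:start + frame] * win

def overlap_add(frames, total_len):
    """Reconstruct by summing frames at their hop positions (overlap-add)."""
    y = np.zeros(total_len)
    for i, f in enumerate(frames):
        y[i * hop:i * hop + frame] += f
    return y

x = np.ones(10 * hop)
y = overlap_add(stream_frames(x), len(x))
# away from the edges, the shifted Hann windows sum to exactly 1
```

If a synthesis window is also applied after the inverse transform, a sqrt-Hann pair is the usual choice so the analysis/synthesis product still satisfies the COLA condition.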
Thank you!
@aaronhsueh0506 Can you share the implementation? I wasn't able to get this working. Thanks in advance :)
Hi Rikorose,
Thanks for working on version 2 of DeepFilterNet. I can now do real-time inference with buffer size=1, which gives the same result as processing the full signal. The key point is that the state of the RNN needs to be carried over between chunks.
Now I'm having trouble with typing/keyboard noise not being suppressed well. I currently only use the spectral loss with c=0.3 from DeepFilterNet2; would a multi-resolution loss improve this case, or is c=0.6 from the previous work better?
Thanks, Aaron
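The "RNN state needs to be carried over" point above can be sketched with a toy GRU: feeding the signal chunk by chunk while passing the hidden state along matches a single full-signal pass exactly. The hand-rolled cell below is purely illustrative, not DeepFilterNet's model.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4  # feature / hidden size (arbitrary for this sketch)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# random toy weights for the three GRU gates
Wz, Wr, Wn = (rng.standard_normal((D, 2 * D)) * 0.1 for _ in range(3))

def gru_step(x, h):
    xh = np.concatenate([x, h])
    z = sigmoid(Wz @ xh)                          # update gate
    r = sigmoid(Wr @ xh)                          # reset gate
    n = np.tanh(Wn @ np.concatenate([x, r * h]))  # candidate state
    return (1 - z) * h + z * n

def run(frames, h=None):
    """Process a sequence of frames, returning outputs and the final state."""
    h = np.zeros(D) if h is None else h
    out = []
    for x in frames:
        h = gru_step(x, h)
        out.append(h)
    return np.stack(out), h

frames = rng.standard_normal((10, D))
full, _ = run(frames)            # one pass over the whole signal
o1, h5 = run(frames[:5])         # first chunk
o2, _ = run(frames[5:], h5)      # second chunk, state inherited
chunked = np.concatenate([o1, o2])
```

Resetting the state to zeros at every chunk instead would break this equivalence and produce artifacts at chunk boundaries.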