lifelongeek closed this issue 6 years ago.
Hi,
Thank you for your message.
Let me start by commenting on the following:
MaD TwinNet assumes skip-filtering connections to produce a masked output
Skip-filtering connections produce a filtered output, i.e. the skip-filtering connections actually apply a filter to a signal.
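A minimal sketch of the idea in Python. It assumes a single magnitude-spectrogram frame as input, and a hypothetical one-layer sigmoid network (`estimate_mask`) stands in for the actual recurrent masker. The mask is computed from the mixture itself. The skip connection then carries the mixture around the network, so the output is the mixture filtered element-wise by that mask. This is illustrative only and is not the MaD TwinNet code.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function; maps any real number into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def estimate_mask(mixture: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    # Hypothetical masker: one dense layer followed by a sigmoid.
    # The mask is a function of the mixture itself.
    return [
        sigmoid(sum(w * x for w, x in zip(row, mixture)) + b)
        for row, b in zip(weights, bias)
    ]


def skip_filter(mixture: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    mask = estimate_mask(mixture, weights, bias)
    # Skip-filtering connection: the mixture bypasses the network and is
    # multiplied element-wise by the mask, so the network output filters the input.
    return [m * x for m, x in zip(mask, mixture)]


if __name__ == "__main__":
    frame = [2.0, 4.0, 0.5]  # toy magnitude-spectrogram frame with 3 frequency bins
    zero_w = [[0.0] * 3 for _ in range(3)]
    zero_b = [0.0] * 3
    print(skip_filter(frame, zero_w, zero_b))  # prints [1.0, 2.0, 0.25]
```

With all-zero weights and bias, every mask value is sigmoid(0) = 0.5, so each bin is halved. Because the sigmoid mask never exceeds 1, this kind of filter can only attenuate the mixture, never amplify it.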
If I understand correctly what you mean by:
Could you provide any relevant material showing that the mask can be estimated as a function of the mixture? (formula (8))
then the first work that actually did this is one of our previous works [1] (i.e. MaD v.0.1). It was followed by the work in [2], then by our work in [3] (i.e. MaD v.1.0), and then by MaD TwinNet [4]. If you (or anybody else) find more references, please reply to this thread :)
As for the last part:
Also, could skip-filtering connections be valid for general speech enhancement, which can include reverberation and channel distortion?
I do not know :) It might be good, it might not! Feel free to try it!
References
[1] S.-I. Mimilakis, K. Drossos, G. Schuller, and T. Virtanen, “A recurrent encoder-decoder approach with skip-filtering connections for monaural singing voice separation,” in 27th IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Tokyo, Japan, Sep. 2017.
[2] A. Jansson, E. Humphrey, N. Montecchio, R. Bittner, A. Kumar, and T. Weyde, “Singing voice separation with deep U-Net convolutional networks,” in 18th International Society for Music Information Retrieval Conference, Suzhou, China, Oct. 2017.
[3] S.-I. Mimilakis, K. Drossos, J.-F. Santos, G. Schuller, T. Virtanen, and Y. Bengio, “Monaural singing voice separation with skip-filtering connections and recurrent inference of time-frequency mask,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Canada, Apr. 2018.
[4] K. Drossos, S.-I. Mimilakis, D. Serdyuk, G. Schuller, T. Virtanen, and Y. Bengio, “MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation,” in IEEE World Congress on Computational Intelligence/International Joint Conference on Neural Networks (WCCI/IJCNN), Rio de Janeiro, Brazil, Jul. 2018.
I am excited to find this work. Thanks for sharing your paper and code with us :+1:
For designing the masker and the denoiser, MaD TwinNet assumes skip-filtering connections to produce a masked output.
Could you provide any relevant material showing that the mask can be estimated as a function of the mixture? (formula (8)) Also, could skip-filtering connections be valid for general speech enhancement, which can include reverberation and channel distortion?
I appreciate your comments.