Justification of skip-filtering connection for general speech enhancement

Hi,

Thank you for your message.

Let me start by commenting on this part:

MAD-Twinnet assumes skip-filtering connection to produce masked output

Skip-filtering connections produce filtered output, i.e. the skip-filtering connections actually apply a filter to a signal.
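To make "apply a filter" concrete, here is a minimal NumPy sketch. It assumes magnitude spectrograms of shape (frames, frequency bins) and uses a toy one-layer stand-in for the masker. The function names, the single affine layer, and the ReLU are illustrative assumptions, not the code in the repository. The point is only that the mask is computed from the mixture and then multiplied element-wise with that same mixture.

```python
import numpy as np


def estimate_mask(mixture_mag, weights, bias):
    """Hypothetical stand-in for the masker: one affine layer followed by a ReLU.

    The real masker is a recurrent encoder-decoder; this toy layer only
    illustrates that the mask is a function of the mixture itself.
    """
    return np.maximum(mixture_mag @ weights + bias, 0.0)


def skip_filtering(mixture_mag, weights, bias):
    """Filter the mixture with a mask that was estimated from the mixture."""
    mask = estimate_mask(mixture_mag, weights, bias)
    # The "skip" part: the network input is reused and multiplied with the mask.
    return mask * mixture_mag


# Toy data (assumed shapes): 4 time frames, 6 frequency bins.
rng = np.random.default_rng(0)
n_frames, n_bins = 4, 6
mixture_mag = np.abs(rng.standard_normal((n_frames, n_bins)))
weights = 0.1 * rng.standard_normal((n_bins, n_bins))
bias = np.zeros(n_bins)

source_estimate = skip_filtering(mixture_mag, weights, bias)
```

Because the output is the product of the mask and the input, a training loss on `source_estimate` never needs a target mask; the mask is learned implicitly.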

If I understand correctly what you mean by:

Could you provide any relevant material that mask can be estimated by the function of mixture? (formula (8))

then the first work that actually did this is one of our previous works [1] (i.e. MaD v.0.1). The work in [2] followed, then came our work in [3] (i.e. MaD v.1.0), and then MaD TwinNet [4]. If you (or anybody else) find more references, please reply to this thread :)

As for the last part:

Also, could skip-filtering connection can be valid for general speech enhancement which can include reverberation, channel distortion?

I do not know :) It might be good, it might not! Feel free to try it!

References

[1] S.-I. Mimilakis, K. Drossos, G. Schuller, and T. Virtanen, “A recurrent encoder-decoder approach with skip-filtering connections for monaural singing voice separation,” in 27th IEEE International Workshop on Machine Learning for Signal Processing (MLSP), Tokyo, Japan, Sep. 2017.

[2] A. Jansson, E. Humphrey, N. Montecchio, R. Bittner, A. Kumar, and T. Weyde, “Singing voice separation with deep U-Net convolutional networks,” in 18th International Society for Music Information Retrieval Conference, Suzhou, China, Oct. 2017.

[3] S.-I. Mimilakis, K. Drossos, J.-F. Santos, G. Schuller, T. Virtanen, and Y. Bengio, “Monaural singing voice separation with skip-filtering connections and recurrent inference of time-frequency mask,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Canada, Apr. 2018.

[4] K. Drossos, S.-I. Mimilakis, D. Serdyuk, G. Schuller, T. Virtanen, and Y. Bengio, “MaD TwinNet: Masker-Denoiser Architecture with Twin Networks for Monaural Sound Source Separation,” in IEEE World Congress on Computational Intelligence/International Joint Conference on Neural Networks (WCCI/IJCNN), Rio de Janeiro, Brazil, Jul. 2018.
