KyleZhang1118 / Voice-Separation-and-Enhancement

A framework for quick testing and comparing multi-channel speech enhancement and separation methods, such as DSB, MVDR, LCMV, GEVD beamforming and ICA, FastICA, IVA, AuxIVA, OverIVA, ILRMA, FastMNMF.
multi-channel speech-enhancement speech-separation

Voice-Separation-and-Enhancement

Description

This program collects several popular methods, and their variants, for speech separation and enhancement. Its purpose is to implement, test and compare methods quickly. The default microphone array model is a 6+1 (peripheral+central) circular array. Test data are generated by the image-source method (ISM) [1,2] from the TIMIT database. The Voicebox toolbox is required.
All code is written and maintained in Matlab by Ke Zhang. If you find a bug or error, please contact me (kylezhang1118@gmail.com).
The list of main methods:
Beamforming: DSB, MVDR, LCMV, GEVD

Blind source separation (BSS): ICA, FastICA, IVA, AuxIVA, OverIVA, ILRMA, FastMNMF

In general, beamforming methods use the steering vector or other spatial information about the sources to enhance the target speech, whereas BSS methods use only the number of sources, except in some cases where extra information is needed to resolve the permutation ambiguity.

Guides for users

  1. The main function is command.m, in which you can set the number and angles of the sound sources (0-45-315 degrees) and select the algorithms you want to test from the list (set the value after the corresponding method to 1 to run it, 0 to skip it). The simulation environment can be set in ISM_setup.m, for example T60 for reverberation (0, 0.3 s, 0.6 s and 0.9 s are supported), the microphone array configuration, and NoiseFlag for adding noise.
  2. In Process.m, the steering vectors of the sound sources are calculated by the function 'Cal_transfer', and the window length for the FFT is set. Plotting and writing of the separated signals are controlled by 'sign_plot' and 'sign_write' in Process.m. The performance of each method is shown in the top right corner of the figure: a column with 2 numbers gives the SIR improvement and the output SIR, and a column with 3 numbers gives SDR, SIR and SAR, respectively.
  3. If you want to test a method you have written yourself, follow these steps:
    • step1: Add your method to the list in command.m
    • step2: Add the section for your method in Process.m
    • step3: Write the function for your method, usually named 'Process_name'. Its structure can follow Process_DSB (the simplest method), which contains the fundamental flow, such as the STFT and the reconstruction of the separated speech from the time-frequency domain to a time-domain waveform. The output of 'Process_name' should consist of 'Y', the separated speech; 'W', the filter or demixing matrix; and the structure variable 'SetupStruc', which holds some essential parameters (note that the window used for enframe should be stored in the structure variable named 'SetupStruc.name' in 'Process_name').
  4. The room size used in the ISM method is 6x4x3 m. The microphone array is placed in the center of the room, and the distance between adjacent microphones in the 6+1 microphone array is 4.35 cm. The distance between each sound source and the center of the microphone array is 1.5 m. If you want to test other environments, you should read and rewrite readData.m to use data you have generated yourself. The calculation of the steering vector in Cal_transfer.m should also be rewritten to match the configuration of the microphone array.
  5. CommandOnline.m and OnProcess.m are designed for online methods, which process the input signals block by block rather than as a whole. They are seldom used and not updated frequently.
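
The array geometry in item 4 can be illustrated with a short sketch. It is written in Python for illustration only and is not the repository's Matlab code in Cal_transfer. It assumes that the six peripheral microphones sit on a regular hexagon, so that the 4.35 cm adjacent spacing equals the circle radius, that the first peripheral microphone lies at 0 degrees, that the central microphone is listed last, that the speed of sound is 343 m/s, and that a far-field (plane-wave) model is acceptable even though the sources are only 1.5 m away. The function names and the sign convention of the phase are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value
RADIUS = 0.0435         # m; for a regular hexagon the side length equals the radius


def circular_array_positions(n_peripheral=6, radius=RADIUS):
    """Return (x, y) positions: peripheral microphones first, central one last."""
    positions = [
        (radius * math.cos(2 * math.pi * m / n_peripheral),
         radius * math.sin(2 * math.pi * m / n_peripheral))
        for m in range(n_peripheral)
    ]
    positions.append((0.0, 0.0))  # central microphone (assumed ordering)
    return positions


def steering_vector(positions, azimuth_deg, freq_hz, c=SPEED_OF_SOUND):
    """Far-field steering vector at one frequency, referenced to the array center."""
    theta = math.radians(azimuth_deg)
    ux, uy = math.cos(theta), math.sin(theta)  # unit vector pointing at the source
    vector = []
    for x, y in positions:
        # A microphone displaced toward the source receives the wavefront earlier.
        advance = (x * ux + y * uy) / c
        vector.append(cmath.exp(2j * math.pi * freq_hz * advance))
    return vector


positions = circular_array_positions()
d = steering_vector(positions, azimuth_deg=45.0, freq_hz=1000.0)
```

If the array configuration changes, only the list of positions needs to change in this sketch; the phase calculation stays the same under the plane-wave assumption.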

Implementation details of the methods

Beamforming

DSB(Delay and sum)
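
As a rough illustration of the delay-and-sum idea, the sketch below forms frequency-domain weights w = d / M from a steering vector d and applies them as y = w^H x to a single time-frequency bin. It is a Python sketch under stated assumptions, not the repository's Process_DSB: the plane-wave steering model, the 343 m/s speed of sound, the microphone ordering, the 2 kHz test frequency and all function names are hypothetical choices made for the example.

```python
import cmath
import math


def plane_wave_steering(positions, azimuth_deg, freq_hz, c=343.0):
    """Far-field steering vector at one frequency (assumed model)."""
    theta = math.radians(azimuth_deg)
    ux, uy = math.cos(theta), math.sin(theta)
    return [cmath.exp(2j * math.pi * freq_hz * (x * ux + y * uy) / c)
            for x, y in positions]


def dsb_weights(steering):
    """Delay-and-sum weights: phase-align each channel, then average."""
    m = len(steering)
    return [d / m for d in steering]


def apply_beamformer(weights, mic_bin):
    """Beamformer output y = w^H x for one time-frequency bin."""
    return sum(w.conjugate() * x for w, x in zip(weights, mic_bin))


# 6+1 circular array with 4.35 cm radius, central microphone last (assumed ordering)
radius = 0.0435
positions = [(radius * math.cos(math.pi * m / 3), radius * math.sin(math.pi * m / 3))
             for m in range(6)] + [(0.0, 0.0)]

freq = 2000.0
target = plane_wave_steering(positions, 0.0, freq)       # look direction
interferer = plane_wave_steering(positions, 90.0, freq)  # off-axis source

w = dsb_weights(target)
gain_target = abs(apply_beamformer(w, target))          # unit gain toward the target
gain_interferer = abs(apply_beamformer(w, interferer))  # attenuated off-axis
```

In this model the look direction passes with unit gain, while a source from another direction is attenuated by an amount that depends on frequency and array size. In a full pipeline the same weights would be applied to every frame of the STFT at each frequency before resynthesis.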

References

[1] Lehmann, E.A. Diffuse reverberation model for efficient image-source simulation of room impulse responses. 2009.
[2] Available online: http://www.eric-lehmann.com/
[3] Alexandridis, A. Capturing and reproducing spatial audio based on a circular microphone array. 2013.
[4] Cox, H. Robust adaptive beamforming. 1987.
[5] Warsitz, E. Blind acoustic beamforming based on generalized eigenvalue decomposition. 2007.
[6] Higuchi, T. Online MVDR beamformer based on complex Gaussian mixture model with spatial prior for noise robust ASR. 2017.
[7] Sawada, H. A robust and precise method for solving the permutation problem of frequency-domain blind source separation. 2004.
[8] Bingham, E. A fast fixed-point algorithm for independent component analysis of complex valued signals. 2000.
[9] Kim, T. Blind source separation exploiting higher-order frequency dependencies. 2007.
[10] Ono, N. Stable and fast update rules for independent vector analysis based on auxiliary function technique. 2011.
[11] Scheibler, R. Independent vector analysis with more microphones than sources. 2019.
[12] Kitamura, D. Determined blind source separation with independent low-rank matrix analysis. 2018.
[13] Sekiguchi, K. Fast multichannel nonnegative matrix factorization with directivity-aware jointly-diagonalizable spatial covariance matrices for blind source separation. 2020.

Last edited on 9/14/2021