This is the official implementation of “Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation”, accepted at NeurIPS 2024. Paper Link (arXiv)
🔥 October 2024: We have uploaded the pre-trained SepReformer-B model for WSJ0-2MIX to the models/SepReformer_Base_WSJ0/log/scratch_weight folder! You can directly test the model using the inference command below.
🔥 September 2024: Paper accepted at NeurIPS 2024 🎉
We plan to release the remaining configurations, especially for partially or fully overlapped and noisy-reverberant mixtures at a 16 kHz sampling rate for practical applications, within this year.
We propose SepReformer, a novel approach to speech separation using an asymmetric encoder-decoder network.
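To make the encode-separate-decode idea concrete, here is a minimal toy sketch of an asymmetric encoder-decoder pipeline for two-speaker separation. This is purely illustrative and is NOT the SepReformer architecture: all shapes, the random "separator", and the simple linear encoder/decoder are hypothetical stand-ins for the learned networks described in the paper.

```python
import numpy as np

# Illustrative sketch (NOT the SepReformer model): an encoder turns the
# waveform into frames of features, a separator estimates per-speaker
# features, and a lighter decoder reconstructs each source by overlap-add.
rng = np.random.default_rng(0)
win, hop, feat = 16, 8, 64          # frame length, hop size, feature dim

def encode(x, W_enc):
    """Frame the signal and project each frame to a feature vector."""
    n_frames = (len(x) - win) // hop + 1
    frames = np.stack([x[i * hop : i * hop + win] for i in range(n_frames)])
    return np.maximum(frames @ W_enc, 0.0)         # (n_frames, feat), ReLU

def separate(F, n_spk=2):
    """Toy separator: random softmax masks (a real model uses a deep net)."""
    logits = rng.standard_normal((n_spk,) + F.shape)
    masks = np.exp(logits) / np.exp(logits).sum(axis=0)
    return masks * F                                # (n_spk, n_frames, feat)

def decode(F_spk, W_dec):
    """Asymmetric (lighter) decoder: project back to frames, overlap-add."""
    frames = F_spk @ W_dec                          # (n_frames, win)
    out = np.zeros((len(frames) - 1) * hop + win)
    for i, f in enumerate(frames):
        out[i * hop : i * hop + win] += f
    return out

x = rng.standard_normal(160)                        # 10 ms of 16 kHz audio
W_enc = rng.standard_normal((win, feat)) * 0.1
W_dec = rng.standard_normal((feat, win)) * 0.1
sources = [decode(s, W_dec) for s in separate(encode(x, W_enc))]
print([s.shape for s in sources])                   # one waveform per speaker
```

The asymmetry in the paper refers to the separator operating on the encoder's downsampled features while the decoder side is kept comparatively light; the sketch only mirrors that division of labor, not the actual blocks.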
Demo Pages: Sample Results of speech separation by SepReformer
To train the network, simply run:

```shell
python run.py --model SepReformer_Base_WSJ0 --engine-mode train
```
To evaluate a model without saving the output as audio files:

```shell
python run.py --model SepReformer_Base_WSJ0 --engine-mode test
```
To evaluate and save the separated outputs as wav files:

```shell
python run.py --model SepReformer_Base_WSJ0 --engine-mode test_wav --out_wav_dir '/your/save/directory[optional]'
```
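Separated wav files produced by the test_wav mode are typically scored with scale-invariant SNR (SI-SNR), the standard metric on WSJ0-2MIX. The helper below is an independent sketch of that metric, not part of this repository's evaluation code.

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a reference source."""
    est = est - est.mean()                     # remove DC offset
    ref = ref - ref.mean()
    # Project the estimate onto the reference (scale-invariant target).
    proj = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    noise = est - proj                         # residual distortion
    return 10.0 * np.log10((proj @ proj) / (noise @ noise + eps))

rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)               # 1 s reference at 16 kHz
clean = si_snr(2.0 * ref, ref)                 # scaling does not hurt the score
noisy = si_snr(ref + 0.1 * rng.standard_normal(16000), ref)
print(clean, noisy)
```

Because the metric projects the estimate onto the reference, rescaling a perfect estimate still yields a very high score, while added noise lowers it.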
If you find this repository helpful, please consider citing:
```bibtex
@misc{shin2024separate,
      title={Separate and Reconstruct: Asymmetric Encoder-Decoder for Speech Separation},
      author={Ui-Hyeop Shin and Sangyoun Lee and Taehan Kim and Hyung-Min Park},
      year={2024},
      eprint={2406.05983},
      archivePrefix={arXiv},
}
```