JusperLee / Dual-Path-RNN-Pytorch

Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation, implemented in PyTorch
Apache License 2.0

Dual-path-RNN-Pytorch


If you have any questions, please open an issue.

If you find this project helpful, please consider giving it a star.

Demo Pages: Results of pure speech separation model

Plan

Dataset

We use the WSJ0 dataset for training, validation, and testing. The data download link and the code for creating the mixed audio from WSJ0 are given below.

Training

Training for Conv-TasNet model

  1. First, you need to generate the scp file using the following command. The content of the scp file is "filename && path".

    python create_scp.py
  2. Then you can modify the training and model parameters through "config/Conv_Tasnet/train.yml".

    cd config/Conv_Tasnet
    vim train.yml
  3. Then use the following command in the root directory to train the model.

    python train_Tasnet.py --opt config/Conv_Tasnet/train.yml
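To illustrate the first step, here is a minimal sketch of how a file list of this kind could be produced. It is not the repository's `create_scp.py`: the function name `write_scp` is hypothetical, and a space-separated "filename path" line format is an assumption made for the example.

```python
import os


def write_scp(wav_dir, scp_path):
    """Write one 'filename path' line per .wav file found under wav_dir.

    Assumption: a single space separates the filename from its path.
    The real create_scp.py may use a different layout.
    """
    entries = []
    for root, _dirs, files in os.walk(wav_dir):
        for name in files:
            if name.lower().endswith(".wav"):
                entries.append((name, os.path.join(root, name)))
    # Sort so the output is reproducible across runs and platforms.
    entries.sort()
    with open(scp_path, "w", encoding="utf-8") as f:
        for name, path in entries:
            f.write(f"{name} {path}\n")
    return len(entries)
```

Calling `write_scp("path/to/wavs", "tr_mix.scp")` would return the number of entries written; both paths here are placeholders.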

Training for Dual Path RNN model

  1. First, you need to generate the scp file using the following command. The content of the scp file is "filename && path".

    python create_scp.py
  2. Then you can modify the training and model parameters through "config/Dual_RNN/train.yml".

    cd config/Dual_RNN
    vim train.yml
  3. Then use the following command in the root directory to train the model.

    python train_rnn.py --opt config/Dual_RNN/train.yml
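Both training commands pass the configuration file through an `--opt` flag. The sketch below shows one way such a flag can be read with the standard `argparse` module. The function `parse_args` and its help text are illustrative assumptions, not code taken from `train_rnn.py` or `train_Tasnet.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse a single required --opt argument (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Illustrative option parser for a training script"
    )
    parser.add_argument(
        "--opt",
        type=str,
        required=True,
        help="Path to the YAML option file, e.g. config/Dual_RNN/train.yml",
    )
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```

With this sketch, `parse_args(["--opt", "config/Dual_RNN/train.yml"]).opt` yields the path string, which a training script would then load as its configuration.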

Inference

Conv-TasNet

Before running, edit the default parameters in test_tasnet.py, including the test files, the model to test, etc.

For multiple audio files

    python test_tasnet.py

For a single audio file

    python test_tasnet_wav.py

Dual-Path-RNN

Before running, edit the default parameters in test_dualrnn.py, including the test files, the model to test, etc.

For multiple audio files

    python test_dualrnn.py

For a single audio file

    python test_dualrnn_wav.py

Pretrained Models

Conv-TasNet

Conv-TasNet model

Dual-Path-RNN

Dual-Path-RNN model

Result

Conv-TasNet

Final result: 15.8690, which is 0.57 higher than the 15.3 reported in the paper.

Dual-Path-RNN

Final result: 18.98, which is 0.18 higher than the 18.8 reported in the paper.
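The gap to the paper's figures can be checked directly from the numbers quoted in this section. The short sketch below does the subtraction with `Decimal` to avoid floating-point rounding noise; the helper name `improvement` is only illustrative.

```python
from decimal import Decimal

# (reproduced score, score reported in the paper), as quoted in this README.
results = {
    "Conv-TasNet": (Decimal("15.8690"), Decimal("15.3")),
    "Dual-Path-RNN": (Decimal("18.98"), Decimal("18.8")),
}


def improvement(ours, paper):
    """Return how much the reproduced score exceeds the paper's score."""
    return ours - paper


for name, (ours, paper) in results.items():
    print(f"{name}: {ours} - {paper} = {improvement(ours, paper)}")
# Conv-TasNet: 15.8690 - 15.3 = 0.5690
# Dual-Path-RNN: 18.98 - 18.8 = 0.18
```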

Reference

  1. Luo Y, Chen Z, Yoshioka T. Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation[J]. arXiv preprint arXiv:1910.06379, 2019.
  2. Conv-TasNet code and Dual-RNN code