fakufaku / torchiva

Blind source separation with the independent vector analysis family of algorithms in PyTorch
https://torchiva.readthedocs.io/en/latest/
MIT License

TORCHIVA

A package for blind source separation and beamforming in `PyTorch <https://pytorch.org>`_.

Quick Start

The package can be installed via pip::

    pip install torchiva

Separation using Pre-trained Model

We provide a pre-trained model in ``trained_models/tiss``. You can try separation with the pre-trained model as follows::

    # Separation
    python -m torchiva.separate INPUT OUTPUT

where ``INPUT`` is either a multichannel wav file or a folder containing multichannel wav files. If it is a folder, all the files inside are separated. The output is saved to ``OUTPUT``. The model stored in ``trained_models/tiss`` is automatically downloaded to ``$HOME/.torchiva_models``. The path or URL to the model can also be provided manually via the ``--model`` option. The model was trained on the `WSJ1-mix dataset <https://github.com/fakufaku/create_wsj1_2345_db>`_ with the same configuration as ``./examples/configs/tiss.json``.
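
Since the separation command expects multichannel wav input, it can help to check the files beforehand. The following is a minimal sketch using only the Python standard library; the helper names (``find_multichannel_wavs``, ``build_separate_command``, ``write_silent_wav``) and the file names are hypothetical and not part of torchiva. It only inspects wav headers and assembles the command line shown above, including the ``--model`` option; it does not reproduce how torchiva itself scans a folder.

```python
import sys
import tempfile
import wave
from pathlib import Path


def find_multichannel_wavs(input_path):
    """Return the wav files at input_path (file or folder) with more than one channel."""
    path = Path(input_path)
    candidates = sorted(path.glob("*.wav")) if path.is_dir() else [path]
    selected = []
    for wav_path in candidates:
        # The stdlib wave module reads the header of PCM wav files.
        with wave.open(str(wav_path), "rb") as f:
            if f.getnchannels() > 1:
                selected.append(wav_path)
    return selected


def build_separate_command(input_path, output_path, model=None):
    """Build the argument list for `python -m torchiva.separate INPUT OUTPUT`."""
    cmd = [sys.executable, "-m", "torchiva.separate", str(input_path), str(output_path)]
    if model is not None:
        # --model takes a path or URL to the model.
        cmd += ["--model", str(model)]
    return cmd


def write_silent_wav(path, n_channels, n_frames=160, sample_rate=16000):
    """Write a short silent 16-bit PCM wav file, used only for the demo below."""
    with wave.open(str(path), "wb") as f:
        f.setnchannels(n_channels)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(b"\x00\x00" * n_channels * n_frames)


# Demo on throwaway files: one mono file and one two-channel file.
with tempfile.TemporaryDirectory() as tmp:
    write_silent_wav(Path(tmp) / "mono.wav", n_channels=1)
    write_silent_wav(Path(tmp) / "stereo.wav", n_channels=2)
    found = [p.name for p in find_multichannel_wavs(tmp)]

command = build_separate_command("mix.wav", "out", model="my_model")

print(found)
print(command[1:])
```

The resulting list can be passed to ``subprocess.run`` once torchiva is installed; the demo itself never launches the separation.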

Training

We provide simple training scripts that support T-ISS, MWF, MVDR, and GEV::

    cd examples

    # install some modules necessary for training
    pip install -r requirements.txt

    # training
    python train.py PATH_TO_CONFIG PATH_TO_DATASET

Note that our example scripts assume the WSJ1-mix dataset. To use another dataset, change the part of the script that loads the audio.

Test your trained model with checkpoint from epoch 128::

    python ./test.py --dataset ../wsj1_6ch --n_fft 2048 --hop 512 --n_iter 40 --iss-hparams checkpoints/tiss_delay1tap5_2ch/lightning_logs/version_0/hparams.yaml --epoch 128 --test
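
To get a feel for the ``--n_fft 2048 --hop 512`` values, the short sketch below computes the resulting time-frequency grid. It assumes that these options are the STFT frame length and hop size in samples, which is not spelled out here, and it assumes a one-sided STFT without padding on a hypothetical 4-second signal sampled at 16 kHz. The function ``stft_shape`` is illustrative and not part of torchiva.

```python
def stft_shape(n_samples, n_fft, hop):
    """Frequency bins, frame count, and overlap ratio of a one-sided STFT without padding."""
    n_freq = n_fft // 2 + 1
    n_frames = 1 + (n_samples - n_fft) // hop if n_samples >= n_fft else 0
    overlap = 1.0 - hop / n_fft
    return n_freq, n_frames, overlap


# Hypothetical input: 4 seconds of audio at 16 kHz (an assumption for illustration).
n_freq, n_frames, overlap = stft_shape(n_samples=16000 * 4, n_fft=2048, hop=512)
print(n_freq, n_frames, overlap)  # 1025 122 0.75
```

Under these assumptions each frame overlaps the next by 75%, and the separation operates on 1025 frequency bins.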

Export the trained model for later use::

    python ./export_model.py ../trained_models/tiss checkpoints/tiss_delay1tap5_2ch/lightning_logs/version_0 128 146 148 138 122 116 112 108 104 97

Run the example script using the exported model::

    python ./example_dnn.py ../wsj1_6ch ../trained_models/tiss -m 2 -r 100

Authors

License

2022 (c) Robin Scheibler, Kohei Saijo, LINE Corporation.

All of this code is released under the `MIT License <https://opensource.org/licenses/MIT>`_.