
EEND_PyTorch

A PyTorch implementation of End-to-End Neural Diarization.

This repo is largely based on the original Chainer implementation, EEND, by Hitachi, Ltd., which holds the copyright.

This repo only includes the training/inference part. If you are looking for data preparation, please refer to the original authors' repo.

Note

Only the Transformer model with PIT loss is implemented here, and I can only guarantee that the main pipeline is correct. Some side features (such as save_attn_weight, the BLSTM model, and the deep clustering loss) are either not implemented correctly or not implemented at all.
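For reference, permutation-invariant training (PIT) scores the network output against every permutation of the reference speaker labels and keeps the lowest loss, so the model is not penalized for ordering speakers differently from the reference. Below is a minimal sketch of a PIT binary cross-entropy loss; the function name and tensor shapes are illustrative, not this repo's actual API.

```python
from itertools import permutations

import torch
import torch.nn.functional as F

def pit_bce_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """PIT loss over frame-level speaker activities.

    logits, labels: (T, S) tensors, T frames, S speakers.
    """
    n_speakers = labels.shape[-1]
    losses = []
    for perm in permutations(range(n_speakers)):
        # Score the output against this ordering of the reference speakers.
        losses.append(
            F.binary_cross_entropy_with_logits(logits, labels[:, list(perm)])
        )
    # The loss is the minimum over all speaker permutations.
    return torch.stack(losses).min()

# Example: 500 frames, 2 speakers.
logits = torch.randn(500, 2)
labels = (torch.rand(500, 2) > 0.5).float()
print(pit_bce_loss(logits, labels))
```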

The original Chainer code actually reserves a PyTorch interface, so I may consider making a merge request once this code is well polished.

Run

  1. Prepare your kaldi-style data and modify run.sh according to your own directories.
  2. Check the configuration file. The default conf/large/train.yaml configuration uses a 4-layer Transformer with 100k warmup steps, which differs from the setup in their ASRU 2019 paper; it comes from their paper submitted to TASLP, since the larger model yields better performance. (See the warmup schedule sketch after this list.)
  3. ./run.sh
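
For context on the warmup setting in step 2, the Noam schedule (from "Attention Is All You Need") increases the learning rate linearly for the first warmup steps and then decays it with the inverse square root of the step number. A minimal sketch; the model dimension of 256 is an assumption here, so check the config for the actual value:

```python
def noam_lr(step: int, d_model: int = 256,
            warmup_steps: int = 100_000, factor: float = 1.0) -> float:
    """Noam schedule: linear warmup for `warmup_steps`, then 1/sqrt(step) decay."""
    step = max(step, 1)  # guard against step 0
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

# A longer warmup (100k vs. 25k) means a later and lower learning-rate peak.
for s in (1_000, 25_000, 100_000, 400_000):
    print(f"step {s:>7}: lr = {noam_lr(s):.2e}")
```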

Pretrained Models

Pretrained models are offered here.

model_simu.th is trained on simulated data (beta=2), and model_callhome.th is adapted on CALLHOME data. Both are 4-layer Transformer models trained with conf/large/train.yaml.
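
If you just want to inspect what a downloaded checkpoint contains, a quick sketch like the following works, assuming the .th file stores a plain state_dict (parameter name to tensor); how this repo actually serializes its models may differ:

```python
import torch

# Load the checkpoint on CPU and list parameter names and shapes.
state = torch.load("model_callhome.th", map_location="cpu")
for name, tensor in state.items():
    print(name, tuple(tensor.shape))
```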

Results

Our training data is missing Switchboard Phase 1, so the results can be slightly worse.

| Type | Transformer Layers | Noam Warmup Steps | DER on simu | DER on CALLHOME |
| --- | --- | --- | --- | --- |
| Chainer (ASRU 2019) | 2 | 25k | 7.36 | 12.50 |
| Chainer (TASLP) | 4 | 100k | 4.56 | 9.54 |
| Chainer (run on our data) | 2 | 25k | 9.78 | 14.85 |
| PyTorch (epoch 50 on simu) | 2 | 25k | 10.14 | 15.72 |
| PyTorch | 4 | 100k | 6.76 | 11.21 |
| PyTorch* | 4 | 100k | - | 9.35 |

(* run on full training data, credit to my great colleague!)
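
For readers unfamiliar with the metric, diarization error rate (DER) counts missed speech, false-alarm speech, and speaker confusion as a fraction of total reference speech. Below is a frame-level approximation, assuming binary speaker-activity matrices with speakers already aligned; the standard scored DER instead uses timed segments and a forgiveness collar:

```python
import numpy as np

def frame_der(ref: np.ndarray, hyp: np.ndarray) -> float:
    """Frame-level DER. ref, hyp: (T, S) binary speaker-activity arrays."""
    ref_n = ref.sum(axis=-1)  # active speakers per frame (reference)
    hyp_n = hyp.sum(axis=-1)  # active speakers per frame (hypothesis)
    miss = np.maximum(ref_n - hyp_n, 0).sum()
    falarm = np.maximum(hyp_n - ref_n, 0).sum()
    # Confusion: frames where speaker counts overlap but identities disagree.
    correct = np.minimum(ref, hyp).sum()
    confusion = np.minimum(ref_n, hyp_n).sum() - correct
    speech = ref_n.sum()
    return (miss + falarm + confusion) / max(speech, 1)

# Toy example: one missed overlap frame, one confused frame -> DER = 0.5.
ref = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])
hyp = np.array([[1, 0], [1, 0], [1, 0], [0, 0]])
print(frame_der(ref, hyp))
```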

Citation

Cite their great papers!

@inproceedings{fujita2019endtoend2,
    title={End-to-End Neural Speaker Diarization with Permutation-Free Objectives},
    author={Fujita, Yusuke and Kanda, Naoyuki and Horiguchi, Shota and Nagamatsu, Kenji and Watanabe, Shinji},
    booktitle={INTERSPEECH},
    year={2019},
    pages={4300--4304},
}
@inproceedings{fujita2019endtoend,
    title={End-to-End Neural Speaker Diarization with Self-Attention},
    author={Fujita, Yusuke and Kanda, Naoyuki and Horiguchi, Shota and Xue, Yawen and Nagamatsu, Kenji and Watanabe, Shinji},
    booktitle={IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
    pages={296--303},
    year={2019},
}
@article{fujita2020endtoend,
    title={End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification},
    author={Fujita, Yusuke and Watanabe, Shinji and Horiguchi, Shota and Xue, Yawen and Nagamatsu, Kenji},
    journal={arXiv:2003.02966},
    year={2020},
}