This repository is a PyTorch implementation of RPNSD. The code is largely based on faster-rcnn.pytorch, a Faster R-CNN implementation by jwyang.
git clone https://github.com/HuangZiliAndy/RPNSD.git
cd RPNSD
Set the PATH variable in path.sh (the current default is ~/anaconda3/bin).
conda install pytorch==0.4.0 cuda91 torchvision pillow"<7" -c pytorch
pip install -r requirements.txt
cd tools
make KALDI=<path/to/a/compiled/kaldi/directory>
Select the job backend in cmd.sh:
# Select the backend used by run.sh from "local", "sge", "slurm", or "ssh"
cmd_backend='local'
The purpose of this step includes
./run_prepare_shared.sh
Train on the Mixer6 + SRE + SWBD dataset. The default setting uses a single GPU and takes about 4 days.
./train.sh
A pretrained model is available at pretrain-model.
Adapt the model on in-domain data. Because we use 5-fold cross-validation, each fold adapts on 400 utterances from the CALLHOME dataset and tests on the remaining 100.
./adapt.sh
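To make the fold arithmetic concrete, here is a minimal Python sketch of a 5-fold split. It is illustrative only and is not taken from adapt.sh: the helper `k_fold_splits`, the contiguous fold layout, and the `utt000`-style IDs are all assumptions, and the total of 500 utterances is inferred from the 400/100 split described above.

```python
def k_fold_splits(items, k=5):
    """Yield (train, test) lists for each of k contiguous, near-equal folds."""
    n = len(items)
    # Fold boundaries; integer arithmetic spreads any remainder across folds.
    bounds = [i * n // k for i in range(k + 1)]
    for i in range(k):
        test = items[bounds[i]:bounds[i + 1]]
        train = items[:bounds[i]] + items[bounds[i + 1]:]
        yield train, test


# Hypothetical utterance IDs standing in for the CALLHOME recordings.
utterances = [f"utt{idx:03d}" for idx in range(500)]
splits = list(k_fold_splits(utterances, k=5))

for fold, (train, test) in enumerate(splits):
    print(f"fold {fold}: adapt on {len(train)}, test on {len(test)}")
```

Each utterance appears in exactly one test fold, so combining the five test sets gives predictions for the whole dataset without ever testing on data used for adaptation in that fold.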
Inference stage.
./inference.sh
An example from the CALLHOME dataset. The first stream is the ground-truth label, the second is the output of the x-vector system, and the third is the output of RPNSD.
@inproceedings{huang2020speaker,
Title={Speaker Diarization with Region Proposal Network},
Author={Huang, Zili and Watanabe, Shinji and Fujita, Yusuke and Garcia, Paola and Shao, Yiwen and Povey, Daniel and Khudanpur, Sanjeev},
Booktitle={Accepted to ICASSP 2020},
Year={2020}
}
@article{jjfaster2rcnn,
Author = {Jianwei Yang and Jiasen Lu and Dhruv Batra and Devi Parikh},
Title = {A Faster Pytorch Implementation of Faster R-CNN},
Journal = {https://github.com/jwyang/faster-rcnn.pytorch},
Year = {2017}
}