Closed by widdiot 4 years ago
Here are my notes: since I have only one speaker per utterance, I use the pretrained SAD model to generate the segments file, then make the RTTM file from it. Get utt2dur for the utterances, not the segments, and create the spk2idx file using scripts/create_spk2idx.py.
Then proceed with stage 1.
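The "segments to RTTM" step in the notes above can be sketched roughly as follows. This is only an illustration, not the repository's own script: it assumes the segments file uses the usual Kaldi layout (`<segment-id> <recording-id> <start> <end>`), that each recording has exactly one speaker, and that channel 1 is acceptable. The function name `segments_to_rttm` and the `reco2spk` mapping are hypothetical names introduced for this example.

```python
from typing import Dict, Iterable, List


def segments_to_rttm(segment_lines: Iterable[str], reco2spk: Dict[str, str]) -> List[str]:
    """Convert Kaldi-style segments lines into RTTM SPEAKER lines.

    Assumption: every recording (utterance) has a single speaker, looked up
    in reco2spk (hypothetical mapping: recording-id -> speaker label).
    """
    rttm_lines = []
    for line in segment_lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        _segment_id, reco_id, start, end = parts
        onset = float(start)
        duration = float(end) - onset
        if duration <= 0:
            continue  # skip empty or inverted segments
        speaker = reco2spk[reco_id]
        # RTTM fields: type, file, channel, onset, duration, ortho, stype, name, conf, slat
        rttm_lines.append(
            f"SPEAKER {reco_id} 1 {onset:.3f} {duration:.3f} <NA> <NA> {speaker} <NA> <NA>"
        )
    return rttm_lines


if __name__ == "__main__":
    example_segments = ["utt1-0000 utt1 0.50 2.00", "utt1-0001 utt1 3.00 4.25"]
    for rttm_line in segments_to_rttm(example_segments, {"utt1": "spkA"}):
        print(rttm_line)
```

Each output line carries the onset and duration of one speech segment, labelled with the single speaker of its recording. Whether the recipe expects the recording ID or something else in the file field is something to check against the stage 1 scripts.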
Specifically, what files should I have, and what should their format be, so that I can run prepare_callhome_5folds.sh from stage 1 on my own dataset?