saumyaborwankar closed this issue 2 years ago
Are you planning to train a speaker model or VAD model or both for speaker diarization?
Ideally, I want to train both.
For speaker embedding: Prepare your manifest using these steps: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speaker_recognition/datasets.html#all-other-datasets
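The manifest produced by those steps is a JSON-lines file with one entry per utterance (`audio_filepath`, `offset`, `duration`, `label`). A minimal sketch of writing one with the standard library; the file paths, speaker labels, and durations below are hypothetical placeholders:

```python
import json

# Hypothetical (audio path, speaker label, duration in seconds) triples.
utterances = [
    ("/data/spk1/utt1.wav", "speaker_1", 3.2),
    ("/data/spk2/utt1.wav", "speaker_2", 2.7),
]

# Write one JSON object per line, as the NeMo dataset loaders expect.
with open("train_manifest.json", "w") as f:
    for path, label, dur in utterances:
        entry = {
            "audio_filepath": path,
            "offset": 0,
            "duration": dur,
            "label": label,
        }
        f.write(json.dumps(entry) + "\n")
```

In practice you would generate the `utterances` list by walking your dataset directory and reading each file's duration.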
Then prepare the hydra configuration file as detailed here: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speaker_recognition/configs.html#dataset-configuration
You may also refer to this section for how to use the training script with the hydra configuration file.
For VAD: Prepare your manifest using these steps: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speech_classification/datasets.html#speech-command-freesound-for-vad
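The VAD manifest has the same JSON-lines shape, but each entry describes a segment labeled as speech or non-speech. A sketch under the assumption of two classes named `speech` and `background` (the paths, offsets, and durations are hypothetical):

```python
import json

# Hypothetical segments: (audio path, start offset in s, duration in s, label).
segments = [
    ("/data/audio/session1.wav", 0.0, 0.63, "speech"),
    ("/data/audio/session1.wav", 0.63, 0.63, "background"),
]

# One JSON object per line; offsets let several segments share one file.
with open("vad_train_manifest.json", "w") as f:
    for path, offset, dur, label in segments:
        f.write(json.dumps({
            "audio_filepath": path,
            "offset": offset,
            "duration": dur,
            "label": label,
        }) + "\n")
```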
Then prepare the hydra configuration file as detailed here: https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speech_classification/configs.html
Then use this script to train a VAD model.
Inference for speaker diarization:
For speaker diarization inference using the trained models, refer to these steps.
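For inference, the diarization manifest again uses one JSON object per line, with fields for the optional reference annotations left null when unknown. A sketch assuming the field names used in the NeMo diarization docs (`label`, `text`, `num_speakers`, `rttm_filepath`, `uem_filepath`); the audio path is a hypothetical placeholder:

```python
import json

# Hypothetical single-file inference entry. Null duration means use the
# whole file; null num_speakers lets the clustering estimate the count;
# rttm_filepath is only needed when scoring against a reference.
entry = {
    "audio_filepath": "/data/audio/meeting.wav",
    "offset": 0,
    "duration": None,
    "label": "infer",
    "text": "-",
    "num_speakers": None,
    "rttm_filepath": None,
    "uem_filepath": None,
}

with open("diar_infer_manifest.json", "w") as f:
    f.write(json.dumps(entry) + "\n")
```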
Please help: I was able to find documentation on creating my own dataset in NeMo format, but I couldn't find any material on using that dataset to train a model or fine-tune a pretrained model.