Speaker Verification with NeMo

This is only an experiment. The an4 dataset used here is not suitable for training a speaker verification model; it was chosen only because it allows fast training.
NeMo's examples and tutorials do not explicitly illustrate speaker verification, so this repo documents an experiment in speaker verification with NeMo.

Data Preparation

This step was done in a previous experiment, and the resulting manifest files were copied directly into the data folder of this repo. The following describes how the manifests were obtained.
Run download_and_convert_an4.py to download the an4 dataset and convert its .sph files to .wav.
Generate a list (.scp file) of all .wav files:
find {data_dir}/an4/wav/an4_clstk -iname "*.wav" > {data_dir}/an4/wav/an4_clstk/train_all.scp
Preview the first three entries:
head -n 3 {data_dir}/an4/wav/an4_clstk/train_all.scp
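For environments without `find` (e.g. Windows), the same file list can be produced in Python. This is a minimal sketch, not part of the original workflow: `write_scp` is a hypothetical helper, and unlike `find` it sorts the paths so the output order is deterministic.

```python
from pathlib import Path


def write_scp(wav_root: str, scp_path: str) -> list[str]:
    """Write one .wav path per line, like `find ... -iname "*.wav" > out.scp`."""
    # Case-insensitive extension match, mirroring find's -iname "*.wav".
    wavs = sorted(
        str(p)
        for p in Path(wav_root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".wav"
    )
    out = Path(scp_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    # One absolute-or-relative path per line, newline-terminated.
    out.write_text("".join(w + "\n" for w in wavs))
    return wavs
```

Usage (paths are placeholders): `paths = write_scp("{data_dir}/an4/wav/an4_clstk", "{data_dir}/an4/wav/an4_clstk/train_all.scp")`, then `print("\n".join(paths[:3]))` as the equivalent of `head -n 3`.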

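Convert the .scp file to a manifest, setting the --split flag to split it into training and development sets (--id -2 takes the speaker label from the second-to-last "/"-separated path component, i.e. the speaker directory):
python {path-to/scp_to_manifest.py} --scp {path-to/train_all.scp} --id -2 --out {path-to-opt/all_manifest.json} --split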
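To make the conversion concrete, here is a minimal sketch of what such an .scp-to-manifest step does. It is not the code of scp_to_manifest.py. It assumes the JSON-lines manifest format with `audio_filepath`, `offset`, `duration` and `label` keys, reads durations with the standard-library `wave` module (PCM .wav only), and uses a hypothetical 90/10 random split; the real `--split` behaviour may differ.

```python
import json
import random
import wave
from pathlib import Path


def wav_duration(path: str) -> float:
    """Duration in seconds of a PCM .wav file."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / float(w.getframerate())


def scp_to_entries(scp_path: str, id_index: int = -2) -> list[dict]:
    """One manifest entry per audio path listed in the .scp file."""
    entries = []
    for line in Path(scp_path).read_text().splitlines():
        audio = line.strip()
        if not audio:
            continue  # skip blank lines
        entries.append({
            "audio_filepath": audio,
            "offset": 0,
            "duration": wav_duration(audio),
            # Speaker label = "/"-separated path component at id_index;
            # -2 picks the parent directory (the speaker folder).
            "label": audio.split("/")[id_index],
        })
    return entries


def split_entries(entries: list[dict], dev_fraction: float = 0.1, seed: int = 42):
    """Hypothetical random train/dev split; returns (train, dev)."""
    shuffled = entries[:]
    random.Random(seed).shuffle(shuffled)
    n_dev = max(1, int(len(shuffled) * dev_fraction)) if shuffled else 0
    return shuffled[n_dev:], shuffled[:n_dev]


def write_manifest(entries: list[dict], out_path: str) -> None:
    """Write one JSON object per line."""
    with open(out_path, "w") as f:
        for e in entries:
            f.write(json.dumps(e) + "\n")
```

A plain random split can leave a speaker with no development utterances; splitting per label (stratified) avoids that, at the cost of slightly more code.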

Source of scp_to_manifest.py: https://github.com/NVIDIA/NeMo/blob/main/scripts/scp_to_manifest.py

Speaker Verification Model Training

Configuration file source: https://github.com/NVIDIA/NeMo/blob/main/examples/speaker_recognition/conf/SpeakerNet_verification_3x2x512.yaml
Run scripts/train_spk_ver_model.py to train a speaker verification model.

Speaker Embeddings Extraction

First, generate embeddings_manifest.json for the test audio. For this experiment, I created this manifest manually.
Run get_embs.py.
Since the purpose of this experiment is to verify my own voice, the test recordings are of me and are not included in this repo. (The test audio was recorded with Praat.)
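Instead of typing embeddings_manifest.json by hand, it can be generated. This is a minimal sketch that assumes the test manifest uses the same JSON-lines format as the training manifest (`audio_filepath`, `offset`, `duration`, `label`) and that the recordings are PCM .wav files, which Praat can save. `write_embeddings_manifest` and any file names or labels you pass in are hypothetical.

```python
import json
import wave


def wav_duration(path: str) -> float:
    """Duration in seconds of a PCM .wav file."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / float(w.getframerate())


def write_embeddings_manifest(recordings: dict[str, str], out_path: str) -> None:
    """recordings maps audio path -> label; writes one JSON object per line."""
    with open(out_path, "w") as f:
        for audio, label in recordings.items():
            entry = {
                "audio_filepath": audio,
                "offset": 0,
                "duration": wav_duration(audio),
                "label": label,  # hypothetical label, e.g. the speaker's name
            }
            f.write(json.dumps(entry) + "\n")
```

For example, `write_embeddings_manifest({"rec_1.wav": "me", "rec_2.wav": "me"}, "embeddings_manifest.json")` with placeholder file names.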

Speaker Verification: cosine similarity of embeddings

Calculate the cosine similarity of the two speaker embeddings to measure how strongly the model judges the two recordings to come from the same speaker: the closer the score is to 1, the more similar the embeddings.
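The score is cos(a, b) = (a · b) / (‖a‖ ‖b‖), the cosine of the angle between the two embedding vectors. A minimal sketch in plain Python, assuming the two embeddings have already been flattened into equal-length sequences of floats (for example with `.ravel()` on a NumPy array); how get_embs.py stores its output is not shown here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (||a|| * ||b||); ranges from -1 to 1."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = math.fsum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(math.fsum(x * x for x in a))
    norm_b = math.sqrt(math.fsum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

The score is not a probability. Turning it into a same/different-speaker decision requires a threshold, which would normally be tuned on labelled trial pairs.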

Score from a first experiment: 0.9686643297704525

JINHXu / speaker-verification