-
As a beginner to this repo, I'd like to try out the examples on my own data and run them on a GPU to make sure things are working.
I noticed most of the examples do not have `batch = batch.to(self.devic…
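For context, the usual PyTorch pattern is to move each batch onto the same device as the model before the forward pass. A minimal sketch (the model, shapes, and names here are illustrative, not taken from the repo):

```python
import torch
import torch.nn as nn

# Pick the GPU when one is available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2).to(device)   # model parameters live on `device`
batch = torch.randn(4, 10)            # a batch as a DataLoader would yield it (on CPU)

batch = batch.to(device)              # move the batch to the model's device
logits = model(batch)
print(logits.shape)                   # torch.Size([4, 2])
```

If this `.to(device)` call is missing and the model sits on a GPU, the forward pass fails with a device-mismatch error, which may be what the examples are silently avoiding by running on CPU.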
-
Hi, I am still new to Kaldi. I would like to perform diarization on some of the speech samples from my own dataset which do not have any speaker labels available, so I would have to listen and compare…
-
I have been using https://github.com/NVIDIA/NeMo/tree/main/tutorials/speaker_recognition.
There is a way we can get embeddings for speaker recognition. (https://github.com/NVIDIA/NeMo/blob/main/exa…
-
Our current **speaker encoder** is trained only on the LibriTTS (100, 360) datasets. However, we could improve its performance using other available datasets (VoxCeleb, LibriTTS-500, Common Voice, etc.). I…
-
Are pre-trained models available? Where can I find them?
Thanks!
Prashant
-
- Changing `--n_mels` from 40 to 64 leads to a small increase in performance.
- Using `--log_input` also leads to a small increase in performance.
- Combining two loss functions (e.g. `angleproto` a…
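Combining two loss objectives is typically done as a weighted sum whose gradients flow through both terms. A minimal PyTorch sketch (the stand-in terms and unit weights are assumptions for illustration, not the trainer's actual loss implementations):

```python
import torch
import torch.nn.functional as F

# Illustrative tensors: speaker embeddings and classifier logits for a batch.
embeddings = torch.randn(8, 192, requires_grad=True)
logits = torch.randn(8, 5, requires_grad=True)
labels = torch.randint(0, 5, (8,))

# Stand-in for a metric-learning term (e.g. an angular prototypical loss)
# and a softmax/cross-entropy classification term.
metric_term = embeddings.pow(2).mean()
ce_term = F.cross_entropy(logits, labels)

# Weighted sum: gradients from both objectives reach their parameters.
total_loss = 1.0 * metric_term + 1.0 * ce_term
total_loss.backward()
```

Tuning the relative weights (here both 1.0) is usually what makes or breaks such a combination.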
-
Hi, is it possible to extract the time (or location) at which each speaker's speech starts and ends?
I want to extract each speaker's speech, so I need to know when the speech is matched to the speakers and e…
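Diarization pipelines commonly emit segment boundaries in RTTM format, where each `SPEAKER` line carries the onset and duration of one speaker turn, so per-speaker start/end times can be read straight from that output. A minimal parser (field positions follow the standard RTTM layout):

```python
def parse_rttm(lines):
    """Return {speaker: [(start_sec, end_sec), ...]} from RTTM SPEAKER lines."""
    segments = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue
        # Fields: SPEAKER <file> <chan> <onset> <duration> <NA> <NA> <speaker> ...
        onset, dur, speaker = float(fields[3]), float(fields[4]), fields[7]
        segments.setdefault(speaker, []).append((onset, onset + dur))
    return segments

rttm = [
    "SPEAKER rec1 1 0.00 1.50 <NA> <NA> spk0 <NA> <NA>",
    "SPEAKER rec1 1 1.50 2.25 <NA> <NA> spk1 <NA> <NA>",
]
print(parse_rttm(rttm))
# {'spk0': [(0.0, 1.5)], 'spk1': [(1.5, 3.75)]}
```

With the start/end times in hand, the matching audio can be sliced out of the waveform for listening or further processing.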
-
I'd like to thank all the contributors for their efforts. We know that DDP should be faster than DP, but how many times faster is DDP than DP in SpeechBrain? I mean, if I use 8 RTX 2080 Ti GPUs in a…
-
We should start creating example recipes for some data sets and tasks. I'll post an initial list here, and we can modify or extend it based on discussions. I'll sort it by the level of implementation …
-
In the tutorial, the AMI dataset is used to train speech activity and change detection. However, the VoxCeleb dataset is used to train the speaker embedding. Does the speaker embedding model necessarily …