-
@yfchenmodelscope
Hello! I'd like to run inference with CAM++ to extract speaker embeddings for the LibriTTS dataset. I noticed that the code converts the sampling rate to 16kHz during the MFCC featu…
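A minimal resampling sketch, assuming the LibriTTS audio (natively 24 kHz) has to be brought down to 16 kHz before CAM++ feature extraction; the file path is a placeholder:

```python
import torchaudio
import torchaudio.functional as F

TARGET_SR = 16000  # the 16 kHz rate the feature extraction converts to

# "sample.wav" is a placeholder; LibriTTS ships at 24 kHz
wav, sr = torchaudio.load("sample.wav")
if sr != TARGET_SR:
    wav = F.resample(wav, orig_freq=sr, new_freq=TARGET_SR)
# `wav` is now ready for 16 kHz feature extraction / embedding inference
```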
-
Hi @ylacombe! I have a multi-speaker dataset with which I trained the Hindi checkpoint. I want to generate a particular speaker's voice during inference. Is there any way to do that using the inf…
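A minimal sketch, assuming the checkpoint is a multi-speaker VITS model usable through transformers (the checkpoint name and speaker index are hypothetical); multi-speaker `VitsModel` checkpoints accept a `speaker_id` argument:

```python
import torch
from transformers import VitsModel, AutoTokenizer

# "your-org/vits-hindi" is a hypothetical checkpoint name
model = VitsModel.from_pretrained("your-org/vits-hindi")
tokenizer = AutoTokenizer.from_pretrained("your-org/vits-hindi")

inputs = tokenizer("नमस्ते दुनिया", return_tensors="pt")
with torch.no_grad():
    # speaker_id picks one of the speakers seen during training
    waveform = model(**inputs, speaker_id=3).waveform
```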
-
Hi, thanks for sharing the code.
I have a folder of wav files from different speakers, but I don't understand what to do next to get a trained model. What type of files should be in the "mels" and "em…
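One common layout, sketched under the assumption that the repo expects one .npy file per utterance in "mels" and "embeds" (folder names from the question; all spectrogram parameters are assumptions that must match the repo's config):

```python
from pathlib import Path

import librosa
import numpy as np
from resemblyzer import VoiceEncoder, preprocess_wav

wav_dir, mel_dir, emb_dir = Path("wavs"), Path("mels"), Path("embeds")
mel_dir.mkdir(exist_ok=True)
emb_dir.mkdir(exist_ok=True)

encoder = VoiceEncoder()
for wav_path in sorted(wav_dir.glob("*.wav")):
    y, sr = librosa.load(wav_path, sr=22050)
    # log-mel spectrogram; n_fft/hop_length/n_mels are assumed values
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                         hop_length=256, n_mels=80)
    np.save(mel_dir / f"{wav_path.stem}.npy", np.log(mel + 1e-5))
    # 256-dim utterance-level speaker embedding from Resemblyzer
    embed = encoder.embed_utterance(preprocess_wav(wav_path))
    np.save(emb_dir / f"{wav_path.stem}.npy", embed)
```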
-
### System Info
Transformers.js Alpha 10, Brave
### Environment/Platform
- [X] Website/web-app
- [ ] Browser extension
- [ ] Server-side (e.g., Node.js, Deno, Bun)
- [ ] Desktop app (e.g., Electron…
-
I am wondering how you extract the speaker embedding with the pre-trained verification model.
The speaker embedding I get from https://github.com/resemble-ai/Resemblyzer will have a vector…
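For comparison, a minimal sketch with a different pre-trained verification model, assuming SpeechBrain's ECAPA-TDNN checkpoint from the Hugging Face hub (the wav path is a placeholder); its embeddings are 192-dimensional, while Resemblyzer's utterance embeddings are 256-dimensional:

```python
import torchaudio
from speechbrain.pretrained import EncoderClassifier

classifier = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb")
signal, sr = torchaudio.load("speaker.wav")  # placeholder path, mono 16 kHz
embedding = classifier.encode_batch(signal)  # shape: (1, 1, 192)
```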
-
I am getting the following error when using the "openai/whisper-medium" model with timestamp prediction:
`There was an error while processing timestamps, we haven't found a timestamp as last token. Was W…
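For reference, a minimal sketch of how timestamp prediction is typically requested through the transformers ASR pipeline (the audio path is a placeholder; chunking long inputs with `chunk_length_s` is an assumed workaround that sometimes sidesteps the missing-last-timestamp failure, not a guaranteed fix):

```python
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-medium",
    chunk_length_s=30,  # chunk long audio instead of decoding it in one pass
)
result = asr("audio.wav", return_timestamps=True)  # placeholder path
print(result["chunks"])  # list of {"timestamp": (start, end), "text": ...}
```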
-
## Title
A friendly introduction to word embeddings
## Abstract
We will discuss the limitations of traditional textual data representation methods and explore how we can do better.
In the proces…
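As a taste of the kind of example such a talk might use, here is a minimal sketch, assuming gensim's Word2Vec (the toy corpus is made up): dense vectors support graded similarity queries that one-hot or bag-of-words representations cannot.

```python
from gensim.models import Word2Vec

# Toy corpus; real training needs far more text
sentences = [["king", "queen", "royal"], ["man", "woman", "person"],
             ["king", "man"], ["queen", "woman"]]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, seed=0)

# Dense embeddings give a graded notion of similarity,
# unlike sparse one-hot vectors, which are all equidistant
print(model.wv.similarity("king", "queen"))
```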
-
Hey @KoljaB, I have tried this tool and it is surprisingly good; it definitely outperformed pyannote.
But I'm wondering how it can be pushed to 10+ speakers or so. It would be really us…
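For comparison, pyannote itself can be given a speaker-count hint, which is the usual lever for crowded recordings; a minimal sketch (model name, token, and audio path are placeholders):

```python
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1", use_auth_token="HF_TOKEN"  # placeholders
)
# num_speakers pins the count; min_speakers/max_speakers bound it instead
diarization = pipeline("meeting.wav", num_speakers=10)
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s-{turn.end:.1f}s: {speaker}")
```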
-
With the same WaveNet model and the same utterance (p225_001.wav), I found that the quality of the waveform generated from the mel-spectrogram in the provided metadata.pkl is much better than the one gener…
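Quality gaps like this usually trace back to mel-extraction settings that differ from the ones used to build metadata.pkl; a minimal check, sketched with librosa (every parameter value below is an assumption to be matched against the training config):

```python
import librosa
import numpy as np

y, sr = librosa.load("p225_001.wav", sr=22050)  # sr must match the training config
mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=1024, hop_length=256, n_mels=80, fmin=40, fmax=7600
)
mel = np.log(np.clip(mel, 1e-5, None))  # log compression/normalization also matters
print(mel.shape)  # compare frame count and value range against the metadata mel
```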
-
Hi, first of all, great work!
The diarization works great for me on audio files with fewer than 3 speakers, but an audio file with close to or more than 8 speakers results in a very good transcri…