-
Diarization runs very slowly, uses almost 12 GB of memory, and does not seem to be running on the GPU (GPU-Z and Windows Task Manager show conflicting info)
- Latest WhisperX repo
- pyannote.audio 3…
-
Hi, is there a way to utilize multiple reference audios to capture more characteristics?
I'm not too familiar with how it works under the hood, but would some stacking or averaging be possible to implement for…
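
For readers wondering what "averaging" could look like here, the following is a minimal sketch, not the repo's method. It assumes each reference audio has already been turned into a fixed-size speaker embedding by some encoder; the function names `l2_normalize` and `average_embeddings` are hypothetical. Each embedding is length-normalized, the normalized vectors are averaged, and the mean is normalized again so that no single reference dominates.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def average_embeddings(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Combine several reference embeddings into one unit-length vector.

    Hypothetical helper: the embeddings are assumed to come from the same
    speaker encoder and therefore share one dimensionality.
    """
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    # Normalize first so loud or long references do not outweigh the others.
    unit = [l2_normalize(e) for e in embeddings]
    # Element-wise mean across references.
    mean = [sum(col) / len(unit) for col in zip(*unit)]
    return l2_normalize(mean)
```

Whether an averaged embedding actually captures more speaker characteristics depends on the model it is fed to, so this is worth checking by ear rather than assuming.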
-
Hi, I am wondering what the reasoning behind the evaluation implemented in evaluateFromListSave is - it seems to me this is loading in 2 audio files, running the audio feature extractor on them, and c…
-
@taylorlu, I appreciate your effort on this repo! I have a small question, though: while trying speaker diarization on a .wav file with 2 speakers, I am getting output for 4 different spe…
-
(see syllabus for instructions).
-
Right now, I'm planning to initiate the response with a "vim pedal", aka a hotkey, because knowing when to respond is difficult. https://github.com/yacineMTB/talk/blob/master/index.ts#L108-L135
Whe…
-
**Describe the bug**
PR #5579 broke xvector-conditioned TTS model packaging. In stage 9 of `tts.sh`, `spk_xvector.ark` was replaced with `{spk_embed_tag}.ark`, which in my recipe resolves to `xvector…
-
Thank you so much for developing such a high-quality, sparse, and performant network, @jaywalnut310. I thought I'd share the results I have obtained so that others can see how promising your network i…
-
### Describe the bug
There is a lot of memory consumption while generating embeddings in the `encode_batch()` function of the EncoderClassifier. How can I reduce the memory consumption?
### Expected behav…
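
A common, general way to bound peak memory when embedding many utterances is to process them in small chunks rather than in one large batch. The sketch below is illustrative only and does not use the library's API: `embed_in_chunks` and its `embed_fn` argument are hypothetical names, with `embed_fn` standing in for whatever call produces embeddings for a list of inputs.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")  # input item, e.g. one utterance
E = TypeVar("E")  # embedding type returned by embed_fn


def embed_in_chunks(
    items: Sequence[T],
    embed_fn: Callable[[Sequence[T]], List[E]],
    chunk_size: int = 8,
) -> Iterator[E]:
    """Yield embeddings while only ever passing `chunk_size` items at a time.

    `embed_fn` is a placeholder (assumption) for the real batch-embedding call.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        # Only this chunk's activations need to be held in memory at once.
        for embedding in embed_fn(chunk):
            yield embedding
```

With a PyTorch model, inference is usually also wrapped in `torch.no_grad()` so that no autograd graph is kept, which can reduce memory further.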
-
First, Thanks for the excellent work by CorentinJ!
I noticed that the speaker encoder used in this work is GE2E, whose performance falls far behind the SOTA. So I replaced the GE2E encoder with …