-
Hello! I have a question about the EER results reported in your research paper. My question concerns only the English version of CAM++ (with the Chinese version, all the results look right).
First of all…
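For context on the metric being discussed: a minimal sketch of how EER is conventionally computed from genuine and impostor trial scores. The score values below are hypothetical, and this brute-force threshold sweep is for illustration only.

```python
def compute_eer(genuine, impostor):
    """Equal error rate: the operating point where the false-accept
    rate (impostors accepted) equals the false-reject rate (genuine
    trials rejected)."""
    best = None
    # Sweep every candidate threshold taken from the pooled scores.
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # false accepts
        frr = sum(s < t for s in genuine) / len(genuine)     # false rejects
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]

# Hypothetical cosine scores: genuine trials should score higher.
eer = compute_eer([0.8, 0.7, 0.9, 0.6], [0.2, 0.3, 0.5, 0.65])  # -> 0.25
```

Real evaluations interpolate the DET curve rather than averaging at the closest threshold, but the idea is the same.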
-
Given previously recorded and recognized speaker embeddings used for diarization, it seems like it would be possible to match any new voice to a previously recorded database of known voices with assoc…
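The matching described above is usually done by scoring a new embedding against each enrolled embedding with cosine similarity and accepting the best match only above a threshold. A minimal sketch, with made-up speaker names, vectors, and threshold:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def identify(embedding, database, threshold=0.6):
    """Return the best-matching known speaker, or None if the best
    score falls below the acceptance threshold."""
    name, score = max(((n, cosine(embedding, e)) for n, e in database.items()),
                      key=lambda p: p[1])
    return name if score >= threshold else None

# Hypothetical enrolled-speaker database (real embeddings are ~192-512 dims).
db = {"alice": [1.0, 0.0, 0.1], "bob": [0.0, 1.0, 0.2]}
best = identify([0.9, 0.1, 0.1], db)  # -> "alice"
```

The threshold controls the open-set behavior: too low and unknown voices get mislabeled as enrolled speakers, too high and genuine matches are rejected.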
-
I have a WhisperX Python script for transcribing meetings, but the speaker diarization for German is really bad, unfortunately.
After some research I came across the fine-tuned German segmentation…
-
Hello,
I just trained on two speakers at the same time.
The filelist looks like this:
```
/home/ubuntu/RVC-beta-v2-0528/logs/merged/0_gt_wavs/0_4_48.wav|/home/ubuntu/RVC-beta-v2-0528/logs/merg…
Rolun updated
4 months ago
-
torchrun --nnodes=1 --nproc_per_node=8 --master_port=25001 \
llava/train/train_mem.py \
--model_name_or_path /path/to/checkpoint_llava_med \
--data_path /path/to/your_dental_dataset.jso…
-
Is there any way to integrate [voicefixer](https://github.com/haoheliu/voicefixer_main) into a speaker diarization pipeline?
The package takes a wav file as input and gives an upsampled 44100 Hz wav file…
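One practical wrinkle (an assumption about the usual setup, not something stated in the question): most diarization models expect 16 kHz mono input, so voicefixer's 44.1 kHz output would need resampling before entering the pipeline. A toy illustration of that step on a made-up sample list; in practice a proper polyphase resampler (e.g. `scipy.signal.resample_poly`) should be used instead.

```python
def resample_linear(samples, sr_in, sr_out):
    """Naive linear-interpolation resampler (illustration only —
    no anti-aliasing filter, so not suitable for production audio)."""
    n_out = int(len(samples) * sr_out / sr_in)
    out = []
    for i in range(n_out):
        # Map each output index to a fractional position in the input.
        pos = i * (len(samples) - 1) / max(n_out - 1, 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

# Hypothetical restored 44.1 kHz signal, downsampled to the 16 kHz
# that typical diarization models expect.
restored = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5, 0.0]
down = resample_linear(restored, 44100, 16000)
```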
-
Hi,
I am trying to extract the audio features from the clips.
I've downloaded the clips and then ran the code 'batch_audio_embedding.py' (inside the folder audio-visual/active-speaker-detect…
-
Hi!
I've encountered a problem.
I have a multi-speaker dataset.
If I train a separate model per speaker (a single-speaker model), the prosody, speed, intonation, timbre, and identity are good (for the spe…
-
Hi,
I want to use the **wavlm** model to extract speaker embeddings for a speaker verification task. In [the paper](https://arxiv.org/pdf/2110.13900.pdf) it is mentioned that for the task of speaker verificat…
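A common way to collapse frame-level features into one utterance-level speaker embedding is temporal mean pooling. In the WavLM verification setup the hidden states feed an x-vector-style head, so the sketch below (with made-up frame vectors) is only an illustration of the pooling idea, not the paper's exact recipe:

```python
def mean_pool(frames):
    """Average a list of equal-length frame-level feature vectors
    into a single utterance-level embedding."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]

# Made-up 2-dim frame features standing in for WavLM hidden states
# (the real model outputs 768/1024-dim frames).
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
emb = mean_pool(frames)  # -> [3.0, 4.0]
```

For the actual model, Hugging Face `transformers` ships a `WavLMForXVector` class intended for exactly this verification use case, which may be a simpler starting point than pooling raw hidden states by hand.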
-
Adding here some implementation improvements that I need to do courtesy of comments from @r9y9
- [x] Change F0 to log-F0 (and continuous)
- [ ] Use original speaker embedding during training,
- …
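The first checklist item (log-F0, made continuous) is typically implemented by interpolating the F0 track through unvoiced frames before taking the log. A minimal sketch, assuming unvoiced frames are marked with 0:

```python
import math

def continuous_log_f0(f0):
    """Convert an F0 track (0 = unvoiced) to continuous log-F0:
    linearly interpolate across unvoiced gaps, hold edge values,
    then take the natural log."""
    voiced = [i for i, v in enumerate(f0) if v > 0]
    if not voiced:
        return [0.0] * len(f0)
    out = list(f0)
    for i in range(len(out)):
        if out[i] > 0:
            continue
        prev = max((j for j in voiced if j < i), default=None)
        nxt = min((j for j in voiced if j > i), default=None)
        if prev is None:          # leading unvoiced run: hold first voiced value
            out[i] = f0[nxt]
        elif nxt is None:         # trailing unvoiced run: hold last voiced value
            out[i] = f0[prev]
        else:                     # interior gap: linear interpolation
            w = (i - prev) / (nxt - prev)
            out[i] = f0[prev] * (1 - w) + f0[nxt] * w
    return [math.log(v) for v in out]

clf = continuous_log_f0([0.0, 100.0, 0.0, 200.0, 0.0])
```

Interpolating in linear Hz before the log (as here) versus interpolating in the log domain is a design choice; a separate voiced/unvoiced flag is usually kept alongside so the model can still tell the regions apart.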