-
When I do:
# inference with FreeVC
`CUDA_VISIBLE_DEVICES=0 python convert.py --hpfile logs/freevc.json --ptfile checkpoints/freevc.pth --txtpath convert.txt --outdir outputs/freevc`
How do I ge…
-
Hi Wataru,
I'm now at the training stage and have a few questions to ask:
- I noticed that you set `.repeat(2)` in datamodule.py; as with config.yaml, this could be controlled according t…
-
Hi,
I have a quick question. To reproduce the same training and testing settings as the SUPERB benchmark, should we just run the commands as provided in the README? Or does changing the batch size or le…
-
Hi!
Thanks for the great repo!
I find that CCC-wav2vec 2.0 performs especially well on the SUPERB SE task, surpassing WavLM Large by a large margin.
I am trying to reproduce it but have not yet succes…
-
**Describe the bug**
ASR model has been trained and packed along with other required materials (stage 14) to be used for inference in another cluster/system. During inference "RuntimeError: Error(s) …
-
Hi,
I want to use the **wavlm** model to extract speaker embeddings for the speaker verification task. In [the paper](https://arxiv.org/pdf/2110.13900.pdf) it is mentioned that for the task of speaker verificat…
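For the scoring side of verification, a minimal sketch of cosine scoring between two speaker embeddings (assuming you already have fixed-dimensional embeddings, e.g. from a WavLM-based x-vector head; the extraction step itself is not shown, and the 512-dim vectors below are placeholders):

```python
import numpy as np

def cosine_score(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Cosine similarity between two speaker embeddings."""
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return float(np.dot(a, b))

# Hypothetical 512-dim embeddings; real ones would come from the model.
rng = np.random.default_rng(0)
e1 = rng.standard_normal(512)
e2 = e1 + 0.1 * rng.standard_normal(512)  # same speaker, slight perturbation
e3 = rng.standard_normal(512)             # different speaker

print(cosine_score(e1, e2))  # close to 1.0
print(cosine_score(e1, e3))  # near 0.0
```

In practice the accept/reject decision is a threshold on this score, tuned on a development trial list.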
-
Regarding the dimensions of the features extracted by DiffuseStyleGesture+: why is the audio feature dimension set as 40+64+2+2+1024+1?
Why is the MFCC 40, the log-mel 64, the prosody features 4, and so on? Is there a particular reason behind these settings, and why were the feature dimensions chosen this way?
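For reference, the expression 40+64+2+2+1024+1 sums to 1133 per frame. A minimal sketch of such a concatenation, where only the sizes are taken from the expression above and the component names are guesses (what each block actually is, is exactly what the question asks):

```python
import numpy as np

T = 10  # hypothetical number of audio frames

# Per-frame blocks; sizes from 40+64+2+2+1024+1, names are assumptions.
mfcc     = np.zeros((T, 40))    # MFCC coefficients
logmel   = np.zeros((T, 64))    # log-mel spectrogram bins
pros_a   = np.zeros((T, 2))     # prosody block (2 dims)
pros_b   = np.zeros((T, 2))     # prosody block (2 dims)
ssl_feat = np.zeros((T, 1024))  # e.g. a WavLM-Large hidden state is 1024-dim
extra    = np.zeros((T, 1))     # remaining scalar feature

audio_feat = np.concatenate(
    [mfcc, logmel, pros_a, pros_b, ssl_feat, extra], axis=-1
)
print(audio_feat.shape)  # (10, 1133)
```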
-
As the title says.
-
Hi
Is there any way to detect the emotion or stress of a speaker within the Whisper domain?
Best
-
I found your paper here: https://arxiv.org/pdf/2304.13085
Thanks for your contribution!
Could you provide a more detailed description or sample source code for WavLM as used in your paper?
I have tried to…