-
I think a better way of extracting speech content might be to use WHAMR! to add artificial reverberation and background noise to the training dataset, so that WavLM can extract content, t…
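To make the noise half of that augmentation concrete, here is a minimal numpy sketch of SNR-controlled mixing (the reverberation half would additionally convolve the speech with a room impulse response, such as the ones WHAMR! ships). The function name and SNR value are illustrative, not from any repo:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the mixture has the requested speech-to-noise
    ratio (in dB), then add it to `speech`. Assumes equal lengths."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12  # guard against silent noise
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

# toy usage: a 1 s 440 Hz tone mixed with white noise at 10 dB SNR
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = mix_at_snr(clean, rng.standard_normal(16000), snr_db=10.0)
```

In practice you would draw the SNR from a range per utterance so the model sees varied noise conditions.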
-
Mean metrics:

| MCD | f0RMSE | f0CORR | DDUR | CER | WER | accept rate |
|------|-------|--------|-------|-----|-----|-------------|
| 7.43 | 37.94 | 0.237  | 0.343 | 3.5 | 8.4 | 94.00       |

For this example the accept rate is 94.00. What does it mean?
Also, in the WavLM paper, the authors…
-
When I run the code for the ASV examples, I get the error `'dict' object has no attribute 'to'`, because `inputs` is a dict of tensors.
Are there any alternatives?
https://github.com/sinhat98/adapter-wavl…
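For what it's worth, a plain `dict` has no `.to()` (only wrappers like HuggingFace's `BatchEncoding`/`BatchFeature` do), so the usual workaround is to move each tensor individually. A small recursive helper, offered as a sketch since I haven't run it against that repo:

```python
import torch

def to_device(batch, device):
    """Recursively move tensors in a (possibly nested) dict/list to `device`."""
    if isinstance(batch, torch.Tensor):
        return batch.to(device)
    if isinstance(batch, dict):
        return {k: to_device(v, device) for k, v in batch.items()}
    if isinstance(batch, (list, tuple)):
        return type(batch)(to_device(v, device) for v in batch)
    return batch  # leave non-tensor values (strings, ints, ...) untouched

# hypothetical batch shaped like a feature-extractor output
inputs = {"input_values": torch.randn(1, 16000),
          "attention_mask": torch.ones(1, 16000, dtype=torch.long)}
inputs = to_device(inputs, "cpu")  # or "cuda" when available
```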
-
Hello, great job.
I know WavLM is pretrained on English data.
I would like to know: for a Chinese dataset, which pretrained model is recommended for extracting SSL features?
…
-
Hello, I just discovered the ONNX format and its speed advantages.
Has anyone tried to export MeloTTS to ONNX?
-
Hi,
Do you think this solution can easily be adapted to languages other than English?
-
**Describe your question**
I want to train a CTC/Attention based acoustic model using representations extracted from [conformer pretrained model](https://storage.googleapis.com/vakyansh-open-models…
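Not an answer, but to make the setup concrete: once frame-level representations are extracted from the pretrained conformer, the CTC branch is just a projection to the vocabulary plus `nn.CTCLoss`. A toy sketch with made-up shapes (512-dim features, 32-symbol vocab; none of this comes from the linked model):

```python
import torch
import torch.nn as nn

batch, frames, feat_dim, vocab = 4, 120, 512, 32  # vocab includes blank=0

feats = torch.randn(batch, frames, feat_dim)  # stand-in for SSL features
proj = nn.Linear(feat_dim, vocab)             # CTC projection head
# CTCLoss expects (T, N, C) log-probabilities
log_probs = proj(feats).log_softmax(-1).transpose(0, 1)

targets = torch.randint(1, vocab, (batch, 20))  # dummy label sequences
input_lengths = torch.full((batch,), frames, dtype=torch.long)
target_lengths = torch.full((batch,), 20, dtype=torch.long)

loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
```

The attention decoder branch would be trained jointly with this loss in the usual hybrid CTC/attention fashion.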
-
This is a follow up to the previous discussion threads regarding stochastic duration predictor in https://github.com/p0p4k/vits2_pytorch/issues/11 and https://github.com/p0p4k/vits2_pytorch/issues/68#…
-
Hello, does this model come with pretrained weights? Does it support Chinese?
If I train on my own Chinese data, will that be supported (for Chinese)?
-
Hi @OlaWod, I appreciate your work.
I am trying to fine-tune the FreeVC model on my custom multilingual data (using an already-trained speaker encoder model), and without SR augmentation. After …
I am trying to fine tune the FreeVC model with my custom multilingual data (using an already trained speaker encoder model), and without SR augmentation. After …