-
First of all, I would like to express my sincere gratitude to the authors. This is an excellent piece of work! I have used ConvNext_TTS, and its synthesis quality is impressive, with very fast inferen…
-
Recently, I have been conducting applied research on Target Speaker Extraction, but I have encountered many difficulties. I came across your paper titled 'Generative Speech Foundation Model Pretrainin…
-
Changing every setting (even on the OS side) to Italian gives very poor voice recognition (it basically understands nothing).
-
Perhaps by tweaking eSpeak's parameters, or by using a different voice file or dictionary file, we can improve the speech quality. We should investigate this. Perhaps we can interest the eSpeak devs in thi…
-
It would be nice to use whisper instead of vosk for speech recognition on the server side, as it currently seems to outperform other models in recognition quality.
-
## Describe the bug
After following the installation instructions (plus replacing phonemizer with https://github.com/justinjohn0306/phonemizer to make it work on Win 11), and using the same examples …
-
Is there a way to fine-tune on a single speaker with this repo? If so, could you share the steps?
A recent commit to the README said this:
```
Code and checkpoints incoming...
```
If you have time…
-
Hi,
I synthesized converted speech with each of these three models separately: VAE, CDVAE, and CDVAE-CLS-GAN. The results of the CDVAE-CLS-GAN model sound the worst. Is it supposed to be like this? Or anything I m…
-
The following model is a great, high-quality model supporting:
* Text to Speech
* Speech to Text
* Speech to Speech
It also allows multilingual translation in all these modes.
Would it be poss…