-
I tried to generate the same voice every time for consistency, so I set do_sample=False. However, the output is basically noise. Here is my code:
prompt = "It took me quite a long time to develop a voice, a…
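A hedged side note: in Hugging Face `generate()`, `do_sample=False` (with the default `num_beams=1`) switches to greedy decoding, which is a different decoding strategy, not just "sampling made repeatable". If the goal is a consistent voice, one common alternative is to keep `do_sample=True` and fix the random seed before each call (for example with `transformers.set_seed`). The toy sketch below uses only Python's `random` module to show that a fixed seed makes sampled token sequences identical across runs. `sample_tokens` and the probability table are hypothetical stand-ins, not part of any TTS library.

```python
import random


def sample_tokens(probs, n, seed=None):
    """Draw n token ids from a categorical distribution.

    Toy stand-in for a sampling decoder (hypothetical, not a library API).
    """
    rng = random.Random(seed)  # dedicated RNG, so the seed fully determines the draws
    ids = list(range(len(probs)))
    return [rng.choices(ids, weights=probs)[0] for _ in range(n)]


probs = [0.1, 0.2, 0.3, 0.4]  # hypothetical next-token distribution
run_a = sample_tokens(probs, 10, seed=42)
run_b = sample_tokens(probs, 10, seed=42)
print(run_a == run_b)  # same seed -> same sampled sequence
```

The design point is that reproducibility comes from controlling the RNG, so you keep the benefits of sampling while still getting the same output for the same seed.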
-
Hey @sanchit-gandhi, I like the repo. Excited to see this being worked on. Here's a benchmark of WhisperSpeech. I used your sample script on the exact same text snippet and it finished processing in …
-
Hey Masaya, Ryuichi, Yuma, Takuya and Kentaro,
Congratulations on the release of the LibriTTS-P dataset! It's a very valuable resource for building more expressive text-to-speech models and we can'…
-
Hello,
400 Hz is fine for read speech, but not suitable for expressive / emotional TTS applications. For example,
[0001_001491](https://github.com/espnet/espnet/files/11870982/0001_001491.txt) (p…
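To make the point concrete, assuming the 400 Hz figure refers to the F0 search ceiling of the pitch extractor: any voiced frame whose true pitch lies above that ceiling will be missed or mis-estimated. The sketch below measures what fraction of voiced frames a given ceiling would clip, assuming an F0 contour in Hz where 0 marks unvoiced frames (a common convention, but an assumption here). `clipped_fraction` and the sample contour are hypothetical illustrations, not ESPnet code or data.

```python
def clipped_fraction(f0_hz, f0_max=400.0):
    """Fraction of voiced frames whose F0 exceeds the extractor ceiling.

    Assumes f0_hz is a list of per-frame F0 values in Hz, with 0 meaning unvoiced.
    """
    voiced = [f for f in f0_hz if f > 0]  # drop unvoiced frames
    if not voiced:
        return 0.0
    return sum(f > f0_max for f in voiced) / len(voiced)


# Hypothetical contour from an expressive utterance (Hz, 0 = unvoiced).
contour = [0, 180, 220, 310, 450, 520, 0, 390]
print(clipped_fraction(contour))  # 2 of 6 voiced frames exceed 400 Hz
```

Running this over the F0 contours of an expressive corpus gives a quick, rough estimate of how much pitch range a fixed ceiling would discard.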
-
Thank you for your great work! Could you provide a more detailed guide on how to train a model?
-
First off -- AMAZING TTS!!!
I know I'm repeating several other issues that have been opened, but I've spent several days testing and code tweaking to try to resolve the issues I have found, and wan…
-
Regarding using only one audio sample: you can generate speech in multiple languages while keeping the tone of that sample. What you actually need for this is Seamless.
You can refer to this: https://replicate.com/adirik/seam…
-
Can you add some more sources, like Wuxiaworld, Webnovel, and some light novel sources?
It's a great app; I'm just wondering if you could also add a TTS option so we can listen to it as audio. We can use system TTS con…
-
Can anyone help me implement SeamlessExpressive in real time with no latency? Also, could you suggest some code references for the implementation? I am also interested in learning how to make these type …
-
# NVIDIA NeMo (ByT5 G2P and G2P-Conformer):
> NVIDIA NeMo provides grapheme-to-phoneme models for various languages, including **German**.
> The ByT5 G2P model is based on a neural network and can…