-
Does anyone plan to use a WaveNet-based vocoder instead of the Griffin-Lim algorithm? It greatly increased the audio quality in the single-speaker experiment of [Deep Voice 2](https://arxiv.org/abs/170…
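For reference, a minimal sketch of the two decoding paths being compared, under the assumption that a magnitude/mel spectrogram and this repo's STFT hyperparameters are available; the function names and the `wavenet` object below are illustrative, not this repo's actual API:
```python
import librosa

# Baseline: invert a magnitude spectrogram with Griffin-Lim.
# `mag` has shape (1 + n_fft // 2, frames); hop/window sizes are illustrative.
def griffin_lim_decode(mag, n_fft=1024, hop_length=256, n_iter=60):
    return librosa.griffinlim(mag, n_iter=n_iter,
                              hop_length=hop_length, win_length=n_fft)

# Alternative: feed the same acoustic features to a trained neural vocoder
# (e.g. a WaveNet conditioned on mel-spectrograms). `wavenet` is hypothetical;
# any pretrained mel-to-waveform model could stand in here.
def neural_vocoder_decode(mel, wavenet):
    return wavenet.generate(mel)  # returns a waveform array
```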
-
I have trained the model on LJ Speech for around 670k steps. When running the `python synthesize.py` command, I received this error.
```
Graph loaded
2018-04-25 19:19:45.131340: I tensorflow/core/…
-
Scenario: using the pretrained LJSpeech FastPitch checkpoint (fp32), generate audio from a long text so that the duration is 20 seconds or more.
Problem: after ~15 seconds into the audio, it gets q…
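One common workaround, sketched below under the assumption that the degradation comes from inputs much longer than the training utterances, is to split the text into sentences, synthesize each chunk separately, and concatenate the waveforms; `synthesize_fn` is a placeholder for whatever inference call you already use with the checkpoint:
```python
import re
import numpy as np

def synthesize_long_text(text, synthesize_fn, sample_rate=22050, pause_sec=0.3):
    """Split long input text into sentences, run TTS per sentence,
    and join the waveforms with short pauses. `synthesize_fn` is assumed
    to map a string to a 1-D float waveform at `sample_rate`."""
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    pause = np.zeros(int(pause_sec * sample_rate), dtype=np.float32)
    chunks = []
    for sentence in sentences:
        chunks.append(synthesize_fn(sentence))
        chunks.append(pause)
    return np.concatenate(chunks[:-1]) if chunks else np.zeros(0, dtype=np.float32)
```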
-
Has anyone trained this with English datasets? Also, section 2.2 in the wiki is for generating text for the speech dataset, right?
-
I am trying to train Tacotron 2 on the IEMOCAP dataset. In order to fully take advantage of all the speakers, I added a speaker reference encoder (concatenated with the original text encoder). How…
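A minimal PyTorch sketch of the kind of concatenation described, assuming a per-utterance speaker/reference embedding is broadcast along the time axis of the text encoder outputs; module names and sizes are illustrative, not the exact Tacotron 2 code:
```python
import torch
import torch.nn as nn

class SpeakerConditionedEncoder(nn.Module):
    """Concatenate a reference/speaker embedding with Tacotron-style
    text encoder outputs along the feature dimension (illustrative sizes)."""
    def __init__(self, text_dim=512, spk_dim=128):
        super().__init__()
        # Project back to the decoder's expected dimension after concatenation.
        self.proj = nn.Linear(text_dim + spk_dim, text_dim)

    def forward(self, text_enc, spk_emb):
        # text_enc: (batch, time, text_dim); spk_emb: (batch, spk_dim)
        spk = spk_emb.unsqueeze(1).expand(-1, text_enc.size(1), -1)
        return self.proj(torch.cat([text_enc, spk], dim=-1))
```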
-
Thank you for your interesting and valuable research.
I'm having trouble running the following command in terminal:
`bash extract_fbank.sh --stage 0 --stop_stage 2 --nj 16`
The sampling rate of…
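If the issue turns out to be a sampling-rate mismatch between the corpus and what `extract_fbank.sh` expects, one hedged workaround is to resample the audio beforehand; the 16 kHz target below is only an assumption, so match whatever rate the script is configured for:
```python
import librosa
import soundfile as sf

def resample_wav(in_path, out_path, target_sr=16000):
    """Load a wav at its native rate and write it back at `target_sr`.
    16 kHz is an assumed target; use the rate extract_fbank.sh expects."""
    audio, sr = librosa.load(in_path, sr=None)
    if sr != target_sr:
        audio = librosa.resample(audio, orig_sr=sr, target_sr=target_sr)
    sf.write(out_path, audio, target_sr)
```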
-
Hey, I finally found you. I know you are busy, but could you please help me?
While implementing S2UT following your video tutorial, after generating the train.txt file I encountered the following error:
(test_fairseq) ro…
-
## Paper title (verbatim)
Matcha-TTS: A fast TTS architecture with conditional flow matching
## In one sentence
Matcha-TTS is a fast, high-quality text-to-speech (TTS) model that uses conditional flow matching.
### Paper link
[https://arxiv.org/abs/2309.0319…
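As a quick reference, a minimal PyTorch sketch of the conditional flow-matching (OT-CFM) training objective the paper builds on; `v_theta` and `cond` are placeholders for the decoder network and its conditioning, and the schedule is the simplified form with a small `sigma_min`:
```python
import torch

def cfm_loss(v_theta, x1, cond, sigma_min=1e-4):
    """Conditional flow matching loss (simplified OT-CFM).
    x1: target mel-spectrogram batch; cond: conditioning (e.g. text encoder output);
    v_theta(x_t, t, cond) predicts the vector field."""
    x0 = torch.randn_like(x1)                       # noise sample
    t = torch.rand(x1.size(0), *([1] * (x1.dim() - 1)), device=x1.device)
    x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1   # point on the OT path
    u_t = x1 - (1 - sigma_min) * x0                 # target vector field
    return torch.mean((v_theta(x_t, t, cond) - u_t) ** 2)
```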
-
What kinds of algorithms have you used to segment such long audio files? A forced aligner may have limitations when segmenting a long recording in one pass.
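One approach that sidesteps aligning the whole file at once, sketched below as an assumption rather than what this project actually uses, is to pre-split the recording at low-energy regions and then run the forced aligner on each segment separately:
```python
import librosa

def split_on_silence(path, top_db=40, min_len_sec=1.0):
    """Split a long recording at low-energy regions using librosa's
    energy-based splitter; returns (start_sec, end_sec) segments that can
    then be passed to a forced aligner one at a time."""
    audio, sr = librosa.load(path, sr=None)
    intervals = librosa.effects.split(audio, top_db=top_db)
    segments = []
    for start, end in intervals:
        if (end - start) / sr >= min_len_sec:
            segments.append((start / sr, end / sr))
    return segments
```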
-
Is there inference code? I could not find any, but I read through the other issues and found this:
> I'll write an inference script next so we can do some quick experiments.

_Originally p…