MycroftAI / mimic2

Text to Speech engine based on the Tacotron architecture, initially implemented by Keith Ito.
Apache License 2.0
580 stars 103 forks

Would it make sense to switch to WaveRNN? #41

Closed: williamluke4 closed this issue 4 years ago

williamluke4 commented 5 years ago

GitHub samples

el-tocino commented 5 years ago

Seems much slower than real time.
With a 1030:

Using device: cuda

Initialising WaveRNN Model...

Trainable Parameters: 4.234M

Loading Weights: "quick_start/voc_weights/latest_weights.pyt"

Initialising Tacotron Model...

Trainable Parameters: 11.088M

Loading Weights: "quick_start/tts_weights/latest_weights.pyt"

+---------+---------------+-----------------+----------------+-----------------+
| WaveRNN | Tacotron(r=2) | Generation Mode | Target Samples | Overlap Samples |
+---------+---------------+-----------------+----------------+-----------------+
|  797k   |     180k      |     Batched     |     11000      |       550       |
+---------+---------------+-----------------+----------------+-----------------+

| Generating 1/1
| ████████████████ 84000/84700 | Batch Size: 7 | Gen Rate: 5.6kHz | 

Done.

real    0m19.961s
user    0m17.973s
sys     0m3.026s
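The table and the progress bar fit together if batched generation folds the input the way fatchord-style WaveRNN does. That is an assumption, since the log itself does not describe the scheme: the sequence is cut into segments of `Target Samples`, each padded with `Overlap Samples` on both sides, and each segment becomes one batch item. A minimal sketch of that arithmetic, using a hypothetical helper `fold_layout`:

```python
from typing import Tuple


def fold_layout(total_len: int, target: int, overlap: int) -> Tuple[int, int]:
    """Return (num_folds, segment_len) for overlap-folded batched generation.

    Sketch of fatchord-style WaveRNN folding (assumption): the input is cut
    into `target`-sample segments, each carrying `overlap` extra samples on
    both sides, and every segment becomes one item in the batch.
    """
    segment_len = target + 2 * overlap
    # Segments start every target + overlap samples, so neighbours share `overlap` samples.
    num_folds = (total_len - overlap) // (target + overlap)
    covered_len = num_folds * (target + overlap) + overlap
    if total_len != covered_len:
        # Leftover samples go into one extra (zero-padded) segment.
        num_folds += 1
    return num_folds, segment_len


# Hypothetical input length; the log does not print the unfolded length.
folds, seg = fold_layout(75_000, target=11_000, overlap=550)
print(folds, seg, folds * seg)  # → 7 12100 84700
```

Each segment is 11000 + 2 × 550 = 12100 samples, and 7 × 12100 = 84700, which matches both `Batch Size: 7` and the progress-bar total. The 75000-sample input is made up; under this scheme any total between 69851 and 81400 samples produces seven folds.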

On a 1070:

test@acropolis:$  time python3 quick_start.py --input_text "the weather in austin is eighty seven degrees and partly cloudy."
Using device: cuda

Initialising WaveRNN Model...

Trainable Parameters: 4.234M

Loading Weights: "quick_start/voc_weights/latest_weights.pyt"

Initialising Tacotron Model...

Trainable Parameters: 11.088M

Loading Weights: "quick_start/tts_weights/latest_weights.pyt"

+---------+---------------+-----------------+----------------+-----------------+
| WaveRNN | Tacotron(r=2) | Generation Mode | Target Samples | Overlap Samples |
+---------+---------------+-----------------+----------------+-----------------+
|  797k   |     180k      |     Batched     |     11000      |       550       |
+---------+---------------+-----------------+----------------+-----------------+

| Generating 1/1
| ████████████████ 84000/84700 | Batch Size: 7 | Gen Rate: 10.4kHz | 

Done.

real    0m12.461s
user    0m11.764s
sys     0m1.062s
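The "much slower than real time" remark can be put into numbers using the `Gen Rate` field. The following is a small sketch. It assumes a 22,050 Hz output sample rate, which the log does not print, and it assumes `Gen Rate` counts samples across the whole batch. The helper names are hypothetical.

```python
from typing import Dict

# Assumption: the quick_start models produce 22,050 Hz audio; the log does not say.
SAMPLE_RATE_HZ = 22_050


def real_time_factor(gen_rate_hz: float, sample_rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Audio seconds produced per wall-clock second; below 1.0 is slower than real time."""
    return gen_rate_hz / sample_rate_hz


def vocoder_seconds(total_samples: int, gen_rate_hz: float) -> float:
    """Approximate time spent in the WaveRNN sampling loop for the batched sample count."""
    return total_samples / gen_rate_hz


# "Gen Rate" values copied from the two logs; 84700 is the progress-bar total.
runs: Dict[str, float] = {"1030": 5_600.0, "1070": 10_400.0}
for gpu, rate in runs.items():
    print(f"{gpu}: RTF {real_time_factor(rate):.2f}, vocoder ~{vocoder_seconds(84_700, rate):.1f} s")
```

Under those assumptions, the 1030 runs at roughly a quarter of real time and the 1070 at a little under half. The estimated vocoder time of about 15 s and 8 s accounts for most of each `real` figure. The remainder would be model loading and Tacotron inference.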

Have you tried out LPCNet?

williamluke4 commented 5 years ago

Not yet, will check it out.

williamluke4 commented 5 years ago

It looks very promising. Do you have any idea how to implement it?

el-tocino commented 5 years ago

No, was hoping you might have. :)

williamluke4 commented 5 years ago

Bugger. If you want to have a go at it together, let me know :)