DigitalPhonetics / IMS-Toucan

Multilingual and Controllable Text-to-Speech Toolkit of the Speech and Language Technologies Group at the University of Stuttgart.
Apache License 2.0

New ToucanTTS does not use GPU consistently during inference with BigVGan #133

Closed Ca-ressemble-a-du-fake closed 1 year ago

Ca-ressemble-a-du-fake commented 1 year ago

Hi,

First of all, thanks for this new release! I am playing around with it and noticed it is much slower (with GPU) than the previous version, even when using Avocodo instead of BigVGAN.

When running the Meta model, it spends ages (~1 minute) on the first sentence on a single CPU thread (the model is loaded into VRAM but the GPU isn't used), then uses the GPU for 3 or 4 sentences, halts again for ~15 seconds, and then continues "normally" through the remaining sentences (30 in total). Finally, it spends some time at the end, perhaps writing the wav to disk.

Previously (version 2.4) it ran smoothly on the GPU. Now, with 2.5, it is much slower (even with the "old" Avocodo), with the CPU running at max speed on one thread.

Is this the intended new behavior, or is something going wrong on my computer?

=> It looks like this is due to using BigVGAN, but the "run, pause, run, pause, run" behavior is surprising.

Flux9665 commented 1 year ago

It might be because I compiled the vocoder with JIT during inference. I removed the compilation step and hope the weird behaviour is gone now. If not, it has something to do with the BigVGAN generator itself, and in that case I don't really know what can be done about it.
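For context, the pattern described above matches how TorchScript behaves: `torch.jit.script` defers optimization to the first few calls, so a scripted module pays a one-time compilation cost during inference rather than at load time. The sketch below is hypothetical (a tiny stand-in module, not the actual BigVGAN generator or IMS-Toucan code) and just illustrates that a scripted module produces the same output as the eager one while its first invocations absorb the JIT overhead.

```python
import torch

class TinyVocoder(torch.nn.Module):
    """Hypothetical stand-in for a neural vocoder generator."""
    def __init__(self):
        super().__init__()
        # maps an 80-bin mel spectrogram to a single waveform channel
        self.conv = torch.nn.Conv1d(80, 1, kernel_size=7, padding=3)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.conv(mel))

eager = TinyVocoder().eval()
# scripting compiles the module; PyTorch's profiling executor finishes
# optimizing during the first few forward passes, which is where the
# "run, pause, run" warm-up stalls would come from
scripted = torch.jit.script(eager)

mel = torch.randn(1, 80, 200)  # (batch, mel bins, frames)
with torch.no_grad():
    for _ in range(5):          # warm-up calls absorb the JIT cost
        out = scripted(mel)
    reference = eager(mel)

# after warm-up, the scripted and eager modules agree numerically
assert out.shape == (1, 1, 200)
assert torch.allclose(out, reference, atol=1e-5)
```

Dropping the scripting step, as done in the fix, trades any steady-state JIT speedup for predictable latency from the very first sentence.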

Ca-ressemble-a-du-fake commented 1 year ago

I saw your updates and am going to try the updated version. Thank you.

Ca-ressemble-a-du-fake commented 1 year ago

That's much faster, and it is not pausing anymore! Problem solved!