Open snakers4 opened 3 years ago
Consider giving the Silero TTS models a try. They are published under an open license that assumes non-commercial / personal usage. You can find our TTS models here: https://github.com/snakers4/silero-models#text-to-speech (accompanying article: https://habr.com/ru/post/549482/).
Most importantly, our TTS models run decently on a single CPU thread / core and depend almost exclusively on PyTorch.
Let me repost some of the benchmarks here:
- RTF (Real Time Factor): the time the synthesis takes divided by the audio duration;
- RTS (Real Time Speed) = 1 / RTF: how many times "faster" than realtime the synthesis is.
We benchmarked the models on two devices using PyTorch 1.8 utils:
- CPU: Intel i7-6800K CPU @ 3.40GHz;
- GPU: 1080 Ti;
- When measuring CPU performance, we also limited the number of threads used.
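To make the two metrics concrete, here is a minimal sketch of how RTF and RTS could be computed from a timed synthesis call. It is not the benchmark code we used: `measure_rtf` and `fake_synthesize` are hypothetical names, and the stand-in synthesizer just returns silence so the snippet runs without any model. With a real PyTorch model, the CPU thread limit would typically be set with `torch.set_num_threads(n)` before timing.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF: time the synthesis takes divided by the duration of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def real_time_speed(rtf: float) -> float:
    """RTS = 1 / RTF: how many times faster than realtime the synthesis is."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
) -> float:
    """Time one call to `synthesize` and return its RTF.

    `synthesize` is assumed to return a sequence of audio samples.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    return real_time_factor(elapsed, audio_seconds)


def fake_synthesize(text: str) -> Sequence[float]:
    """Hypothetical stand-in for a TTS model: 0.1 s of silence per character at 16 kHz."""
    return [0.0] * (1600 * len(text))


if __name__ == "__main__":
    rtf = measure_rtf(fake_synthesize, "hello world", sample_rate=16000)
    print(f"RTF={rtf:.6f}  RTS={real_time_speed(rtf):.1f}")
```

For example, `real_time_speed(0.5)` returns `2.0`, i.e. synthesis that takes half as long as the audio it produces runs twice as fast as realtime. The figures in the tables are rounded, so RTS there is not always exactly 1 / RTF of the displayed value.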
For the 16 kHz models we got the following metrics:

| Batch size | Device        | RTF   | RTS   |
| ---------- | ------------- | ----- | ----- |
| 1          | CPU 1 thread  | 0.7   | 1.4   |
| 1          | CPU 2 threads | 0.4   | 2.3   |
| 1          | CPU 4 threads | 0.3   | 3.1   |
| 4          | CPU 1 thread  | 0.5   | 2.0   |
| 4          | CPU 2 threads | 0.3   | 3.2   |
| 4          | CPU 4 threads | 0.2   | 4.9   |
| 1          | GPU           | 0.06  | 16.9  |
| 4          | GPU           | 0.02  | 51.7  |
| 8          | GPU           | 0.01  | 79.4  |
| 16         | GPU           | 0.008 | 122.9 |
| 32         | GPU           | 0.006 | 161.2 |
For the 8 kHz models we got the following metrics:

| Batch size | Device        | RTF   | RTS   |
| ---------- | ------------- | ----- | ----- |
| 1          | CPU 1 thread  | 0.5   | 1.9   |
| 1          | CPU 2 threads | 0.3   | 3.0   |
| 1          | CPU 4 threads | 0.2   | 4.2   |
| 4          | CPU 1 thread  | 0.4   | 2.8   |
| 4          | CPU 2 threads | 0.2   | 4.4   |
| 4          | CPU 4 threads | 0.1   | 6.6   |
| 1          | GPU           | 0.06  | 17.5  |
| 4          | GPU           | 0.02  | 55.0  |
| 8          | GPU           | 0.01  | 92.1  |
| 16         | GPU           | 0.007 | 147.7 |
| 32         | GPU           | 0.004 | 227.5 |