Closed: ctlaltdefeat closed this issue 3 years ago
Hi, you're right, calling espeak-ng as a subprocess is far from an ideal solution... but it is the easy one.
The better way would be to use the espeak shared library through a C/Python wrapper. This would allow the espeak-related code to be loaded only once, at import time in the phonemizer, instead of once per utterance as in the current implementation...
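To make the idea concrete, here is a minimal, hedged sketch of such a wrapper using ctypes over libespeak-ng.so. It assumes the standard espeak-ng C API (`espeak_Initialize`, `espeak_SetVoiceByName`, `espeak_TextToPhonemes`) and the documented `phonememode` bit layout; the constants and the library name lookup are assumptions, not the phonemizer's actual implementation:

```python
import ctypes
import ctypes.util

AUDIO_OUTPUT_SYNCHRONOUS = 2   # no audio playback needed, text processing only
espeakCHARS_UTF8 = 1
espeakPHONEMES_IPA = 0x02      # bit 1 set: IPA output instead of ascii mnemonics

def phoneme_mode(separator=None):
    """Build the phonememode argument: bits 8-23 hold an optional
    separator character inserted between phoneme names."""
    mode = espeakPHONEMES_IPA
    if separator:
        mode |= ord(separator) << 8
    return mode

def load_espeak(voice="en-us"):
    """Load libespeak-ng once (i.e. at import time) and return a
    phonemize(text) function that reuses the initialized library."""
    lib_path = ctypes.util.find_library("espeak-ng") or "libespeak-ng.so"
    lib = ctypes.cdll.LoadLibrary(lib_path)
    lib.espeak_TextToPhonemes.restype = ctypes.c_char_p
    lib.espeak_Initialize(AUDIO_OUTPUT_SYNCHRONOUS, 0, None, 0)
    lib.espeak_SetVoiceByName(voice.encode("utf-8"))

    def phonemize(text, separator=None):
        text_ptr = ctypes.c_char_p(text.encode("utf-8"))
        mode = phoneme_mode(separator)
        clauses = []
        # espeak_TextToPhonemes processes one clause per call and
        # advances the pointer; it becomes NULL once all text is consumed.
        while text_ptr.value:
            clause = lib.espeak_TextToPhonemes(
                ctypes.byref(text_ptr), espeakCHARS_UTF8, mode)
            if clause:
                clauses.append(clause.decode("utf-8"))
        return " ".join(clauses)

    return phonemize
```

The key point is that `espeak_Initialize` is paid once per process rather than once per utterance, which is exactly the overhead the subprocess approach cannot avoid.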
Just as an update: calling espeak-ng in parallel has not turned out to be a good solution for me. The latency of each call is rather volatile for reasons I do not understand (so the latency of the worst call is a lower bound on the total), and in addition multiprocessing introduces its own overhead. I'm now trying to work around this by creating persistent processes and communicating with them via stdin and stdout, but espeak doesn't seem to play well with stdin and stdout, so I'm not having much success.
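For reference, the persistent-process pattern being attempted looks roughly like the sketch below. It uses `cat` as a stand-in command that echoes each line back immediately; espeak-ng itself tends to buffer or consume stdin until EOF, which is presumably why the same pattern fails there:

```python
import subprocess

class PersistentWorker:
    """Keep one long-lived child process and exchange lines over its
    pipes, instead of paying startup cost on every utterance."""

    def __init__(self, cmd=("cat",)):  # "cat" is a stand-in, not espeak-ng
        self.proc = subprocess.Popen(
            cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            text=True, bufsize=1)      # line-buffered writes on our side

    def request(self, line):
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()        # force the line out immediately
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()

worker = PersistentWorker()
print(worker.request("hello"))  # -> hello
worker.close()
```

This only works when the child writes a response per input line without waiting for EOF, which is the property espeak-ng apparently lacks.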
See https://github.com/rhasspy/espeak-phonemizer for a wrapper of libespeak-ng.so using ctypes.
I have tried to run the espeak-phonemizer above, but it seems I get an error:
Segmentation fault (core dumped)
I'm working on integrating ctypes for the espeak backend. Drastic speed improvements. Release in a few days...
This is now implemented in the master branch; feel free to try it and let me know if you have any remarks.
The method used to preserve punctuation for espeak-ng leads to runtime that scales linearly with the number of punctuation marks, because each punctuation split triggers another call to espeak-ng. Unfortunately, espeak-ng has rather significant per-call overhead, so this compounds and substantially affects downstream applications.
On my machine:
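The scaling behaviour described above can be sketched with a stub backend that merely counts invocations (`fake_espeak` and the splitting logic are illustrative, not phonemizer's actual code):

```python
import re

calls = []  # one entry per simulated espeak-ng invocation

def fake_espeak(chunk):
    """Stand-in for a real espeak-ng call; records each invocation."""
    calls.append(chunk)
    return chunk.lower()

def phonemize_preserving_punctuation(text, marks=".,;:!?"):
    """Mimic the punctuation-preserving strategy: split the text on
    punctuation and phonemize each piece with a separate backend call,
    so the number of calls grows linearly with the number of marks."""
    pattern = "([" + re.escape(marks) + "])"
    pieces = [p for p in re.split(pattern, text) if p.strip()]
    out = []
    for piece in pieces:
        if piece in marks:
            out.append(piece)          # punctuation passes through as-is
        else:
            out.append(fake_espeak(piece))
    return "".join(out)

phonemize_preserving_punctuation("Hello, world. Bye!")
len(calls)  # three text chunks -> three backend calls
```

With real espeak-ng, each of those per-chunk calls also pays process startup, which is where the overhead compounds.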
Given the difficulty of understanding the internals of espeak-ng, I suggest that an initial way to combat this is calling espeak-ng in parallel within `_phonemize_aux` and then merging the results (perhaps in a way that respects `njobs`).