Getting the List of Voices for eSpeak Fails with Espeak-NG

emassey0135 commented 1 year ago

Expected Behavior

I tried to run the EPUB Enhancer script (epub3-to-epub3) on an EPUB with TTS enabled on Linux using the command-line interface.

Actual Behavior

The Espeak engine failed to activate because getting the list of voices failed, so all sentences failed to be spoken and the resulting EPUB had no audio.

Steps to Reproduce

Make sure eSpeak-NG is installed and can be found on the system path as espeak (create a symbolic link if necessary)
Make sure eSpeak has a higher priority than any other TTS
From the cli directory of the Pipeline, run "./dp2 epub3-to-epub3 --source {epub} --tts true -o {output directory} --include-tts-log true [--tts-config {TTS configuration file}]"

Details

The problem seems to be in the regular expression that extracts details about voices from the output of the espeak command for a specific language, specifically when it extracts the gender. The error is traced to line 133 of modules/tts/tts-adapters/tts-adapter-espeak/src/main/java/org/daisy/pipeline/tts/espeak/impl/ESpeakEngine.java. The part that causes the error is "mr.group("gender").trim().toLowerCase()". Above this line the locale is matched successfully, so I think there must be a slight difference between how the old eSpeak prints the gender in the voice table and how eSpeak-NG does. The regular expression string that mr is created using is "^\\s*[0-9]+\\s+(?<locale>[-a-z]+)\\s+(?<gender>[FfMm-]\\s+)?(?<name>[^ ]+)". I tried changing the regular expression to make it match the output of Espeak-NG, but I could not fix the issue. The last one I tried is "^\\s*[0-9]+\\s+(?<locale>[-a-zA-Z0-9]+)\\s+-*\\/*(?<gender>[FfMm]\\s+)(?<name>[^ ]+)".

Environment

Operating system: Arch Linux X86_64
DAISY Pipeline 2 version: master (compiled from source)
Interface: Command Line
ESpeak-NG Version: 1.51.1

Logs

bertfrees commented 1 year ago

Thanks for the report! I cannot currently install espeak-ng on my (older) macOS, so I would appreciate it if someone could further diagnose the issue or assist me with it.

Could you perhaps send me the output of the --voices command, so that I can try to fix the regex that way?

emassey0135 commented 1 year ago

I included the output of "espeak --voices" in the Gist I linked to. It is in the file named espeak-ng-voices.txt, and it shows up under the big log file. Here is a direct link to the file.

bertfrees commented 1 year ago

Sorry, should have seen that file before.

I think this regex should work:

"^\\s*[0-9]+\\s+(?<locale>[-a-zA-Z0-9]+)\\s+((--/)?(?<gender>[FfMm-])\\s+)?(?<name>[^ ]+)"

Below, change mr.group("gender") into Optional.ofNullable(mr.group("gender")).orElse("m") to avoid the NullPointerException. The .trim() can also be dropped with the new regex.
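To make the two changes concrete, here is a minimal standalone sketch of the proposed regex combined with the Optional fallback. The class name, the parse helper and the sample voice rows are made up for illustration (the rows only imitate the general shape of the eSpeak and eSpeak-NG voice tables); this is not the code in ESpeakEngine.java.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VoiceLineSketch {

    // The regex proposed above, with the named groups locale, gender and name.
    static final Pattern VOICE_LINE = Pattern.compile(
        "^\\s*[0-9]+\\s+(?<locale>[-a-zA-Z0-9]+)\\s+((--/)?(?<gender>[FfMm-])\\s+)?(?<name>[^ ]+)");

    /** Returns {locale, gender, name}, or null when the line is not a voice row. */
    static String[] parse(String line) {
        Matcher mr = VOICE_LINE.matcher(line);
        if (!mr.find()) {
            return null; // for example the header row of the table
        }
        // group("gender") is null when the optional gender column is absent,
        // so fall back to "m" instead of calling a method on null.
        String gender = Optional.ofNullable(mr.group("gender")).orElse("m").toLowerCase();
        return new String[] { mr.group("locale"), gender, mr.group("name") };
    }

    public static void main(String[] args) {
        // Illustrative rows only; real output may differ in spacing and columns.
        String[] samples = {
            " 5  af              --/M      Afrikaans          gmw/af",   // eSpeak-NG style
            " 5  af             M  afrikaans            other/af",       // older eSpeak style
            " 5  en-us          some-voice           other/some-voice",  // hypothetical row without a gender
            "Pty Language       Age/Gender VoiceName          File"      // header row
        };
        for (String line : samples) {
            String[] parsed = parse(line);
            System.out.println(parsed == null ? "no match" : String.join(", ", parsed));
        }
    }
}
```

Because the gender group now captures only the single character and not the trailing whitespace, no trim() is needed before lowercasing.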

The regex should probably be improved further, because I presume an actual age can be listed instead of the "--"?

emassey0135 commented 5 months ago

I apologize for taking so long to respond to this; I got distracted with other things and forgot about figuring this out. I started working on this again a few weeks ago, and when I applied the changes you suggested and added java.util.Optional to the imports, Pipeline successfully enumerated the voices, but I got another error when it tried to actually synthesize the speech. However, after that I started having issues with my Linux install on my laptop, so I cannot access the logs right now. When I get a working Linux install again I will try to diagnose the issue further and send the logs.

bertfrees commented 5 months ago

OK thank you.

emassey0135 commented 4 months ago

The error I am getting is "timeout (0 seconds) fired while speaking with espeak". For some reason, DAISY Pipeline seems to be giving eSpeak 0 seconds to speak some sentences. In the TTS log, there are lines like these: <text id="id_225" timeout="0.0s" selected-voice="{engine: espeak, name: English_(America), locale: en-US, gender:male-adult}" actual-voice="{engine: espeak, name: us-mbrola-2, locale: en-US, gender:male-adult}" time-elapsed="0.0s">.
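In case it helps with narrowing this down, here is a rough sketch for listing the ids of all sentences that were given a zero timeout. It assumes the whole TTS log has already been read into a string and that every entry starts with the id and timeout attributes in the same order as in the line quoted above; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroTimeoutScan {

    // Assumes the attribute order of the quoted log line: id first, then timeout.
    static final Pattern TEXT_ENTRY = Pattern.compile(
        "<text id=\"([^\"]+)\" timeout=\"([0-9]+(?:\\.[0-9]+)?)s\"");

    /** Collects the ids of all <text> entries whose timeout is 0 seconds. */
    static List<String> idsWithZeroTimeout(String log) {
        List<String> ids = new ArrayList<>();
        Matcher m = TEXT_ENTRY.matcher(log);
        while (m.find()) {
            if (Double.parseDouble(m.group(2)) == 0.0) {
                ids.add(m.group(1));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Two shortened, made-up entries in the same shape as the quoted line.
        String log = "<text id=\"id_225\" timeout=\"0.0s\" time-elapsed=\"0.0s\">\n"
                   + "<text id=\"id_226\" timeout=\"3.5s\" time-elapsed=\"0.4s\">\n";
        System.out.println(idsWithZeroTimeout(log));
    }
}
```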

When the timeout is not 0, it is anywhere from 1 to 12 seconds from the lines I looked at. Also, I tried this with the original eSpeak as well as eSpeak NG, and got the same result, so there must be a pre-existing bug in the eSpeak TTS adapter unless I am configuring it wrong somehow. Changing org.daisy.pipeline.tts.threads.number to 1 does not help.

Where is this timeout calculated? Is audio encoding happening too fast for eSpeak to keep up, so the Pipeline doesn't give it any time to speak?

Here are my new log files

bertfrees commented 4 months ago

The calculation of the timeout takes into account the length of the sentence and the time the speech synthesizer has spent on previous sentences (and the number of words processed so far). It doesn't matter if it is faster or slower than the audio encoding.

I don't quite understand what is happening. Normally the timeout should start at a very safe number of 5 seconds, plus an additional second per word, and then gradually adapt to a more realistic number as more sentences are synthesized. This is confirmed in my tests. But your log file shows that the first sentence already gets a timeout of 0 seconds. Really strange.
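To illustrate the general idea (this is not the Pipeline's actual code), an adaptive timeout along these lines could be sketched as follows. Only the 5 second base and the initial 1 second per word come from the description above; the blending formula, PRIOR_WORDS and SAFETY_FACTOR are assumptions chosen purely for the example, and the sketch does not try to reproduce the 0 second timeouts seen in the logs.

```java
public class TimeoutSketch {

    static final double BASE_SECONDS = 5.0;             // from the description above
    static final double INITIAL_SECONDS_PER_WORD = 1.0; // from the description above
    static final double PRIOR_WORDS = 100.0;            // assumption: weight of the initial estimate
    static final double SAFETY_FACTOR = 3.0;            // assumption: margin over the observed speed

    private double secondsSpent = 0.0; // time the synthesizer spent on previous sentences
    private long wordsProcessed = 0;   // number of words processed so far

    /** Timeout in seconds for a sentence with the given number of words. */
    double timeoutFor(int wordCount) {
        // Starts at the safe per-word estimate and drifts towards
        // SAFETY_FACTOR times the observed seconds per word as data accumulates.
        double perWord = (INITIAL_SECONDS_PER_WORD * PRIOR_WORDS + SAFETY_FACTOR * secondsSpent)
                / (PRIOR_WORDS + wordsProcessed);
        return BASE_SECONDS + perWord * wordCount;
    }

    /** Records how long a finished sentence actually took. */
    void record(int wordCount, double elapsedSeconds) {
        wordsProcessed += wordCount;
        secondsSpent += elapsedSeconds;
    }

    public static void main(String[] args) {
        TimeoutSketch t = new TimeoutSketch();
        System.out.println(t.timeoutFor(10)); // first sentence: 5 s plus 1 s per word
        t.record(100, 10.0);                  // 100 words synthesized in 10 seconds
        System.out.println(t.timeoutFor(10)); // lower than before, but still above the base
    }
}
```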

Can you help me reproduce your issue?

bertfrees commented 2 months ago

Original issue (getting the list of voices) fixed in 801696a.

emassey0135 commented 1 month ago

I have done some more testing, and the behavior I have observed is very strange. When I use the latest eSpeak-NG release, a smaller Bookshare book converts with no errors using the dtbook-to-daisy3 script. However, when I convert a medium-sized book, the behavior depends on the number of threads used for speech synthesis. When org.daisy.pipeline.tts.threads.number is set to 1, the timeout for a lot of sentences is set to 0, causing the error I described above. When it is set to 2, that still happens often, but I also get java.io.EOFException errors on some sentences. The error is traced to "org.daisy.pipeline.tts.espeak.impl.ESpeakEngine.synthesize(ESpeakEngine.java:87)", which means the exception is thrown in the code that sends the sentence to eSpeak and receives the audio data from its standard output. I am not sure why this would throw an EOFException, unless the eSpeak-NG process terminates before the sentence has finished sending. I tried running an SSML fragment that triggered this error through eSpeak-NG manually, using the same options ESpeakEngine.java does, and it did not crash. When I set it to 4 or 6, or leave it as default, I never get errors from the timeout being set to 0, but I still get these EOFException errors. The SSML fragments that cause the error are the same every time, no matter if speech synthesis is using 4 or 6 threads. My CPU has 6 cores and no hyperthreading.

When I compile eSpeak-NG from the latest Git code, a lot of sentences are assigned a timeout of 0, even using 6 TTS threads and on the smaller book. I wonder if a bug in eSpeak-NG can be causing this error as well.

emassey0135 commented 1 month ago

Here are my TTS log files. The book I used is A Preface to Paradise Lost by C. S. Lewis from Bookshare if you have access to that.

emassey0135 commented 1 month ago

I just did a few more tests on a small EPUB I created myself. When I run epub3-to-epub3 on it with TTS enabled, some sentences are assigned a timeout of 0 seconds and fail to be spoken. However, when I convert it to a dtbook with epub3-to-daisy3 and then run daisy3-to-daisy3 on that, there are no errors and speech is generated successfully. I created the EPUB by converting a document from classics.mit.edu to Markdown with Pandoc, making some changes to the formatting, and then converting that to EPUB with Pandoc. Here is the TTS log from the conversion with errors, the EPUB, and its Markdown source, so you can see if you can reproduce this.

bertfrees commented 2 days ago

Thank you for the files, I will see if I can reproduce the issue (first on macOS).
