daisy / pipeline

Super-project that aggregates all Pipeline-related code, provides a common tracker for Pipeline-related issues, and holds the Pipeline website
http://daisy.github.io/pipeline

Speech Synthesis with the SAPI5 Adapter Sometimes Fails When Using the Microsoft Natural Voices for Narrator #784

Open. Opened by emassey0135 1 week ago.


Expected Behavior

I was trying to generate a DAISY3 audiobook with TTS using the Microsoft natural voices for Narrator. I used NaturalVoiceSAPIAdapter to make these voices available to SAPI5, then created a TTS configuration file that gives one of these voices the highest priority. I expected every sentence to be spoken with that voice.

Actual Behavior

Some sentences could not be spoken with the voice I chose and were instead synthesized successfully with another of the Microsoft natural voices. Most sentences were spoken correctly with the chosen voice, and which sentences fail appears to be random. The error is always: "Could not speak : speech mutex lock has timedout"

Steps to Reproduce

  1. Install one or more natural voices for Narrator using these instructions.
  2. Download the zip archive from the latest release of NaturalVoiceSAPIAdapter.
  3. Unpack the zip archive into a folder. If you move this folder after installing the SAPI5 voices, you will need to uninstall them and install them again.
  4. Run Installer.exe.
  5. Make sure "Include Narrator natural voices" is checked, and uncheck "Include Microsoft Edge natural voices", since the Edge voices do not support SSML marks and cause the DAISY Pipeline's SAPI5 tests to fail.
  6. Press "Install" for both the 32-bit and the 64-bit versions, and press "Yes" on the UAC prompts that appear.
  7. Restart the DAISY Pipeline engine if it is already started.
  8. Either create a TTS configuration file, like the one I attached, that specifies one of the Microsoft natural voices and assigns it priority 1; or, in the Pipeline UI, open "Settings" from the "File" menu, go to the "Voices" tab, check the box for the Microsoft natural voice you want to use, and press "Close".
  9. Run the script "dtbook-to-daisy3" with TTS enabled and the TTS log included. If you are using the CLI, pass the path of the TTS configuration file you created with the "--tts-config" option. If you are using the GUI and selected a voice in the settings, this is handled for you.

Details

In my testing, these voices and this SAPI5 adapter are reliable in other applications. I used one with NVDA for a while with no issues, and they also work fine in Book Wizard Producer. I even converted the exact same book in Book Wizard Producer, using the same voice, and there were no errors. Switching to a different Microsoft natural voice in the Pipeline gave the same error. In addition, I converted the same book to audio with the Microsoft David OneCore voice several times with no issues. Changing org.daisy.pipeline.tts.threads.number or org.daisy.pipeline.tts.encoding.speed did not seem to help. I tried to find other SAPI5 voices to test with, but Eloquence does not seem to handle SSML marks correctly, so DAISY Pipeline refuses to use it, and the Windows version of eSpeak only has a 32-bit installer, so the Pipeline did not find the eSpeak SAPI5 voices.

Could this issue be caused by the Microsoft natural voices taking longer to speak than the DAISY Pipeline expects? They are definitely slower than most other SAPI5 voices. Or perhaps the voices or the adapter crash when multiple threads try to speak at the same time?
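To make the first hypothesis concrete, here is a minimal Python sketch of how a timed lock around a shared speech engine could produce this kind of error. It is a toy model, not the Pipeline's code: `speech_mutex`, `LOCK_TIMEOUT_S`, `speak`, and `run` are hypothetical names, the timeout value is invented, and `time.sleep` stands in for synthesis. Each worker must acquire the shared lock within the timeout; if another worker holds it for longer than that, the call fails with the same message as in the log.

```python
import threading
import time

# Hypothetical values and names for illustration only; none of these
# come from DAISY Pipeline or its SAPI5 adapter.
LOCK_TIMEOUT_S = 0.2             # how long a caller waits for the engine lock
speech_mutex = threading.Lock()  # assumption: one lock guards a shared engine


def speak(sentence: str, synth_time_s: float) -> str:
    """Synthesize one sentence while holding the shared lock.

    Raises RuntimeError if the lock cannot be acquired within
    LOCK_TIMEOUT_S, mimicking the error message from the TTS log.
    """
    if not speech_mutex.acquire(timeout=LOCK_TIMEOUT_S):
        raise RuntimeError("Could not speak : speech mutex lock has timedout")
    try:
        time.sleep(synth_time_s)  # stand-in for (slow) synthesis
        return f"audio for {sentence!r}"
    finally:
        speech_mutex.release()


def run(synth_time_s: float, n_threads: int = 4) -> list:
    """Start n_threads workers that each speak one sentence; collect outcomes."""
    results = []
    results_lock = threading.Lock()

    def worker(i: int) -> None:
        try:
            outcome = speak(f"sentence {i}", synth_time_s)
        except RuntimeError as exc:
            outcome = f"FAILED: {exc}"
        with results_lock:
            results.append(outcome)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run(0.01))  # fast voice: every worker gets the lock in time
    print(run(0.5))   # slow voice: workers kept waiting past the timeout fail
```

In this model, short synthesis times never trigger the error, while synthesis times longer than the lock timeout make the waiting threads fail. A single thread never times out here, so the observation that changing org.daisy.pipeline.tts.threads.number did not seem to help suggests the real cause may involve more than this simple picture.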

Environment

Logs

Job log file, TTS log, TTS config, and source DTBook