Closed — emassey0135 closed this issue 2 weeks ago
@NPavie Any idea what could be wrong here?
@emassey0135 thanks for the detailed report.
@bert I'm not sure; it might be that the mutex lock we use to work around the Windows 11 multithreaded speech issue is not waiting long enough.
I'll do some tests to reproduce the issue and investigate.
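To illustrate the hypothesis above, here is a minimal, hypothetical sketch (not the Pipeline's actual code) of how a fixed-timeout lock around a shared speech engine produces exactly this kind of intermittent failure when a slow voice holds the lock longer than the timeout allows:

```python
import threading
import time

# Hypothetical stand-in for the mutex the Pipeline uses to serialize speech requests.
speech_mutex = threading.Lock()

def synthesize_long_sentence():
    # Simulate a natural voice holding the engine longer than the lock timeout.
    with speech_mutex:
        time.sleep(0.2)

speaker = threading.Thread(target=synthesize_long_sentence)
speaker.start()
time.sleep(0.05)  # ensure the first request already holds the lock

# A second request gives up after 0.1 s, mimicking the reported error message.
if speech_mutex.acquire(timeout=0.1):
    speech_mutex.release()
    result = "spoken"
else:
    result = "Could not speak : speech mutex lock has timedout"

speaker.join()
print(result)  # → Could not speak : speech mutex lock has timedout
```

With a fast SAPI5 voice the first request releases the lock before the timeout expires, so the failure only shows up with slower voices, which would explain why it looks random and voice-dependent.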
@NPavie Thanks.
Just some quick news on the subject:
We started fixing the Pipeline's internal adapter to SAPI and OneCore voices to address issues with OneCore on Windows 11 (it is crashing after recent Windows 11 updates and changes in the Windows runtime library used to connect to the OneCore API).
We also started testing the natural voices exposed by the NaturalVoiceSAPIAdapter tool, and encountered some problems with those voices:
@NPavie Thanks for the update. At least the DAISY Pipeline can already use the same voices Edge uses through the Azure speech adapter, although that requires paying for the Azure Speech API. The Edge/Azure voices sound a little better to me, but the Narrator natural voices still have the advantages of being faster and free no matter how much text you convert, and they are still much more natural sounding than probably any other SAPI5 voices, so I'm glad those will probably be able to work well.
@NPavie It seems that in the latest release of NaturalVoiceSAPIAdapter, the Edge online voices now support SSML marks. The Edge voices do not support them directly, but the adapter now simulates SSML marks with word boundary events as described in this issue.
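As a rough sketch of how such a simulation might work (this is an illustration, not the adapter's actual implementation): if each `<mark/>` is recorded with its character offset in the stripped text, the adapter can fire a mark event as soon as a word-boundary event at or past that offset arrives.

```python
def fire_marks(mark_offsets, word_boundaries):
    """Simulate SSML mark events from word-boundary events.

    mark_offsets: list of (name, text_offset) pairs for <mark/> elements,
                  sorted by offset (hypothetical representation).
    word_boundaries: text offsets reported by the engine, in speaking order.
    Returns the mark names in the order they would be fired.
    """
    pending = list(mark_offsets)
    fired = []
    for boundary in word_boundaries:
        # Fire every mark whose position the speech has now reached.
        while pending and pending[0][1] <= boundary:
            fired.append(pending.pop(0)[0])
    return fired

marks = [("s1", 0), ("s2", 25)]
boundaries = [0, 6, 12, 25, 31]
print(fire_marks(marks, boundaries))  # → ['s1', 's2']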
@emassey0135 Interesting! I'll take a look at the latest version of the tool, thanks!
News update on the issue:
I did some tests with this bypass and so far I have not encountered any issues during synthesis, but I'll run more tests in production just in case.
Expected Behavior
I was trying to generate a DAISY3 audiobook with TTS using the Microsoft natural voices for Narrator. I used NaturalVoiceSAPIAdapter to make these voices available to SAPI5, then created a TTS configuration file that sets one of these voices as highest priority.
Actual Behavior
Some sentences failed to be spoken with the voice I chose and were synthesized with another of these Microsoft natural voices instead. Most sentences were spoken successfully with the correct voice, and which sentences fail seems random. The error always says: "Could not speak : speech mutex lock has timedout"
Steps to Reproduce
Details
These voices and this SAPI5 adapter seemed reliable when I tested them in other applications. I used one with NVDA for a while with no issues, and they also work fine in Book Wizard Producer. I even converted the exact same book in Book Wizard Producer that I tried to convert with DAISY Pipeline, using the same voice, and there were no errors. I also tried switching to a different Microsoft natural voice and got the same error. In addition, I converted the same book to audio with the Microsoft David OneCore voice several times with no issues. Changing org.daisy.pipeline.tts.threads.number or org.daisy.pipeline.tts.encoding.speed did not seem to help. I tried to find other SAPI5 voices to test with, but Eloquence seems not to handle SSML marks correctly, so DAISY Pipeline refuses to use it, and the Windows version of eSpeak only has a 32-bit installer, so the Pipeline did not find the eSpeak SAPI5 voices.
Could this issue be caused by the Microsoft natural voices taking longer to speak than the DAISY Pipeline expects, since they definitely take longer than most other SAPI5 voices? Or perhaps the voices or the adapter crash when multiple threads try to speak at the same time?
Environment
Logs
Job log file, TTS log, TTS config, and source DTBook