NPavie closed this pull request 8 months ago.
The branch is now rebased on the latest version of master. I also noticed and fixed a small bug in my detection of voice age on the native side.
@bertfrees I don't recall: do you prefer a squashed commit for the integration into master?
> do you prefer a squashed commit for the integration into master?
I prefer one commit per distinct/meaningful change. Sometimes that requires squashing commits, sometimes it requires splitting up commits :-).
@NPavie I cleaned up the branch a bit:
- rebased
- squashed two commits
- updated commit messages
- made every commit compile
My only remaining question is about the second commit: what is the effect/goal of putting the default voice first in the list?
OK, good!
In a discussion a while ago I think you mentioned that the order of insertion of voices in the list could influence their priority. For people without a configuration, the idea is to first use the system-wide selected voice if possible.
But it's a bit complicated to manage the adapter's main voice here, as there are two settings competing with each other on Windows: one for the default SAPI voice and one for the default OneCore voice. For now I prioritize SAPI, as some users might have third-party voices installed for it (like Acapela voices for Windows).
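The "default voice first" idea can be sketched roughly as follows. This is a minimal illustration, not the actual adapter code: the class and method names are hypothetical, and voices are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: move the system-wide selected voice to the front of
// the list, so that selection without user configuration falls back to it.
class VoiceOrdering {
    static List<String> withDefaultFirst(List<String> voices, String defaultVoice) {
        List<String> ordered = new ArrayList<>(voices);
        // Only reorder if the default voice is actually in the list.
        if (ordered.remove(defaultVoice)) {
            ordered.add(0, defaultVoice);
        }
        return ordered;
    }
}
```

Whether putting the voice first actually affects selection is exactly the open question discussed below.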
> In a discussion a while ago I think you mentioned that the order of insertion of voices in the list could influence their priority.
Hmm, I don't know if that's the case (and I don't recall mentioning it). It could be, but then it is probably an implementation detail. We would need to document the feature and add some unit tests for it.
> For now I prioritize SAPI, as some users might have third-party voices installed for it (like Acapela voices for Windows).
Do you mention that somewhere in the code?
> Do you mention that somewhere in the code?
No, it's not mentioned, but the SAPI voices are loaded first, so I assume they are prioritized.
Given the problem we are encountering with OneCore multithreading, I'm wondering if we should not keep all voices available (including the SAPI versions of the ones that are also declared in OneCore; we are currently discarding the SAPI version of a voice if a OneCore version exists). @bertfrees, what do you think?
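The deduplication being discussed can be sketched like this. This is a simplified illustration under assumed naming (the class and method are hypothetical, and voices are matched by name only): a SAPI voice is dropped when a OneCore voice with the same name exists.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the current behavior: discard the SAPI version of a
// voice when a OneCore version with the same name exists.
class VoiceDedup {
    static List<String> merge(List<String> sapiVoices, List<String> oneCoreVoices) {
        Set<String> oneCore = new HashSet<>(oneCoreVoices);
        List<String> merged = new ArrayList<>();
        for (String voice : sapiVoices) {
            if (!oneCore.contains(voice)) {
                merged.add(voice); // SAPI-only voice, keep it
            }
        }
        merged.addAll(oneCoreVoices);
        return merged;
    }
}
```

Keeping all voices would simply mean skipping the `contains` filter, at the cost of exposing duplicate entries to the user.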
I mean the fact that you prioritize SAPI (load it first) because users might have third-party voices installed for it: it would be good to mention that, so that we remember why it is done.
Regarding keeping all voices: what would that mean in practice? The reason you removed the duplicates was to hide the implementation details from the user. If we keep the duplicates, the user will have the power to force SAPI or OneCore, but at the cost of more complexity. You could argue that Pipeline should automatically choose the underlying implementation that works best.
I added a comment in the code to explain why SAPI voices are prioritized. (I also removed some warnings in the native code.)
If nothing else, I think we can merge.
Thanks for the comment!
Before we can merge, we still need to document and test that the order of the `getAvailableVoices` list has an influence on voice selection. I will take this task upon myself.
@bertfrees I added some more changes to the SAPI and OneCore adapters to improve performance, given the feedback I had from the testers. Also, by extending the tests I discovered that the SAPI connector was also affected by the fastfail crash, though it happens less often.
I looked a bit at how NVDA manages its calls to the OneCore API on the native/C++ side: they use a mutex to ensure the speech synthesis calls to the API are done serially. By re-enabling multithreading and locking with a mutex only the problematic speech synthesis calls to both APIs, I was able to mitigate the performance impact, reducing the generation time in my test by a factor of around 3 while still consistently avoiding the fastfail crash. Synthesis through the SAPI connector remains quicker than the OneCore version (by a factor of around 1.67), but that is possibly because the SAPI connector deals with the underlying memory buffer outside the locked phase around the speak call, while OneCore manages it internally in the locked phase.
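The locking pattern described above (the actual lock lives in the native C++ code) can be sketched in Java terms as follows. This is an illustrative, hypothetical sketch: `nativeSpeak` stands in for the real native synthesis call, and only that call is serialized, while everything around it stays multithreaded.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: serialize only the synthesis call itself with a lock,
// so other work (request preparation, buffer handling) can run in parallel.
class SerializedSpeaker {
    // One lock shared by all threads calling the synthesizer.
    private static final ReentrantLock SPEAK_LOCK = new ReentrantLock();

    // Stand-in for the native call that must not run concurrently.
    static byte[] nativeSpeak(String text) {
        return text.getBytes();
    }

    static byte[] speak(String text) {
        SPEAK_LOCK.lock(); // only the speak call is inside the critical section
        try {
            return nativeSpeak(text);
        } finally {
            SPEAK_LOCK.unlock();
        }
    }
}
```

Keeping the critical section as small as possible is what limits the performance cost of serializing the calls.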
Very nice!
> Before we can merge, we still need to document and test that the order of the `getAvailableVoices` list has an influence on voice selection.
@NPavie There is a problem. I tested this, and it appears the order has no influence on voice selection, so your "System wide selected voices prioritization" commit does not have the desired effect. I will see what I can do.
OK, fixed it. See my latest commit.
One last minor change: I extended the locking timeout in the native code after a production test of the adapter. I stumbled upon a case where releasing the mutex lock took one or two seconds more than the 3 seconds I had defined in the code. (The conversion did not crash, but one sentence was synthesized with a fallback OneCore voice immediately after the error appeared in the log.)
Just in case, I extended the timeout to 10 seconds.
I did more tests, and so far it has not happened again since this extension of the timeout.
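The timeout-with-fallback behavior (implemented in the native code) can be sketched in Java terms like this. It is a hypothetical illustration, not the actual code: `trySpeak` gives up after the timeout and returns `null`, which is what lets the caller fall back to another voice instead of failing hard.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the timed lock acquisition around the speak call.
class TimedSpeaker {
    private static final ReentrantLock SPEAK_LOCK = new ReentrantLock();
    // Timeout raised from 3 to 10 seconds after a slow lock release was observed.
    private static final long LOCK_TIMEOUT_SECONDS = 10;

    // Stand-in for the native synthesis call.
    static byte[] nativeSpeak(String text) {
        return text.getBytes();
    }

    // Returns null when the lock cannot be acquired in time, so the caller
    // can fall back to another voice instead of crashing the conversion.
    static byte[] trySpeak(String text) {
        boolean acquired = false;
        try {
            acquired = SPEAK_LOCK.tryLock(LOCK_TIMEOUT_SECONDS, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!acquired) {
            return null; // timed out: caller falls back to another voice
        }
        try {
            return nativeSpeak(text);
        } finally {
            SPEAK_LOCK.unlock();
        }
    }
}
```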
OK good! Let me know if it's ready. I will merge soon.
I think it is finally ready. I don't have crashes anymore on Windows 11; performance is a bit lower than before, but the impact is contained as much as possible, so I would say go for it :)
This PR includes a simplification of the native code side, system-wide selected voice prioritization, and a bypass of multithreading for the speak methods of the connectors: multithreaded calls to the OneCore/SAPI synthesizers trigger a FastFail error on the native side that we are not able to catch and handle.
The following changes have been made in an attempt to keep multithreading active and are included in the PR:
- Speak methods now directly return a new NativeSynthesisResult Java object holding both the audio data byte buffer and the mark names and positions.
Finally, until an investigation into how to re-enable multithreading capabilities is done (possibly with coroutines in WinRT), the OneCore speak method calls are now synchronized on Windows 11 to avoid the concurrent calls to the OneCore API that trigger the fastfail exception.
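The shape of the NativeSynthesisResult object mentioned above could look roughly like this. The class name comes from this PR, but the field types and layout here are assumptions for illustration (marks are modeled as a name-to-offset map):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one speak call returns both the synthesized audio
// bytes and the marks, so no second native call is needed.
class NativeSynthesisResult {
    private final byte[] audioBytes;
    // Mark name -> position in the audio buffer (assumed representation).
    private final Map<String, Long> marks;

    NativeSynthesisResult(byte[] audioBytes, Map<String, Long> marks) {
        this.audioBytes = audioBytes;
        // Defensive copy preserving insertion order of the marks.
        this.marks = new LinkedHashMap<>(marks);
    }

    byte[] getAudioBytes() {
        return audioBytes;
    }

    Map<String, Long> getMarks() {
        return marks;
    }
}
```

Returning both pieces of data in a single object keeps the native round-trip to one call per sentence.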