NPavie closed this pull request 8 months ago.
The branch is now rebased on the latest version of master. I also noticed and fixed a small bug in my detection of voice age on the native side.
@bertfrees I don't recall: do you prefer a squashed commit for the integration into master?
> do you prefer a squashed commit for the integration into master?
I prefer one commit per distinct/meaningful change. Sometimes that requires squashing commits, sometimes it requires splitting up commits :-).
@NPavie I cleaned up the branch a bit:
- rebased
- squashed two commits
- updated commit messages
- made every commit compile
My only remaining question is about the second commit: what is the effect/goal of putting the default voice first in the list?
OK, good!
In a discussion a while ago I think you mentioned that the order of insertion of voices in the list could influence their priority. For people without a configuration, the idea is to first use the system-wide selected voice if possible.
But it's a bit complicated to manage the adapter's main voice here, as there are two settings competing with each other on Windows: one for the default SAPI voice and one for the default OneCore voice. For now I prioritize SAPI, as some users might have third-party voices installed for it (like Acapela voices for Windows).
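The "default voice first" idea can be sketched roughly as follows. This is a minimal illustration, not the actual adapter code: the class and method names are hypothetical, and voices are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: move the system-wide selected voice to the front of
// the list, so that selection without user configuration falls back to it.
class VoiceOrdering {
    static List<String> withDefaultFirst(List<String> voices, String defaultVoice) {
        List<String> ordered = new ArrayList<>(voices);
        // Only reorder if the default voice is actually in the list.
        if (ordered.remove(defaultVoice)) {
            ordered.add(0, defaultVoice);
        }
        return ordered;
    }
}
```

Whether putting the voice first actually affects selection is exactly the open question discussed below.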
> In a discussion a while ago I think you mentioned that the order of insertion of voices in the list could influence their priority.
Hmm, I don't know if that's the case (and I don't recall mentioning it). It could be, but then it is probably an implementation detail. We would need to document the feature and add some unit tests for it.
> For now I prioritize SAPI, as some users might have third-party voices installed for it (like Acapela voices for Windows).
Do you mention that somewhere in the code?
> Do you mention that somewhere in the code?
No, it's not mentioned, but the SAPI voices are loaded first, so I assume they are prioritized.
Given the problem we are encountering with OneCore multithreading, I'm wondering if we should not keep all voices available (including the SAPI versions of the ones that are also declared in OneCore; we are currently discarding the SAPI version of a voice if a OneCore version exists). @bertfrees, what do you think?
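The deduplication being discussed can be sketched like this. This is a simplified illustration under assumed naming (the class and method are hypothetical, and voices are matched by name only): a SAPI voice is dropped when a OneCore voice with the same name exists.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the current behavior: discard the SAPI version of a
// voice when a OneCore version with the same name exists.
class VoiceDedup {
    static List<String> merge(List<String> sapiVoices, List<String> oneCoreVoices) {
        Set<String> oneCore = new HashSet<>(oneCoreVoices);
        List<String> merged = new ArrayList<>();
        for (String voice : sapiVoices) {
            if (!oneCore.contains(voice)) {
                merged.add(voice); // SAPI-only voice, keep it
            }
        }
        merged.addAll(oneCoreVoices);
        return merged;
    }
}
```

Keeping all voices would simply mean skipping the `contains` filter, at the cost of exposing duplicate entries to the user.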
I mean the fact that you prioritize SAPI (load it first) because users might have third-party voices installed for it: it would be good to mention that, so that we remember why it is done.
Regarding keeping all voices: what would that mean in practice? The reason you removed the duplicates was to hide the implementation details from the user. If we keep the duplicates, the user will have the power to force SAPI or OneCore, but at the cost of more complexity. You could argue that Pipeline should automatically choose the underlying implementation that works best.
I added a comment in the code to explain why SAPI voices are prioritized. (I also removed some warnings in the native code.)
If nothing else, I think we can merge.
Thanks for the comment!
Before we can merge, we still need to document and test that the order of the `getAvailableVoices` list has an influence on voice selection. I will take this task upon myself.
@bertfrees I added some more changes to the SAPI and OneCore adapters to improve performance, given the feedback I had from the testers. Also, by extending the tests I discovered that the SAPI connector was also affected by the fastfail crash, though it happens less often.
I looked a bit at how NVDA manages its calls to the OneCore API on the native/C++ side: they use a mutex to ensure the speech synthesis calls to the API are done serially. By re-enabling multithreading and locking with a mutex only the problematic speech synthesis calls to both APIs, I was able to mitigate the performance impact, reducing the generation time in my test by a factor of around 3 while still consistently avoiding the fastfail crash. Synthesis through the SAPI connector remains quicker than the OneCore version (by a factor of around 1.67), but that is possibly because the SAPI connector deals with the underlying memory buffer outside the locked phase around the speak call, while OneCore manages it internally in the locked phase.
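The locking pattern described above (the actual lock lives in the native C++ code) can be sketched in Java terms as follows. This is an illustrative, hypothetical sketch: `nativeSpeak` stands in for the real native synthesis call, and only that call is serialized, while everything around it stays multithreaded.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: serialize only the synthesis call itself with a lock,
// so other work (request preparation, buffer handling) can run in parallel.
class SerializedSpeaker {
    // One lock shared by all threads calling the synthesizer.
    private static final ReentrantLock SPEAK_LOCK = new ReentrantLock();

    // Stand-in for the native call that must not run concurrently.
    static byte[] nativeSpeak(String text) {
        return text.getBytes();
    }

    static byte[] speak(String text) {
        SPEAK_LOCK.lock(); // only the speak call is inside the critical section
        try {
            return nativeSpeak(text);
        } finally {
            SPEAK_LOCK.unlock();
        }
    }
}
```

Keeping the critical section as small as possible is what limits the performance cost of serializing the calls.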
Very nice!
> Before we can merge, we still need to document and test that the order of the `getAvailableVoices` list has an influence on voice selection.
@NPavie There is a problem. I tested this, and it appears the order has no influence on voice selection, so your "System wide selected voices prioritization" commit does not have the desired effect. I will see what I can do.
OK, fixed it. See my latest commit.
One last minor change: I extended the locking timeout in the native code after a production test of the adapter. I stumbled upon a case where releasing the mutex lock took one or two seconds more than the 3 seconds I had defined in the code. (The conversion did not crash, but one sentence was synthesized with a fallback OneCore voice immediately after the error appeared in the log.)
Just in case, I extended the timeout to 10 seconds.
I did more tests, and so far it has not happened again since this extension of the timeout.
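The timeout-with-fallback behavior (implemented in the native code) can be sketched in Java terms like this. It is a hypothetical illustration, not the actual code: `trySpeak` gives up after the timeout and returns `null`, which is what lets the caller fall back to another voice instead of failing hard.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the timed lock acquisition around the speak call.
class TimedSpeaker {
    private static final ReentrantLock SPEAK_LOCK = new ReentrantLock();
    // Timeout raised from 3 to 10 seconds after a slow lock release was observed.
    private static final long LOCK_TIMEOUT_SECONDS = 10;

    // Stand-in for the native synthesis call.
    static byte[] nativeSpeak(String text) {
        return text.getBytes();
    }

    // Returns null when the lock cannot be acquired in time, so the caller
    // can fall back to another voice instead of crashing the conversion.
    static byte[] trySpeak(String text) {
        boolean acquired = false;
        try {
            acquired = SPEAK_LOCK.tryLock(LOCK_TIMEOUT_SECONDS, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        if (!acquired) {
            return null; // timed out: caller falls back to another voice
        }
        try {
            return nativeSpeak(text);
        } finally {
            SPEAK_LOCK.unlock();
        }
    }
}
```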
OK good! Let me know if it's ready. I will merge soon.
I think it is finally ready. I don't have crashes anymore on Windows 11; performance is a bit lower than before, but the impact is contained as much as possible, so I would say go for it :)
This PR includes a simplification of the native code side, system-wide selected voice prioritization, and a bypass of multithreading for the speak methods of the connectors: multithreaded calls to the OneCore/SAPI synthesizers trigger a FastFail error on the native side that we are not able to catch and handle.
The following changes have been made in an attempt to keep multithreading active and are included in the PR:
- Speak methods now directly return a new NativeSynthesisResult Java object holding both the audio data byte buffer and the mark names and positions.
Finally, until an investigation into how to re-enable multithreading capabilities is done (possibly with coroutines in WinRT), the OneCore speak method calls are now synchronized on Windows 11 to avoid the concurrent calls to the OneCore API that trigger the fastfail exception.
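The shape of the NativeSynthesisResult object mentioned above could look roughly like this. The class name comes from this PR, but the field types and layout here are assumptions for illustration (marks are modeled as a name-to-offset map):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: one speak call returns both the synthesized audio
// bytes and the marks, so no second native call is needed.
class NativeSynthesisResult {
    private final byte[] audioBytes;
    // Mark name -> position in the audio buffer (assumed representation).
    private final Map<String, Long> marks;

    NativeSynthesisResult(byte[] audioBytes, Map<String, Long> marks) {
        this.audioBytes = audioBytes;
        // Defensive copy preserving insertion order of the marks.
        this.marks = new LinkedHashMap<>(marks);
    }

    byte[] getAudioBytes() {
        return audioBytes;
    }

    Map<String, Long> getMarks() {
        return marks;
    }
}
```

Returning both pieces of data in a single object keeps the native round-trip to one call per sentence.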