daisy / pipeline

Super-project that aggregates all Pipeline related code, provides a common tracker for Pipeline related issues and holds the Pipeline website
http://daisy.github.io/pipeline
20 stars 20 forks source link

TTS Adapter Onecore/SAPI : simplification, system voice priority and multithreading bypass #718

Closed NPavie closed 8 months ago

NPavie commented 9 months ago

This PR includes a simplification of the native code side, a system wise selected voices prioritization, and a bypass of multithreading for the speak methods of the connector : Multithreaded calls to Onecore/SAPI synthesizer triggers a FastFail error on the native side that we are not able to catch and handle.

The following changes have been made in the attempt to keep the multithreading active and are included in the PR :

NPavie commented 9 months ago

Branch is now rebased on latest version of master. I also noticed and fixed a small bug in my detection of voice age on the native side.

@bertfrees I don't recall : do you prefer a squashed commit for the integration in master ?

bertfrees commented 9 months ago

do you prefer a squashed commit for the integration in master?

I prefer one commit per distinct/meaningful change. Sometimes that requires squashing commits, sometimes it requires splitting up commits :-).

bertfrees commented 9 months ago

@NPavie I cleaned up the branch a bit:

My only remaining question is about the second commit: what is the effect/goal of putting the default voice first in the list?

NPavie commented 9 months ago

@NPavie I cleaned up the branch a bit:

  • rebased
  • squashed two commits
  • updated commit messages
  • made every commit compile

My only remaining question is about the second commit: what is the effect/goal of putting the default voice first in the list?

ok good !

In a discussion a while ago i think you mentionned the order if insertion of voices in the list could influence its priority. For people without configuration, the idea is to first use the system-wide selected voice if possible.

But it's a bit complicated to manage the adapter main voice here as there are two settings competing with each other on windows : one for the default SAPI voice and one for the default Onecore voice. For now i prioritize SAPI as some users might have third party voices installed for it (like acapela voices for windows).

bertfrees commented 9 months ago

In a discussion a while ago i think you mentionned the order if insertion of voices in the list could influence its priority.

Hmm, I don't know if that's the case. (And don't recall mentioning that.) It could be the case, but then it is probably an implementation detail. We would need to document the feature and add some unit tests for it.

For now i prioritize SAPI as some users might have third party voices installed for it (like acapela voices for windows).

Do you mention that somewhere in the code?

NPavie commented 9 months ago

Do you mention that somewhere in the code?

No its not mentionned, but the SAPI Voices are loaded first so i assume they are prioritized.

Given the problem we are encountering with onecore multithreading, i'm wondering if we should not keep all voices available (included the SAPI versions of the one that are also declared in OneCore, we are currently discarding the sapi version of a voice if a onecore version exists). @bertfrees what do you think ?

bertfrees commented 9 months ago

I mean the fact that you prioritize SAPI (load it first) because users might have third party voices installed for it: It would be good to mention that so that we remember why it is done.

Regarding keeping all voices: what would that mean in practice? The reason you removed the duplicates was to hide the implementation details from the user. If we keep the duplicates, the user will have the power to force SAPI or OneCore, but at the cost of more complexity. You can argue that Pipeline should automatically choose the underlying implementation that works best.

NPavie commented 9 months ago

I added a comment in the code to explain a bit why sapi voices are prioritized. (Also removed some warnings in the native code)

If nothing else, i think we can merge

bertfrees commented 9 months ago

Thanks for the comment!

Before we can merge, we still need to document and test that the order of the getAvailableVoices list has an influence on voice selection. I will take this task upon myself.

NPavie commented 8 months ago

@bertfrees I added some more changes to the sapi and onecore adapter to improve performances given the feedback i had from the testers. Also, I discovered by extending the tests that the SAPI connector was also impacted by the fastfail crash, but it happens less often.

I looked a bit at how NVDA was managing their calls to the OneCore API on the native/C++ side and they use a mutex to ensure the speech synthesis calls to the API are done serialy. By re-enabling multithreading and solely locking with mutex the problematic speech synthesis calls to both API, i was able to mitigate the performance impact, reducing the generation time in my test by a factor of around 3 while still consistently avoiding the fastfail crash. The synthesis through the SAPI connector remains quicker than the onecore version (by a factor of around 1.67) but it's possibly due to the fact that the SAPI connector deals with the underlying memory buffer outside the lock phase around the speak call, while onecore manage it internally in the locked phase.

bertfrees commented 8 months ago

Very nice!

bertfrees commented 8 months ago

Before we can merge, we still need to document and test that the order of the getAvailableVoices list has an influence on voice selection.

@NPavie There is a problem. I tested this and it appears the order has no influence on voice selection. So your "System wide selected voices prioritization" commit does not have the desired effect. I will see what I can do.

bertfrees commented 8 months ago

OK, fixed it. See my latest commit.

NPavie commented 8 months ago

Last minor change: I expanded the locking timeout in native code after a test in production of the adapter. I stumbled upon a case where the release of the mutex lock took one or two seconds more than the 3 seconds i defined in the code. (The conversion did not crash, but one sentence was synthesize with a fallback onecore voice immediately after the error arise in the log.)

Just in case, i extended the timeout test to 10 seconds.

I did more tests and it did not happended again so far after this extension of the timeout.

bertfrees commented 8 months ago

OK good! Let me know if it's ready. I will merge soon.

NPavie commented 8 months ago

I think it is finally ready, I don't have crahs anymore on windows 11, performances are a bit lower than before but the impact is contained as much as possible, so I would say go for it :)

bertfrees commented 8 months ago

Merged in https://github.com/daisy/pipeline-modules/commit/6127b826ef