daisy / pipeline-modules

Modules for the DAISY Pipeline project

tts-adapter-sapinative : support for microsoft natural voices over NaturalVoicesSAPIAdapter #110

Closed: NPavie closed 3 weeks ago

NPavie commented 3 weeks ago

This PR aims to allow the use of Microsoft natural voices with the sapinative adapter, when exposed to the SAPI TTS engine with the NaturalVoicesSAPIAdapter :

Additionally, the PR also

This PR should fix daisy/pipeline#784

The adapter has been tested in production with the Daisy Pipeline App 1.6.0 beta 1, on the hauy_valid.xml sample, with NaturalVoicesSAPIAdapter version 0.2.

bertfrees commented 3 weeks ago

Merged!

NPavie commented 2 weeks ago

Hmm, that's strange, the test was passing in my setup. I'll retest with the latest changes and a clean Maven repo, just in case.

NPavie commented 2 weeks ago

Oh yes, I see: I had deactivated the test in my setup but forgot to update it, sorry. Given the SSML tested here, this should not have an impact on the production side.

The SSML tested here has no spaces between tokens (that is, no spaces between words). There are cases where words/tokens can be juxtaposed and must not be separated by a space, or pronunciation could break. For example, <token>d'</token><token>autres</token> or <token>d</token>'<token>autres</token> (I don't remember right now whether the tokenizer includes the apostrophe in the token) should not have spaces inserted, or it will be pronounced "D autres" (this was the case in production before I removed the space inserted by the stylesheet).
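
To illustrate the juxtaposed-token case, here is a minimal sketch in Python. It is not the adapter's or the stylesheet's actual code: the two helper functions and the sample string are hypothetical, and the sketch assumes the tokenizer keeps the apostrophe inside the first token. It only contrasts dropping the token markup while preserving the original whitespace with forcing a space after every token.

```python
import re

# Hypothetical helpers for illustration only; not part of the pipeline.
TOKEN_TAG = re.compile(r"</?token>")

def text_preserving_whitespace(ssml: str) -> str:
    """Drop <token> markup and leave the original whitespace untouched."""
    return TOKEN_TAG.sub("", ssml)

def text_with_forced_spaces(ssml: str) -> str:
    """Insert a space after every </token>, drop the markup, collapse spaces."""
    spaced = ssml.replace("</token>", "</token> ")
    return re.sub(r"\s+", " ", TOKEN_TAG.sub("", spaced)).strip()

# Assumed sample: the apostrophe is part of the first token.
sample = "<token>d'</token><token>autres</token> <token>mots</token>"

print(text_preserving_whitespace(sample))  # d'autres mots
print(text_with_forced_spaces(sample))     # d' autres mots
```

With forced spaces, the elided "d'" is detached from "autres", which is the kind of input that could lead an engine to read it as "D autres".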

From my tests in production, the pipeline does produce tokenized sentences with spaces correctly preserved between tokens, so I assumed this test was meant to verify that a problem with the SSML received by the adapter was bypassed at some point.

bertfrees commented 2 weeks ago

So what would be a correct test? Do there need to be spaces between the tokens?

NPavie commented 2 weeks ago

In this particular test, I think so.

bertfrees commented 1 week ago

OK, I'll change that.