gexgd0419 / NaturalVoiceSAPIAdapter

Make Azure natural TTS voices accessible to any SAPI 5-compatible application.
MIT License

does ssml language work for multilingual voices? #2

Open vortex1024 opened 1 month ago

vortex1024 commented 1 month ago

For the online multilingual voices, Microsoft recommends using SSML to force the language when it is not detected correctly. Does your project support that? I could not make this work, for example, with the Brian multilingual voice selected in the TTS application supplied in the package, with "process XML" on:

It speaks French, not Romanian. I also tried enclosing the text in an element and setting its lang attribute, but no go. Thanks.

gexgd0419 commented 1 month ago

Unfortunately, Microsoft Edge online voices only support a very limited subset of SSML. <lang> tags are not supported.

Also, any unsupported SSML tag will make the server throw an "SSML is invalid" error and close the connection. So this engine has to filter out all SSML tags except a few supported ones, such as <prosody>, before sending the SSML to the Edge voice server.
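The filtering step described above can be illustrated with a minimal Python sketch. This is not the adapter's actual code: the allow-list below is an assumption (the thread only names `<prosody>` and the root `<speak>`), and `filter_ssml` is a hypothetical helper that drops every other tag while keeping the text it encloses.

```python
import re

# Hypothetical allow-list: the thread only names <prosody> and the root <speak>.
ALLOWED_TAGS = {"speak", "prosody"}

# Matches opening, closing, and self-closing tags; group 2 is the tag name
# (optionally with a namespace prefix such as "mstts:express-as").
_TAG_RE = re.compile(r"<(/?)([A-Za-z_][\w.\-]*(?::[\w.\-]+)?)([^<>]*)>")


def filter_ssml(ssml: str, allowed=ALLOWED_TAGS) -> str:
    """Remove tags whose name is not in `allowed`, keeping their inner text."""

    def keep_or_drop(match: re.Match) -> str:
        name = match.group(2)
        # Keep the tag verbatim if allowed, otherwise replace it with nothing.
        return match.group(0) if name in allowed else ""

    return _TAG_RE.sub(keep_or_drop, ssml)


if __name__ == "__main__":
    source = (
        '<speak xml:lang="en-US"><lang xml:lang="ro-RO">'
        '<prosody rate="fast">Bună ziua</prosody></lang></speak>'
    )
    print(filter_ssml(source))
```

A regex is enough for a sketch on well-formed input; a real filter would more likely work on a parsed XML tree so that attribute values containing `>` and similar edge cases are handled.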

The Edge voice server requires an xml:lang attribute on the root <speak> element. But changing it seems to do nothing.

So no, changing the language is not supported by Edge voices.


But if you have an Azure Speech subscription key, you can use the Azure voices, which support that feature.
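For reference, here is a small Python sketch of SSML that forces the language on a multilingual Azure voice. It follows the `<lang xml:lang="...">` convention from Azure's SSML documentation; the voice name `en-US-BrianMultilingualNeural`, the `ro-RO` locale, and the helper `build_lang_ssml` are assumptions for illustration, and how this adapter forwards SSML to Azure is not shown in this thread.

```python
from xml.sax.saxutils import escape, quoteattr


def build_lang_ssml(text: str,
                    voice: str = "en-US-BrianMultilingualNeural",  # assumed voice name
                    lang: str = "ro-RO") -> str:                  # assumed target locale
    """Wrap `text` in a <lang> element so the voice speaks it in `lang`."""
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>"
        # escape() protects &, < and > in the spoken text.
        f"<lang xml:lang={quoteattr(lang)}>{escape(text)}</lang>"
        "</voice></speak>"
    )


if __name__ == "__main__":
    print(build_lang_ssml("Bună ziua"))
```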

Currently this engine does not enumerate Azure voices, so if you want to use an Azure voice, you will have to add it manually to the registry.

In Registry Editor, create a registry key under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\Tokens. Then, create the following keys and values inside this key:

Check this for a list of Edge online voice names ("ShortName").

vortex1024 commented 1 month ago

Thanks for the detailed explanation. It is a shame this does not work. The only way I can imagine it being made to work for free is always passing in some unique string in that language, and then cutting the corresponding audio from the resulting WAV.