Open lexaux opened 2 years ago
Example of an SSML file below:
<speak
xmlns="http://www.w3.org/2001/10/synthesis"
xmlns:mstts="http://www.w3.org/2001/mstts"
xmlns:emo="http://www.w3.org/2009/10/emotionml"
version="1.0" xml:lang="en-US">
<voice name="en-US-JennyNeural"><prosody rate="0%" pitch="0%">
It's time that we went to
<phoneme alphabet="ipa" ph="zɑpoˈriʒʲːɐ">Zaporizhzhia</phoneme>
and <phoneme alphabet="ipa" ph="xerˈsɔn"> Kherson</phoneme>
to use <phoneme alphabet="ipa" ph="dʒæmboks"> Jambox</phoneme>.
</prosody></voice>
</speak>
This leads to good results. In the interface this could be:
cc: @konstantin-aa
That looks sweet, though it might be neat to in-line it: you could tag the pronunciation of a word or phrase, and future matches would then be pronounced phonetically. Something like bit-of-text-to-be-pronounced-differently <$p> phonetic-spelling-goes-here. On future occurrences of that spelling, we can just pronounce it that way. We could even make these composable, though I'm unsure about the use cases and complexity.
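To make the in-line idea concrete, here is a rough Python sketch of how such markers could be collected. The marker syntax is not settled in this thread, so this assumes the simplest reading: exactly one whitespace-free word before <$p> and one whitespace-free phonetic token after it. The function name and the returned structure are hypothetical.

```python
import re

# Assumed syntax (not a settled design): WORD <$p> PHONETIC, where both
# WORD and PHONETIC are single tokens without whitespace.
INLINE_MARKER = re.compile(r"(\S+)\s*<\$p>\s*(\S+)")


def extract_inline_pronunciations(text):
    """Strip inline markers and return (cleaned_text, lexicon).

    The lexicon maps each tagged word to its phonetic spelling, so a later
    pass can apply it to every occurrence of the word in the document.
    """
    lexicon = {}

    def keep_word(match):
        word, phonetic = match.group(1), match.group(2)
        lexicon[word] = phonetic
        return word  # drop the marker and the phonetic spelling from the text

    cleaned = INLINE_MARKER.sub(keep_word, text)
    return cleaned, lexicon


cleaned, lexicon = extract_inline_pronunciations(
    "Visit Kherson <$p> xerˈsɔn today, then Kherson again."
)
print(cleaned)
print(lexicon)
```

Because definitions are collected in a separate pass, they are not tied to the position of the marker. Multi-word phrases, and phonetic spellings directly followed by punctuation, would need an explicit closing delimiter, which this sketch does not attempt.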
nice point @konstantin-aa
A few more thoughts:
It's a complex task that still stays in one spot of the codebase, so it's all good
So we should just be matching the stem? Also, the plan is to make word pronunciations carry forward regardless of whether they're defined at the top or in-lined. I'm unsure about the choice of language; will think about it.
Oh yeah, these are all good questions. I don't really know what the best way is from the user's standpoint. I'd probably expect the pronunciation to change across the entire document if even one phonetic definition exists, for the words both before and after the definition.
There is a recurring problem with names of places, companies, people, and products when translated. The phonetics are not always picked up correctly (city names mentioned by Mr. President such as Severodonetsk and Zaporizhyie; product names such as Hairstory and Jamworks). Sometimes this works well, sometimes not.
I suggest adding a special block to the document preface that defines key-value pairs (word to pronunciation) in one of the phonetic alphabets supported by the providers' SSML implementations.
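As a rough illustration of the preface idea, the Python sketch below takes a word-to-IPA mapping and wraps every whole-word occurrence in an SSML phoneme element. The function name, the dict-based lexicon, and the exact-match (case-sensitive, no stemming) behaviour are assumptions for illustration, not a description of the existing codebase.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_pronunciations(text: str, lexicon: dict[str, str]) -> str:
    """Return an SSML fragment with lexicon words wrapped in <phoneme> tags.

    `lexicon` maps a word or phrase to its IPA transcription (hypothetical
    structure, e.g. parsed from a preface block). Matching is exact.
    """
    if not lexicon:
        return escape(text)

    # Longest keys first, so a multi-word entry wins over a shorter prefix.
    keys = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"(?<!\w)(" + "|".join(re.escape(k) for k in keys) + r")(?!\w)"
    )

    parts = []
    pos = 0
    for match in pattern.finditer(text):
        # Escape the plain text between matches so the result stays valid XML.
        parts.append(escape(text[pos:match.start()]))
        word = match.group(1)
        parts.append(
            f'<phoneme alphabet="ipa" ph={quoteattr(lexicon[word])}>'
            f"{escape(word)}</phoneme>"
        )
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)


lexicon = {"Kherson": "xerˈsɔn", "Jambox": "dʒæmboks"}
print(apply_pronunciations("We went to Kherson & used Jambox.", lexicon))
```

The result is only a fragment; it would still need to be placed inside the speak and voice elements shown in the example above. Since the substitution runs over the whole text, a definition applies to occurrences both before and after wherever it was declared.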
Google phoneme support: https://cloud.google.com/text-to-speech/docs/ssml#phoneme