Add a way to define phonetics for specific words in dub

lexaux commented 2 years ago

There is a recurring problem with names of things, companies, people and products when translated. The pronunciation is not always picked up correctly (city names mentioned by Mr. President, like Severodonetsk, Zaporizhyie, etc.; product names such as Hairstory, Jamworks). Sometimes this works well, sometimes not.

I suggest adding a special block to the doc preface which would define key-value pairs (word and its pronunciation) in one of the phonetic alphabets supported by the providers' SSML implementations.

Google phoneme support: https://cloud.google.com/text-to-speech/docs/ssml#phoneme
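
A minimal sketch of how such a preface lexicon could be turned into SSML, assuming the key-value pairs have already been parsed into a plain dict. The names `LEXICON` and `apply_lexicon` are hypothetical and not part of the existing codebase; only the `<phoneme alphabet="ipa" ph="...">` element comes from the SSML documentation linked above.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical preface lexicon: written form -> IPA pronunciation.
LEXICON = {
    "Kherson": "xerˈsɔn",
    "Jambox": "dʒæmboks",
}


def apply_lexicon(text, lexicon):
    """Return an SSML fragment with every lexicon word wrapped in <phoneme>."""
    if not lexicon:
        return escape(text)
    # Longest keys first, so a longer entry wins over an entry that is its prefix.
    keys = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        # Escape the plain text between matches so the result stays valid XML.
        parts.append(escape(text[pos:match.start()]))
        word = match.group(1)
        parts.append(
            f"<phoneme alphabet=\"ipa\" ph={quoteattr(lexicon[word])}>"
            f"{escape(word)}</phoneme>"
        )
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)
```

The fragment would still need to be placed inside the provider-specific `<speak>` wrapper. Matching here is exact and case-sensitive, which is probably too strict for real transcripts.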

lexaux commented 2 years ago

Example of an SSML file below:

<speak 
   xmlns="http://www.w3.org/2001/10/synthesis" 
   xmlns:mstts="http://www.w3.org/2001/mstts" 
   xmlns:emo="http://www.w3.org/2009/10/emotionml" 
   version="1.0" xml:lang="en-US">
<voice name="en-US-JennyNeural"><prosody rate="0%" pitch="0%">

It's time that we went to 
  <phoneme alphabet="ipa" ph="zɑpoˈriʒʲːɐ"> Zaporizhzhia </phoneme>
  and  <phoneme alphabet="ipa" ph="xerˈsɔn"> Kherson</phoneme> 
  to use <phoneme alphabet="ipa" ph="dʒæmboks"> Jambox</phoneme>.

</prosody></voice>
</speak>

This leads to good results. In the interface, this could be:

astaff commented 2 years ago

cc: @konstantin-aa

konst-aa commented 2 years ago

that looks sweet. though it might be neat to in-line it. you could tag the pronunciation of a word/phrase and then future matches will be pronounced phonetically. something like ~~bit-of-text-to-be-pronounced-differently <$p> phonetic-spelling-goes-here~~. And on future occurrences of the spelling, we can just pronounce it that way. We could even make stuff composable, though I'm unsure about the use cases and complexity
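
A rough sketch of how the in-line tag could be parsed. It assumes the tilde pairs in the example above are literal delimiters, which may not be what was intended; the tag shape and the function name `extract_inline_pronunciations` are hypothetical.

```python
import re

# Assumed tag shape: ~~written form <$p> phonetic spelling~~
TAG = re.compile(r"~~\s*(.+?)\s*<\$p>\s*(.+?)\s*~~")


def extract_inline_pronunciations(text):
    """Return (clean_text, lexicon); each tag is replaced by its written form."""
    lexicon = {}

    def keep_written_form(match):
        written, phonetic = match.group(1), match.group(2)
        lexicon[written] = phonetic  # a later definition overrides an earlier one
        return written

    clean = TAG.sub(keep_written_form, text)
    return clean, lexicon
```

The returned lexicon can then be applied to the clean text by whatever matching rule is chosen, so the tagging syntax and the matching logic stay separate.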

lexaux commented 2 years ago

nice point @konstantin-aa

Few more thoughts:

Choice of language - there is IPA (not that one) but also X-SAMPA at the very least. The question here is what people would be OK using: one requires special symbols, while the other is a bit cryptic.
We probably want to define the phonetics once for the text and not repeat it for each word. However, words come with variations, less so in English but more so in Ukrainian and Russian (all the suffixes, prefixes, etc.). We would probably want to match not just word-for-word, but stem + repeat. Does this fit with the phoneme definition? Idk. It feels like a somewhat more complex task.
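
One way to picture the stem question, as a sketch only: wrap just the stem in `<phoneme>` and leave the inflectional ending as plain text for the engine to pronounce on its own. The function name `wrap_stem` is hypothetical, and whether a TTS provider reads a phoneme tag followed directly by a suffix as one smooth word is an open assumption that would need testing.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def wrap_stem(text, stem, ipa):
    """Wrap `stem` in <phoneme> wherever a word starts with it; endings stay plain."""
    # In Python 3, \b and \w are Unicode-aware for str patterns, so Cyrillic works.
    pattern = re.compile(rf"\b{re.escape(stem)}(\w*)")
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        parts.append(escape(text[pos:match.start()]))
        ending = match.group(1)
        parts.append(
            f"<phoneme alphabet=\"ipa\" ph={quoteattr(ipa)}>"
            f"{escape(stem)}</phoneme>{escape(ending)}"
        )
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)
```

This handles suffixes only, not prefixes, and a short stem will over-match unrelated words that happen to begin with the same letters, so a real implementation would likely need a proper stemmer or an explicit list of forms.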

konst-aa commented 2 years ago

It's a complex task that still stays in one spot of the codebase, so it's all good

konst-aa commented 2 years ago

So we should just be matching the stem? Also, the plan is to make word pronunciations carry forward regardless of whether they're defined at the top or in-lined. And I'm unsure about the choice of language, will think about it.

lexaux commented 2 years ago

Oh yeah, these are all good questions. I don't really know what's the best way from the user's standpoint. I'd probably expect the pronunciation to be changed across the entire document if just one phonetics definition exists, for words both before and after the definition.
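
The two scoping options discussed in this thread (carry forward only vs. apply to the whole document) can be compared with a small sketch. The function `apply_pronunciation` and its `scope` and `defined_at` parameters are hypothetical; `defined_at` stands for the character offset at which the definition appears, and the input text is assumed to be XML-safe already.

```python
import re
from html import escape


def apply_pronunciation(text, word, ipa, defined_at=0, scope="document"):
    """Wrap occurrences of `word` in <phoneme>.

    scope="document": every occurrence is wrapped.
    scope="forward":  only occurrences at or after `defined_at` are wrapped.
    """
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    ph = escape(ipa, quote=True)

    def wrap(match):
        if scope == "forward" and match.start() < defined_at:
            return match.group(0)  # occurrence precedes the definition
        return f'<phoneme alphabet="ipa" ph="{ph}">{match.group(0)}</phoneme>'

    return pattern.sub(wrap, text)
```

With document scope the position of the definition does not matter, which makes a two-pass approach (collect all definitions first, then apply them) the simpler of the two to reason about.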

code-anyway / freespeech

Add a way to define phonetics for specific words in dub #71