code-anyway / freespeech


Add a way to define phonetics for specific words in dub #71

Open lexaux opened 2 years ago

lexaux commented 2 years ago

There is a recurring problem with the names of places, companies, people, and products when translated: their pronunciation is not always picked up correctly (e.g. city names mentioned by Mr. President, such as Severodonetsk and Zaporizhyie, or product names such as Hairstory and Jamworks). Sometimes this works well, sometimes not.

I suggest adding a special block to the document preface that defines key-value pairs (word → pronunciation) in one of the phonetic alphabets supported by the providers' SSML implementations.

Google phoneme support: https://cloud.google.com/text-to-speech/docs/ssml#phoneme
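A minimal sketch of how such a preface block might be parsed, assuming a hypothetical syntax: a `Phonetics:` header line followed by `word = IPA` lines, closed by the first blank line. Both the syntax and the `parse_phonetics_block` name are illustrative assumptions, not an existing freespeech API.

```python
def parse_phonetics_block(preface: str) -> dict[str, str]:
    """Parse a hypothetical preface block such as:

        Phonetics:
        Kherson = xerˈsɔn
        Jambox = dʒæmboks

    The block starts after a 'Phonetics:' line and ends at the first blank line.
    """
    entries: dict[str, str] = {}
    in_block = False
    for raw in preface.splitlines():
        line = raw.strip()
        if not in_block:
            # Skip everything until the (assumed) header line.
            in_block = line.lower() == "phonetics:"
            continue
        if not line:
            break  # a blank line closes the block
        word, sep, ipa = line.partition("=")
        if sep and word.strip() and ipa.strip():
            entries[word.strip()] = ipa.strip()
    return entries
```

Keeping the mapping as a plain dictionary means it can be handed to whatever step emits the provider-specific `<phoneme>` markup.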

lexaux commented 2 years ago

Example of an SSML file below:

```xml
<speak
    xmlns="http://www.w3.org/2001/10/synthesis"
    xmlns:mstts="http://www.w3.org/2001/mstts"
    xmlns:emo="http://www.w3.org/2009/10/emotionml"
    version="1.0" xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <prosody rate="0%" pitch="0%">
      It's time that we went to
      <phoneme alphabet="ipa" ph="zɑpoˈriʒʲːɐ">Zaporizzhia</phoneme>
      and <phoneme alphabet="ipa" ph="xerˈsɔn">Kherson</phoneme>
      to use <phoneme alphabet="ipa" ph="dʒæmboks">Jambox</phoneme>.
    </prosody>
  </voice>
</speak>
```

This leads to good results. In the interface, this could look like: [image]
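Markup like the example above could be generated from a word → IPA mapping. A minimal sketch, assuming exact, case-sensitive word matches; `to_ssml` is a hypothetical helper, and its `voice` default is simply the voice from the example. Text outside the tags is XML-escaped with the standard library so characters like `&` cannot break the document.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def to_ssml(text, lexicon, voice="en-US-JennyNeural"):
    """Wrap every lexicon word found in `text` in an IPA <phoneme> tag."""
    if lexicon:
        # Longest entries first, so a multi-word entry wins over its prefix.
        alternatives = sorted(lexicon, key=len, reverse=True)
        pattern = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b")
        # With one capturing group, split() alternates: text, match, text, ...
        parts = pattern.split(text)
    else:
        parts = [text]

    out = []
    for i, part in enumerate(parts):
        if i % 2 == 1:  # odd indexes hold the matched words
            ph = quoteattr(lexicon[part])
            out.append(f'<phoneme alphabet="ipa" ph={ph}>{escape(part)}</phoneme>')
        else:
            out.append(escape(part))
    body = "".join(out)
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )
```

For example, `to_ssml("Go to Kherson & back", {"Kherson": "xerˈsɔn"})` wraps "Kherson" in a `<phoneme>` tag and emits the ampersand as `&amp;`.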

astaff commented 2 years ago

cc: @konstantin-aa

konst-aa commented 2 years ago

That looks sweet, though it might be neat to in-line it: you could tag the pronunciation of a word or phrase, and future matches would then be pronounced phonetically. Something like `bit-of-text-to-be-pronounced-differently <$p> phonetic-spelling-goes-here`; on future occurrences of that spelling, we can just pronounce it that way. We could even make this composable, though I'm unsure about the use cases and the complexity.
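A rough sketch of the in-line variant, assuming the `<$p>` marker sits between a single word and a whitespace-free phonetic spelling (the thread leaves the exact extent of both sides open). Following the proposal, a definition applies to its own word and to later occurrences only. `apply_inline_phonetics` is a hypothetical name; it returns `(word, ipa or None)` pairs that an SSML generator could then wrap in `<phoneme>` tags.

```python
import re

# Hypothetical syntax: `Kherson <$p> xerˈsɔn`. Assumes the defined text is a
# single word and the phonetic spelling runs up to the next whitespace.
TOKEN = re.compile(r"(\w+)\s*<\$p>\s*(\S+)|(\w+)")


def apply_inline_phonetics(text):
    """Return (word, ipa_or_None) pairs, carrying definitions forward only."""
    known = {}  # pronunciations defined so far
    result = []
    for m in TOKEN.finditer(text):
        defined_word, ipa, plain_word = m.groups()
        if defined_word is not None:
            # A definition: remember it and apply it to this occurrence.
            known[defined_word] = ipa
            result.append((defined_word, ipa))
        else:
            # A plain word: use a pronunciation only if it was defined earlier.
            result.append((plain_word, known.get(plain_word)))
    return result
```

With `"Kherson first, then Kherson <$p> xerˈsɔn and Kherson."`, the first "Kherson" gets no pronunciation, while the second and third get `xerˈsɔn`.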

lexaux commented 2 years ago

Nice point, @konstantin-aa.

A few more thoughts:

konst-aa commented 2 years ago

It's a complex task, but it stays in one spot of the codebase, so it's all good.

konst-aa commented 2 years ago

So we should just be matching the stem? Also, the plan is to carry word pronunciations forward regardless of whether they're defined at the top or in-line. I'm also unsure about the choice of language; I'll think about it.

lexaux commented 2 years ago

Oh yeah, these are all good questions. I don't really know what the best way is from a user's standpoint. I'd probably expect the pronunciation to change throughout the entire document if even one phonetic definition exists, for words both before and after the definition.
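If definitions should apply to the whole document, one option is two passes: first collect every definition wherever it appears, then resolve all words. A minimal sketch reusing the same hypothetical `<$p>` syntax and assumptions (single-word target, phonetic spelling ending at whitespace); `resolve_document` is illustrative only.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical syntax: `Kherson <$p> xerˈsɔn`.
DEFINITION = re.compile(r"(\w+)\s*<\$p>\s*(\S+)")
WORD = re.compile(r"\w+")


def resolve_document(text: str) -> List[Tuple[str, Optional[str]]]:
    """Apply every definition to the whole document, before and after it."""
    # Pass 1: collect all definitions; a later definition of the same word wins.
    lexicon = {word: ipa for word, ipa in DEFINITION.findall(text)}
    # Drop the markers and phonetic spellings so they are not spoken aloud.
    spoken = DEFINITION.sub(lambda m: m.group(1), text)
    # Pass 2: look up every word, regardless of where it was defined.
    return [(w, lexicon.get(w)) for w in WORD.findall(spoken)]
```

Here the first "Kherson" in `"Kherson first, then Kherson <$p> xerˈsɔn and Kherson."` also gets the pronunciation, even though the definition comes later. Matching is by exact word; whether to match on stems instead is still the open question raised above.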