Global control over word pronunciation

GoogleCodeExporter commented 8 years ago

Some speech engines mangle particular words or read them completely wrong. It
would be nice if the designer could list these mispronounced words and their
phonetic replacements in a single file.

Each string sent to the speech engine could be checked against the lookup table 
file.

Whatever algorithm we use, it should be reasonably efficient, in case the
file gets large. Here are a few discussions worth reading:
http://stackoverflow.com/questions/2190493/efficient-method-to-replace-multiple-words-in-text
http://blog.stevenlevithan.com/archives/multi-replace

We probably do not want an algorithm that scans the whole string once for
each of the N words in the lookup file (N full passes). The algorithm should
probably split the text and replace only exact words (using word boundaries),
so that "abc" would match " abc " and " abc," and " abc!" but not " abcd " or
" zabc ".
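One common way to get a single pass is to combine every key into one regular expression wrapped in `\b` word boundaries. The text is then scanned once, and each whole-word match is looked up in the table. The following is a minimal TypeScript sketch of that approach. `Dictionary` and `buildReplacer` are hypothetical names, not the project's code. JavaScript's `\b` only recognizes ASCII word characters, so keys that start or end with punctuation would need a different boundary test.

```typescript
// Hypothetical shape of the word pronunciation file: word -> what to say instead.
type Dictionary = Record<string, string>;

// Escape regex metacharacters so each key is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Compile every key into one alternation. Longer keys come first so that a
// multi-word key such as "EZ Access" wins over a shorter key such as "EZ".
function buildReplacer(dict: Dictionary): (text: string) => string {
  const keys = Object.keys(dict).sort((a, b) => b.length - a.length);
  if (keys.length === 0) {
    return (text) => text; // empty file: nothing to replace
  }
  const pattern = new RegExp("\\b(?:" + keys.map(escapeRegExp).join("|") + ")\\b", "g");
  // String.prototype.replace walks the text once; each whole-word match is
  // swapped for its entry in the table.
  return (text) => text.replace(pattern, (match) => dict[match] ?? match);
}
```

For example, with `{ abc: "ay bee see" }`, the input `"abc, abcd, zabc!"` becomes `"ay bee see, abcd, zabc!"`. This follows the whole-word rule described above.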

My proposal is to have a JSON file for pronunciation. I have attached two 
example files: one for words and one for chars. 

The char file has more entries, and its structure is nested by Type-of-char,
spoken language, pronunciation type, and then the key-value pairs. The
pronunciation type (in this case always "ascii") gives the encoding of the
pronunciation (this would be useful if you could send International Phonetic
Alphabet [IPA] pronunciation to a speech engine instead of the sort-of-faked
ASCII pronunciation). The pronunciation might vary by spoken language (e.g.,
"zee" vs. "zed"). If a spoken language isn't in the JSON, then the first
language in the relevant section would be used. The word JSON file does not
have the Type-of-char level.
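The nested char file could be typed and queried as shown below. The lookup includes the fallback to the first language in a section. The type names, the `lookupChar` helper, and the sample language keys are assumptions for illustration, since the attached files are not reproduced here. The fallback relies on `JSON.parse` keeping object keys in document order for non-numeric keys such as `"en-US"`.

```typescript
// Assumed nesting: Type-of-char -> spoken language -> pronunciation type
// ("ascii" today, possibly "ipa" later) -> character -> pronunciation.
type Pairs = Record<string, string>;
type ByEncoding = Record<string, Pairs>;
type ByLanguage = Record<string, ByEncoding>;
type CharFile = Record<string, ByLanguage>;

// Hypothetical lookup helper; returns undefined when nothing matches.
function lookupChar(
  file: CharFile,
  charType: string,
  language: string,
  encoding: string,
  ch: string
): string | undefined {
  const section = file[charType];
  if (section === undefined) return undefined;
  // Use the requested language if the section has it, else the first one listed.
  const languages = Object.keys(section);
  const chosen = languages.includes(language) ? language : languages[0];
  if (chosen === undefined) return undefined;
  return section[chosen]?.[encoding]?.[ch];
}
```

Under this assumed layout, the word file would simply drop the outermost level, so its type would be `ByLanguage`.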

Would you make any changes to the JSON format?

Original issue reported on code.google.com by jbjor...@gmail.com on 5 Jun 2013 at 7:24

GoogleCodeExporter commented 8 years ago

Let me know if my JSON needs some fixing please!

Original comment by jbjor...@gmail.com on 5 Jun 2013 at 7:51

GoogleCodeExporter commented 8 years ago

It looks fine. But honestly, Google's Speech API is very good at this stuff. 
Have you found any situations where it mispronounces things?

Original comment by aeharding on 6 Jun 2013 at 3:00

GoogleCodeExporter commented 8 years ago

Yes, speech engines are usually very good at this, but when they do mangle a
word, they mangle it consistently. I have also found that different SAPI 5
voices can have different peculiarities (meaning that the JSON file might be
somewhat voice-specific). Often the mangled words are proper nouns; for
example, EZ Access should be read as "EE ZEE Access" rather than "ehz Access."

Hopefully the JSON file will usually be short.

This is a lower priority than getting the DOM parsing and navigation overhaul
done.

Original comment by jbjor...@gmail.com on 6 Jun 2013 at 2:05

GoogleCodeExporter commented 8 years ago

I've found that the default Chrome voice pronounces 'EZ Access' better than it
pronounces 'ee zee access'.

The voice changes the synthesis, but isn't the interpretation consistent? If
the engine recognizes 'EZ' as "ee zee", won't it read it that way (not "ehz")
across all voices?

Original comment by aeharding on 6 Jun 2013 at 4:04

GoogleCodeExporter commented 8 years ago

At least on Windows with SAPI 5 voices, it seems like the voice has some "say" 
over how something is read. For example, the default Windows 7 voice says "vee" 
if the highlight is on 'V', whereas a voice called Neospeech Paul says "five" 
if the highlight is on 'V'.

The word pronunciation file could be used as a "global find and say instead" 
function for particular words (and especially proper nouns). It might depend on 
the voice in some cases.
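The file could cover the case where a pronunciation depends on the voice. This sketch assumes a hypothetical layout: a `default` table plus optional per-voice tables keyed by voice name. Neither the layout nor `dictionaryForVoice` comes from the project.

```typescript
// Hypothetical file layout: a global table plus optional per-voice overrides.
interface PronunciationFile {
  default: Record<string, string>;
  voices?: Record<string, Record<string, string>>;
}

// Merge the tables so that entries for the active voice override the global
// ones; a voice with no overrides just gets the global table.
function dictionaryForVoice(
  file: PronunciationFile,
  voiceName: string
): Record<string, string> {
  const overrides = file.voices?.[voiceName] ?? {};
  return { ...file.default, ...overrides };
}
```

The merged table could then be used for the whole-word find-and-replace step described at the start of the thread.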

You are right though, 'EZ Access' seems to be read correctly at least with the 
few voices I have tried. It might be good to include a different example word 
in the pronunciation lookup file... any ideas?

Maybe?
"json": "jay sawn"

Original comment by jbjor...@gmail.com on 6 Jun 2013 at 5:46

GoogleCodeExporter commented 8 years ago

json is also pronounced correctly (as jay sawn) in my testing.

I can see this being useful for Roman numerals, but that's all for now. If we
find other things that are poorly pronounced, we can always add more
find-and-replace rules after the Roman numeral rules.

It is a good idea to allow this flexibility for designers.

Original comment by aeharding on 6 Jun 2013 at 6:12

GoogleCodeExporter commented 8 years ago

I talked to DK, who does our EZ Access coding in ToolBook, and he said that he
has a global search-replace-and-say function. It is only used for a few words
at a time.

I wrote up a function for this in r161. It still needs to be integrated into
the speech process. Also, a dictionary file (simpler than the ones I uploaded
earlier) should be created and referenced.

Original comment by jbjor...@gmail.com on 6 Jun 2013 at 8:29

Changed state: Accepted

GoogleCodeExporter commented 8 years ago

Oops, Bern, I missed this in your JSON file: you included a comma after the
last item. This is a common mistake. Sorry I didn't call it out before.

Check out JSONlint.com to validate JSON code.
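Unlike a JavaScript object literal, JSON does not allow a comma after the last member, and `JSON.parse` rejects it with a `SyntaxError`. The following sketch loads a word dictionary, lets that error surface, and also checks that every value is a string. `parseDictionary` is a hypothetical helper, not project code.

```typescript
// Hypothetical loader for the word pronunciation file.
function parseDictionary(text: string): Record<string, string> {
  // JSON.parse throws a SyntaxError for invalid JSON, including a trailing
  // comma after the last entry.
  const data: unknown = JSON.parse(text);
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    throw new TypeError("pronunciation file must contain a JSON object");
  }
  // Copy entries one by one so every value is checked to be a string.
  const dict: Record<string, string> = {};
  for (const [key, value] of Object.entries(data)) {
    if (typeof value !== "string") {
      throw new TypeError(`pronunciation for "${key}" must be a string`);
    }
    dict[key] = value;
  }
  return dict;
}
```

Running a check like this once at load time catches the trailing-comma mistake before any text reaches the speech engine.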

Also, r162 makes a lot of progress on integrating it all. See the check-in
comment for what I did (let me know if you want it done differently). There's
also a small potential problem with your search-and-replace function that you
might want to look into; otherwise, I can.

Original comment by aeharding on 7 Jun 2013 at 6:15

GoogleCodeExporter commented 8 years ago

Thanks for fixing the JSON issue.

Would you mind looking at the search-and-replace function to find the
problem? I did some basic testing myself, but I haven't tested with a JSON file
(I just put the dictionary inline before the function's code).

You may wish to check that words separated by punctuation (and not spaces) are 
replaced correctly. The code also should only replace whole words, so key=foo 
does not match "foobar" or "foos".

Original comment by jbjor...@gmail.com on 7 Jun 2013 at 3:18

GoogleCodeExporter commented 8 years ago

Words next to punctuation are replaced properly.

For foo = testing:

Hello foo ==> Hello testing
Hello "foo". ==> Hello "testing".
Hellofoo ==> Hellofoo

The only thing not working is capitalization.

Original comment by aeharding on 7 Jun 2013 at 5:40

GoogleCodeExporter commented 8 years ago

Anything remaining on this, Bern? I don't remember if the capitalization thing 
was ever fixed. Should look into that. Forgot how the whole dictionary works, 
heh.

Original comment by aeharding on 27 Aug 2013 at 5:40

GoogleCodeExporter commented 8 years ago

I think this can be closed after testing. The capitalization of the output 
shouldn't matter because it is being sent to the speech engine and not 
displayed. I think the dictionary search-and-replace should be
case-insensitive if that is easy to do.
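If case-insensitive matching is wanted, one low-cost approach has three steps. First, normalize the keys to lower case. Second, compile the pattern with the `i` flag. Third, lower-case each match before the lookup. Output casing then does not matter, since the text is only spoken. This is a self-contained sketch with hypothetical names; it should reproduce the `foo = testing` cases from earlier in the thread, now including capitalized input.

```typescript
// Escape regex metacharacters so each key is matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical case-insensitive whole-word replacer.
function buildCaseInsensitiveReplacer(
  dict: Record<string, string>
): (text: string) => string {
  // Normalize keys so "Foo", "FOO" and "foo" share one entry; if two keys
  // differ only by case, the later one in the file wins.
  const table = new Map<string, string>();
  for (const [key, value] of Object.entries(dict)) {
    table.set(key.toLowerCase(), value);
  }
  const keys = Array.from(table.keys()).sort((a, b) => b.length - a.length);
  if (keys.length === 0) {
    return (text) => text;
  }
  // "i" makes matching case-insensitive; "g" replaces every occurrence.
  const pattern = new RegExp("\\b(?:" + keys.map(escapeRegExp).join("|") + ")\\b", "gi");
  // Keep the original text if Unicode case folding ever makes toLowerCase()
  // disagree with what the regex matched.
  return (text) =>
    text.replace(pattern, (match) => table.get(match.toLowerCase()) ?? match);
}
```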

The pronunciation JSON files can be removed from the project, except for
whichever one you are using.

Original comment by jbjor...@gmail.com on 27 Aug 2013 at 7:27

cloudbearings / ez-access-web

Global control over word pronunciation #35