espeak-ng / espeak-ng

eSpeak NG is an open source speech synthesizer that supports more than a hundred languages and accents.
GNU General Public License v3.0

Language analysis improvements #199

Open valdisvi opened 7 years ago

valdisvi commented 7 years ago

Language analysis and spelling decisions could be improved by introducing the following new features:

rhdunn commented 7 years ago

Using .replace to expand to multiple letters is working for me (i.e. the replace rules in en_rules). Are there specific cases that are not working?

valdisvi commented 7 years ago

Yes, because in compiledict.c the bytes are packed into an integer by the utf8_in function, and then only those 4 bytes are written with Write4Bytes. That produces a wrong result if there are too many "meaningful" bytes in the from or to part of the replacement. So a universal .replace implementation needs to replace an arbitrary number of from bytes with an arbitrary number of to bytes. To test it, just add a rule, e.g.:

.replace
 æ    are
 are  usi
 ša   ra
//etc. with even more bytes in from or to part of replacement
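
To illustrate the limitation described above, here is a minimal sketch in C. It is not the code from compiledict.c: pack_fixed4 and pack_length_prefixed are hypothetical helpers written only for this example, and the real packing done around utf8_in and Write4Bytes may lose data at a different point than this byte-level model does. The sketch only shows that a fixed 4-byte slot must drop something once the input grows, while a length-prefixed record can hold an arbitrary number of from or to bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-width encoder (an assumption, not espeak-ng code):
   copies at most 4 bytes of s into out and zero-pads the rest.
   Returns the number of source bytes that did not fit. */
size_t pack_fixed4(const char *s, unsigned char out[4])
{
    size_t len = strlen(s);
    size_t kept = len < 4 ? len : 4;

    memset(out, 0, 4);
    memcpy(out, s, kept);
    return len - kept;
}

/* Hypothetical variable-width encoder (also an assumption):
   writes one length byte followed by the raw UTF-8 bytes of s.
   Returns the number of bytes written, or 0 if s is longer than
   255 bytes or does not fit into out_size. */
size_t pack_length_prefixed(const char *s, unsigned char *out, size_t out_size)
{
    size_t len = strlen(s);

    if (len > 255 || len + 1 > out_size)
        return 0;
    out[0] = (unsigned char)len;
    memcpy(out + 1, s, len);
    return len + 1;
}
```

With these helpers, a 6-byte string such as "šabcd" loses 2 bytes in pack_fixed4 but is stored whole by pack_length_prefixed. A reader of such records would need to consume the length byte first, so this layout would also change the compiled dictionary format.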