espeak-ng / espeak-ng

eSpeak NG is an open-source speech synthesizer that supports more than a hundred languages and accents.
GNU General Public License v3.0

Stress and pronunciation modifications on-the-fly #1750

Open stllfe opened 1 year ago

stllfe commented 1 year ago

Greetings!

Firstly, I would like to express my gratitude for the development and maintenance of such a valuable tool.

I am currently developing a TTS system for Russian and use espeak-ng for grapheme-to-phoneme (G2P) conversion into IPA. My experiments yield high-quality results, presumably thanks to the detailed, finely tuned rules built into espeak-ng.

However, my project's domain requires users to be able to correct and redefine stress placement for certain words themselves, including neologisms, professional terms, and slang. I am also developing a TTS frontend that I expect to resolve some of the pronunciation ambiguities that are abundant in Russian. At present, espeak-ng sometimes places stress incorrectly. For example:

>> espeak-ng -v ru -X --ipa "На горе стоял роскошный замок"
nə ɡˈorʲi stʌˈjaɭ rʌskˈoʃnyj zamˈok

This example contains two homographs: горе ('grief' or 'mountain') and замок ('lock' or 'castle'). In both cases the stress is misplaced, so the TTS system reproduces the wrong pronunciation. The sentence effectively reads as 'there stood a luxurious lock on the grief' instead of the intended 'there stood a luxurious castle on the mountain', which is, of course, nonsensical.

>> espeak-ng -v ru -X --ipa "На двери висел замок"
nə dvʲˈerʲɪ vʲisʲˈeɭ zamˈok

(The same word замок, but here the stress is correct, since the intended meaning is 'lock'.)

As I understand it, the system has been under development for a long time, and any modification could be challenging. However, unlike the earlier issues on the same topic (#439, #1640, #1014), I am not just requesting this functionality from the developers; I am prepared to contribute it myself.

My experience with C/C++ is limited and I find the code somewhat difficult to read, so I would appreciate guidance from the maintainers on how to add such functionality.

In summary, I have these questions:

Thank you in advance!

jaacoppi commented 1 year ago
  • Is it possible to add pronunciation corrections on the fly for selected words (stress only) without the need to recompile the dictionaries?

No. Such functionality is beyond the scope of the main project and I haven't heard of anyone working on anything like this.

  • If not, how can I perform similar modifications at the dictionary level, considering that I am not a linguist and do not fully understand the current parsing rules?

You would need to learn phonology and related areas of linguistics, Kirshenbaum notation, and the espeak-ng rules syntax.

For debugging with espeak-ng, use the -X flag. It will show the rules used for choosing the phonemization. The syntax of rules and list files is explained in docs/.
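When many words need checking, reading traces by hand gets slow. A minimal Python sketch, assuming espeak-ng is on PATH: it runs espeak-ng silently with `--ipa` and reports which syllable carries primary stress, so the result can be compared with the expected stress. The helper names `espeak_ipa` and `stressed_syllable` are hypothetical, and the vowel set is an assumption covering the symbols in the outputs quoted in this issue, not an exhaustive list.

```python
import subprocess
from typing import List, Optional

# Assumed, non-exhaustive set of vowel symbols in espeak-ng's Russian IPA output.
IPA_VOWELS = set("aeiouyəɪʌɐɛʊɨæɵɔɑ")
PRIMARY_STRESS = "ˈ"  # U+02C8, IPA primary stress mark


def stressed_syllable(ipa_word: str) -> Optional[int]:
    """Return the 1-based syllable with primary stress, or None if unstressed.

    espeak-ng puts the mark before the stressed syllable, sometimes after its
    onset consonant (e.g. zamˈok), so counting the vowels that precede the
    mark still gives the syllable number.
    """
    if PRIMARY_STRESS not in ipa_word:
        return None
    before = ipa_word.split(PRIMARY_STRESS, 1)[0]
    return sum(ch in IPA_VOWELS for ch in before) + 1


def espeak_ipa(text: str, voice: str = "ru") -> List[str]:
    """Run espeak-ng without audio (-q) and return its IPA output split on spaces.

    Output words are not guaranteed to line up 1:1 with input words.
    """
    result = subprocess.run(
        ["espeak-ng", "-q", "-v", voice, "--ipa", text],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()
```

For the first example sentence, `stressed_syllable("zamˈok")` returns 2, the 'lock' reading, whereas 'castle' needs stress on syllable 1. Running such checks over a list of expected stresses gives a quick regression test after editing the dictionary.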

As a general note, espeak-ng's Russian support depends heavily on the ru_listx file, which hard-codes stress for individual words. You probably don't need any programming skills; editing the listx file should be enough for most cases. If a word you think is wrong is already on the list, the chosen pronunciation may be a matter of preference or dialect. Adding missing words is preferred over changing existing ones.
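User stress corrections could be turned into list entries by a script instead of being typed by hand. A minimal Python sketch under stated assumptions: entries take the form `word $N`, where `$N` (1 to 7) is the list-file flag that puts primary stress on syllable N (verify against the dictionary documentation in docs/ before relying on it). `listx_entry`, `append_overrides`, and `recompile` are hypothetical helpers, and `recompile` assumes `espeak-ng --compile=ru` is run from the directory holding the Russian dictionary sources.

```python
import subprocess
from pathlib import Path

# Assumption: list-file flags $1..$7 put primary stress on syllable 1..7.
MAX_STRESS_FLAG = 7
# Russian vowel letters; each one is the nucleus of exactly one syllable.
RU_VOWELS = set("аеёиоуыэюя")


def syllable_count(word: str) -> int:
    """Count syllables in a Russian word by counting its vowel letters."""
    return sum(ch in RU_VOWELS for ch in word.lower())


def listx_entry(word: str, stressed_syllable: int) -> str:
    """Format a hypothetical ru_listx line forcing stress onto one syllable."""
    word = word.strip().lower()
    if not word or any(ch.isspace() for ch in word):
        raise ValueError("expected a single word")
    limit = min(MAX_STRESS_FLAG, syllable_count(word))
    if not 1 <= stressed_syllable <= limit:
        raise ValueError(f"{word!r}: stressed syllable must be in 1..{limit}")
    return f"{word} ${stressed_syllable}"


def append_overrides(listx: Path, overrides: dict) -> None:
    """Append user overrides (word -> stressed syllable number) to a list file."""
    lines = [listx_entry(w, n) for w, n in sorted(overrides.items())]
    with listx.open("a", encoding="utf-8") as f:
        f.write("\n// user stress overrides\n")  # list files use // comments
        f.write("\n".join(lines) + "\n")


def recompile(dictsource: Path, lang: str = "ru") -> None:
    """Rebuild the dictionary; assumes dictsource holds ru_rules, ru_list, ru_listx."""
    subprocess.run(["espeak-ng", f"--compile={lang}"], cwd=dictsource, check=True)
```

For instance, `listx_entry("горе", 2)` yields `горе $2`. Note the trade-off with homographs: an entry such as `замок $1` would force the 'castle' stress everywhere, including "На двери висел замок", so this approach suits missing words better than ambiguous existing ones.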

fabiolimace commented 1 month ago

See https://github.com/espeak-ng/espeak-ng/issues/1758#issuecomment-2448957046