DigitalPhonetics / IMS-Toucan

Controllable and fast Text-to-Speech for over 7000 languages!
Apache License 2.0

Handling of abbreviations/ single letters #145

Closed stlohrey closed 3 months ago

stlohrey commented 1 year ago

Hey, can you explain how abbreviations and single letters are handled in German? I am running into the issue that a single "C", as in "ABC", generates the phoneme output [se], while it should be something like [ʦe]. Any hint on how to fix that? Thanks!

Flux9665 commented 1 year ago

We don't handle these things ourselves; we rely on the phonemizer, which is espeak-ng. So whatever espeak-ng does to transform text into phonemes is how the TTS will pronounce it. You could try adding spaces in between the letters; that seems to work pretty well.

If you want to handle some more cases like proper nouns or abbreviations, you can create a function analogous to https://github.com/DigitalPhonetics/IMS-Toucan/blob/b4991d48fc3f6f576f8c937cc117e1cdd923ad55/Preprocessing/TextFrontend.py#L464 and add it to the German text frontend here https://github.com/DigitalPhonetics/IMS-Toucan/blob/b4991d48fc3f6f576f8c937cc117e1cdd923ad55/Preprocessing/TextFrontend.py#L67

By transforming the input text, you can elicit different behaviors in the phonemizer. It does handle a bunch of things already internally, but this is probably the best place to change something if the phonemizer makes a mistake.
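
A minimal sketch of such an input-text transformation, assuming abbreviations are written in capital letters. The function name `spell_out_abbreviations`, the letter respellings, and the regular expression are illustrative assumptions, not part of IMS-Toucan or espeak-ng; whether the phonemizer reads each respelling as intended would have to be checked by ear.

```python
import re

# Hypothetical respellings of German letter names, chosen so that a German
# phonemizer is likely to pronounce them as the spoken letter.
# These are assumptions for illustration, not taken from IMS-Toucan.
GERMAN_LETTER_NAMES = {
    "A": "Ah", "B": "Beh", "C": "Tseh", "D": "Deh", "E": "Eh", "F": "Eff",
    "G": "Geh", "H": "Hah", "I": "Ih", "J": "Jott", "K": "Kah", "L": "Ell",
    "M": "Emm", "N": "Enn", "O": "Oh", "P": "Peh", "Q": "Kuh", "R": "Err",
    "S": "Ess", "T": "Teh", "U": "Uh", "V": "Fau", "W": "Weh", "X": "Iks",
    "Y": "Ypsilon", "Z": "Tsett",
}

# Standalone tokens of one to five capital ASCII letters are treated as
# abbreviations. Ordinary capitalized words such as "Das" do not match,
# because the capital is followed directly by lowercase letters.
_ABBREVIATION = re.compile(r"\b[A-Z]{1,5}\b")


def spell_out_abbreviations(text: str) -> str:
    """Replace each abbreviation with its letters spelled out, space-separated."""
    def expand(match: re.Match) -> str:
        return " ".join(GERMAN_LETTER_NAMES[letter] for letter in match.group(0))

    return _ABBREVIATION.sub(expand, text)


print(spell_out_abbreviations("Das ABC ist leicht."))
```

A function like this would run on the raw text before it reaches the phonemizer. Abbreviations that are spoken as words rather than letter by letter would need to be excluded, for example with an exception list.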

stlohrey commented 1 year ago

I already tried that; spaces do not help in this case. But you are right, it is a phonemizer-related problem:

>>> import phonemizer
>>> print(phonemizer.phonemize("A B C",language='de'))
ɑː beː seː 

Did you already experiment with a different phonemizer? I also applied a transform to the input text using https://github.com/repodiac/german_transliterate, which helps with abbreviations and time strings. But I ran into another issue: when synthesizing abbreviations with two or more consecutive similar phonemes (e.g. IAA or ACE), the phonemes are tied together in the output, although there should be something like a glottal stop in between. Again, spaces do not change this behavior. I tried to produce the desired output by adding [ʔ] to the phoneme string, but this resulted in something that sounds like an h.

stlohrey commented 1 year ago

OK, it seems an old espeak version was still installed on my system and phonemizer was using it; after uninstalling it, the "C" problem is solved. In the process, I checked how your Hugging Face multilingual demo handles a single "C" in German, and it seems to have a similar problem...
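
For anyone hitting the same stale-installation problem, here is a small sketch that lists the espeak and espeak-ng executables and shared libraries visible to the current Python process, using only the standard library. The function name `find_espeak_installations` is made up for this example. The `PHONEMIZER_ESPEAK_LIBRARY` environment variable is, as far as I know, read by recent phonemizer versions to override the library path; treat that as an assumption and check it against the phonemizer version you have installed.

```python
import ctypes.util
import os
import shutil


def find_espeak_installations() -> dict:
    """Collect espeak / espeak-ng executables and shared libraries that can be found.

    A value of None means that nothing was found under that name.
    """
    names = ("espeak", "espeak-ng")
    return {
        # Command-line binaries on PATH.
        "executables": {name: shutil.which(name) for name in names},
        # Shared libraries that the system loader can locate.
        "libraries": {name: ctypes.util.find_library(name) for name in names},
        # Assumed override variable for phonemizer; None if unset.
        "PHONEMIZER_ESPEAK_LIBRARY": os.environ.get("PHONEMIZER_ESPEAK_LIBRARY"),
    }


for kind, found in find_espeak_installations().items():
    print(kind, found)
```

If both an old espeak and a newer espeak-ng show up, the old one may be what the phonemizer picks up.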

Flux9665 commented 1 year ago

I have not yet tried different phonemizers, but I believe that the espeak phonemizer is among the best for languages spoken by many people. It's less good for niche languages. You can directly modify the phone string after phonemization to handle special cases. Other than that, the end-to-end nature of these models sometimes causes these types of mistakes, and there's not really anything that can be done about it.
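
A minimal sketch of modifying the phone string after phonemization, assuming phones are separated by whitespace as in the phonemizer output shown earlier in this thread. The function name `patch_phones` and the substitution table are hypothetical, and the replacement symbol "tseː" is an assumption; the actual symbol has to be one that the TTS model's phone inventory contains.

```python
import re

# Hypothetical corrections: phonemizer output -> desired phones.
# The only entry mirrors the "C" case from this thread.
PHONE_FIXES = {
    "seː": "tseː",
}


def patch_phones(phones: str, fixes: dict = PHONE_FIXES) -> str:
    """Replace whole whitespace-delimited phone tokens listed in `fixes`."""
    for wrong, right in fixes.items():
        # The lookarounds ensure that only a complete token is replaced,
        # never a substring inside a longer word.
        pattern = r"(?<!\S)" + re.escape(wrong) + r"(?!\S)"
        phones = re.sub(pattern, right, phones)
    return phones


print(patch_phones("ɑː beː seː "))
```

Matching only complete tokens keeps the fix from touching longer words that happen to contain the same phone sequence.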