as-ideas / TransformerTTS

🤖💬 Transformer TTS: Implementation of a non-autoregressive Transformer based neural network for text to speech.
https://as-ideas.github.io/TransformerTTS/

No numbers in phonemes set and collapse of whitespaces #105

Open anh opened 3 years ago

anh commented 3 years ago

When using phonemizer (espeak-ng), the output contains digits that reflect the vowel/sound variants, as in the following:

import phonemizer

text = 'Có lối ra, chúng ta qua đó xem sao.'
phonemizer.phonemize(
    text,
    language='vi',
    backend='espeak',
    strip=False,
    preserve_punctuation=True,
    punctuation_marks=';:,.!?¡¿—…"«»“”',
    with_stress=True,
    language_switch='keep-flags',
    njobs=1
)

output:

'ɡˈɔɜ lˈoɪɜ zˈaː7 , tɕˈuɜŋ t̪ˈaː1 wˈaː1 ɗˈɔɜ sˈɛ1m ʂˈaːʊ7 .'

Then, in tokenizer._postprocess:

text = ''.join([c for c in text if c in all_phonemes])  # removes the digits, which are not in the phoneme set
text = _collapse_whitespace(text)

output:

ɡˈɔɜ lˈoɪɜ zˈaː,tɕˈuɜŋ tˈaː wˈaː ɗˈɔɜ sˈɛm ʂˈaːʊ.

Outputs placed together:

ɡˈɔɜ lˈoɪɜ zˈaː7 , tɕˈuɜŋ t̪ˈaː1 wˈaː1 ɗˈɔɜ sˈɛ1m ʂˈaːʊ7 .
ɡˈɔɜ lˈoɪɜ zˈaː,tɕˈuɜŋ tˈaː wˈaː ɗˈɔɜ sˈɛm ʂˈaːʊ.

My question: will the missing numbers (here 7 and 1) and the missing spaces around punctuation such as the comma (zˈaː,tɕˈuɜŋ tˈaː instead of zˈaː7 , tɕˈuɜŋ t̪ˈaː1) affect the alignment and the pauses between generated words?
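For reference, the behaviour described above can be reproduced outside the repository with a small sketch. Everything here is a stand-in: `PHONEMES` and `PUNCT` are abbreviated, hypothetical symbol sets covering only this sentence, and `collapse_whitespace` is an assumption about what `_collapse_whitespace` does, inferred from the output shown above rather than copied from the repository.

```python
import re

# Hypothetical, abbreviated symbol sets covering only the example sentence.
PHONEMES = 'ɡˈɔɜloɪzaːtɕuŋwɗsɛmʂʊ'
PUNCT = ',.!?;:'
ALLOWED = set(PHONEMES) | set(PUNCT) | {' '}

# Assumed behaviour: squeeze whitespace runs, then drop spaces around punctuation.
_PUNCT_SPACES = re.compile(r'\s*([' + re.escape(PUNCT) + r'])\s*')


def collapse_whitespace(text: str) -> str:
    text = re.sub(r'\s+', ' ', text)
    return _PUNCT_SPACES.sub(r'\1', text)


def postprocess(text: str) -> str:
    # Characters outside the allowed set (digits, combining marks) are dropped.
    text = ''.join(c for c in text if c in ALLOWED)
    return collapse_whitespace(text)


raw = 'ɡˈɔɜ lˈoɪɜ zˈaː7 , tɕˈuɜŋ t̪ˈaː1 wˈaː1 ɗˈɔɜ sˈɛ1m ʂˈaːʊ7 .'
print(postprocess(raw))
```

With these stand-in sets, the digits and the combining mark under t are filtered out before the whitespace step runs, which is why both differences show up together in the second output.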

cfrancesco commented 3 years ago

Hi, the whitespace collapse is intentional, mostly so that you can control where the pauses are allocated with the forward model. You can disable it by removing the call on line 91 of data/text/tokenizer.py (return the result of the line above instead), but I would discourage that unless you are running into problems. For the numbers issue, you can add the missing phonemes (for instance 1, 2, 3, 4, 5, 6, 7, 8, 9, 0) to all_phonemes in data/text/symbols.py, like so:

all_phonemes = sorted(list(_phonemes) + list(_punctuations) + list('1234567890'))

I was not aware that some languages had numbers as phonemes.

TODO: Add optional extra phonemes string to data_config.yaml
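One way the TODO could look is sketched below, under stated assumptions. The `extra_phonemes` parameter is hypothetical (no such key is confirmed to exist in data_config.yaml), and `_phonemes` and `_punctuations` are abbreviated stand-ins for the real definitions in data/text/symbols.py. Unlike the one-liner above, this version goes through a set so that a symbol listed twice is not duplicated.

```python
# Abbreviated stand-ins for the definitions in data/text/symbols.py (assumed).
_phonemes = 'ɡˈɔɜloɪzaːtɕuŋwɗsɛmʂʊ'
_punctuations = ',.!?;: '


def build_all_phonemes(extra_phonemes: str = '') -> list:
    """Return the sorted symbol list, optionally extended with extra symbols.

    `extra_phonemes` is a hypothetical option, e.g. read from a config file.
    """
    return sorted(set(_phonemes) | set(_punctuations) | set(extra_phonemes))


def keep_known(text: str, symbols: list) -> str:
    # Same filtering idea as in _postprocess: drop anything not in the set.
    allowed = set(symbols)
    return ''.join(c for c in text if c in allowed)


default_symbols = build_all_phonemes()
tonal_symbols = build_all_phonemes(extra_phonemes='1234567890')

sample = 'zˈaː7 , tɕˈuɜŋ t̪ˈaː1'
print(keep_known(sample, default_symbols))  # digits are dropped
print(keep_known(sample, tonal_symbols))    # digits are kept
```

Because the symbol list presumably determines the token ids, changing it would likely require retraining rather than reusing an existing checkpoint.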

anh commented 3 years ago

Thank you for the clarification. Making the phonemes configurable would be super helpful. I'll try your suggestion.