It's quite common to use spaces to separate the phonemes for speech synthesis.
But this leads to spurious word count mismatch warnings, because count_phonemized splits on whitespace rather than on the word separator.
>>> from phonemizer.backend import BACKENDS
>>> from phonemizer.separator import Separator
>>> G2P = BACKENDS['espeak'](language='en-us', words_mismatch='warn')
>>> SEP = Separator(word='|', phone=' ')
>>> G2P.phonemize(['try'], separator=SEP)[0]
WARNING:phonemizer:words count mismatch on line 1 (expected 1 words but get 4)
WARNING:phonemizer:words count mismatch on 100.0% of the lines (1/1)
't ɹ aɪ |'
@classmethod
def _count_words(cls, text, wordsep=None):
    """Return the number of words contained in each line of `text`"""
    return [
        len([w for w in line.strip().split(wordsep) if w])
        for line in text]

def count_phonemized(self, text, wordsep=None):
    """Stores the number of words in each output line"""
    self._count_phn = self._count_words(text, wordsep)
Note: this still raises warnings when unexpected splits occur, such as capitals in the middle of a word ("GameStop") or non-word characters before punctuation ("he said--, no"). But it should suffice for most cases, and the input text should be normalized properly anyway.
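To illustrate why the word separator matters, here is a minimal standalone sketch of the counting logic. It does not import phonemizer; `count_words` is a hypothetical free function that mirrors `_count_words` above, and the second input string is made up for illustration.

```python
def count_words(lines, wordsep=None):
    """Count the non-empty words on each line.

    With wordsep=None, str.split() splits on any whitespace, so every
    space-separated phoneme is counted as a word.
    """
    return [
        len([w for w in line.strip().split(wordsep) if w])
        for line in lines]


# Output of the session above: phones separated by ' ', words by '|'.
phonemized = ['t ɹ aɪ |']

# Splitting on whitespace counts each phoneme and the trailing '|'.
whitespace_counts = count_words(phonemized)

# Splitting on the word separator counts the single word.
separator_counts = count_words(phonemized, wordsep='|')

print(whitespace_counts)  # [4]
print(separator_counts)   # [1]
```

The whitespace count of 4 matches the "expected 1 words but get 4" warning, while splitting on '|' gives the expected count of 1.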
It seems to be a common issue, e.g. https://github.com/bootphon/phonemizer/issues/154 and https://github.com/lifeiteng/vall-e/issues/50
I have fixed this (with changes in words_mismatch.py, espeak.py, and base.py) but let me know if you need a PR for it.