erogol closed this issue 4 years ago.
This may be related to #26. I'm looking for a fix. Thanks for reporting!
This is implemented in phonemizer-2.1. Let me know if this is not working as expected.
Now I can get the punctuation, but the separator does not work well with it. Here is an example:
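from phonemizer import phonemize
from phonemizer.separator import Separator
language = 'en-us'  # assumed value; the original snippet left `language` undefined
separator = Separator(' |', '', '|')  # positional order: word, syllable, phone
text = "how are. you today, my friend?"
phonemize(text, separator=separator, strip=False, njobs=1, backend='espeak', language=language, preserve_punctuation=True)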
It generates:
'h|aʊ| |ɑːɹ| |. j|uː| |t|ə|d|eɪ| |, m|aɪ| |f|ɹ|ɛ|n|d| |?'
but I think it should be:
'h|aʊ| |ɑːɹ|.| |j|uː| |t|ə|d|eɪ|,| |m|aɪ| |f|ɹ|ɛ|n|d|?'
I hope that is clear enough.
Well... this depends on whether you consider the punctuation mark to be part of the word or not. I answered "no" to that question while implementing it.
echo "how are. you today, my friend?" | phonemize -p' ' -w'\w ' --preserve-punctuation
h aʊ \w ɑːɹ \w . j uː \w t ə d eɪ \w , m aɪ \w f ɹ ɛ n d \w ?
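With this convention, a punctuation mark is a token of its own rather than part of any word. The sketch below reads that CLI output back into words (lists of phones) and standalone punctuation tokens. The space phone separator and the `\w` word separator come from the command above; `tokenize_output` and the set of marks are illustrative assumptions.

```python
MARKS = set(".,;:!?")  # assumed set of preserved punctuation marks


def tokenize_output(line, word_sep=r"\w"):
    """Hypothetical helper: split CLI output into words and punctuation marks."""
    tokens = []
    word = []
    for item in line.split():  # phones are separated by spaces (-p' ')
        if item == word_sep or item in MARKS:
            # A word separator or a punctuation mark closes the current word.
            if word:
                tokens.append(word)
                word = []
            if item in MARKS:
                tokens.append(item)  # punctuation stays a separate token
        else:
            word.append(item)
    if word:
        tokens.append(word)
    return tokens


line = r"h aʊ \w ɑːɹ \w . j uː \w t ə d eɪ \w , m aɪ \w f ɹ ɛ n d \w ?"
print(tokenize_output(line))
```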
In the output you expect, the punctuation mark becomes the last phoneme of a word... And that approach would actually give 2 instead of 1:
1. 'h|aʊ| |ɑːɹ|.| |j|uː| |t|ə|d|eɪ|,| |m|aɪ| |f|ɹ|ɛ|n|d|?'
2. 'h|aʊ| |ɑːɹ|. | |j|uː| |t|ə|d|eɪ|, | |m|aɪ| |f|ɹ|ɛ|n|d|?| '
@mmmaat Thanks for thinking this through.
I don't really see a meaningful difference between 1 and 2. Yes, I see the additional spaces, but at least for my use case (training text-to-speech models), they make no difference.
I'd just prefer to have punctuation marks separated from the word. I guess this is also what you prefer.
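For a consumer that splits on '|' and ignores stray whitespace, 1 and 2 do yield the same token sequence, word boundaries included. A sketch under that assumption; `normalize` is a hypothetical helper that treats whitespace-only tokens as word boundaries.

```python
def normalize(phonemized, phone_sep="|"):
    """Hypothetical helper: split on phone_sep, map whitespace-only tokens
    to a single ' ' word boundary, strip spaces around other tokens, and
    drop trailing boundaries."""
    tokens = []
    for token in phonemized.split(phone_sep):
        if token.strip():
            tokens.append(token.strip())  # phone or punctuation mark
        elif token:
            tokens.append(" ")  # whitespace-only token: word boundary
    while tokens and tokens[-1] == " ":
        tokens.pop()
    return tokens


form_1 = "h|aʊ| |ɑːɹ|.| |j|uː| |t|ə|d|eɪ|,| |m|aɪ| |f|ɹ|ɛ|n|d|?"
form_2 = "h|aʊ| |ɑːɹ|. | |j|uː| |t|ə|d|eɪ|, | |m|aɪ| |f|ɹ|ɛ|n|d|?| "
print(normalize(form_1) == normalize(form_2))
```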
Yes indeed! So I'm closing the issue. Thanks for your feedback.
Here is my code to phonemize the text:
Previously it returned a comma for punctuation marks and I fixed them with a regex, but with the new version, punctuation is ignored entirely. Is there any way to keep the punctuation intact?