cfrancesco closed this issue 4 years ago
Hi, can I have a complete example of a failing command please, with input text and options?
OK, I understand the bug: it occurs when trying to restore punctuation on an empty text. I'll publish a fix soon. Thanks for reporting.
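Until a fix is released, one possible stopgap is to keep empty lines away from the phonemizer and put them back afterwards. This is only a minimal sketch: `drop_blank` and `restore_blank` are hypothetical helper names, not part of phonemizer, and it assumes you phonemize a list of lines and get back one result per input line.

```python
def drop_blank(lines):
    """Split out non-blank lines, remembering where each one came from.

    Hypothetical helper (not part of phonemizer).
    """
    kept, indices = [], []
    for i, line in enumerate(lines):
        if line.strip():  # skip empty and whitespace-only lines
            kept.append(line)
            indices.append(i)
    return kept, indices


def restore_blank(results, indices, total):
    """Put results back at their original positions, with '' for blank lines."""
    out = [""] * total
    for idx, res in zip(indices, results):
        out[idx] = res
    return out


lines = ["Bonjour.", "", "   ", "Au revoir."]
kept, indices = drop_blank(lines)
# Assumed usage: phonemize only `kept`, then realign, e.g.
#   results = phonemize(kept, ...)
#   aligned = restore_blank(results, indices, len(lines))
```

The index bookkeeping keeps the output aligned with the input, which matters if each line is paired with an audio file or an ID.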
Don't know if this is related or not, but:
000004280: Hélas! . ni l'un ni l'autre ne ressemblait au sien.
Traceback (most recent call last):
  File "/home/muksihs/git/Cherokee-TTS/data/comvoi_ipa/generateTrainingData.py", line 59, in <module>
    use_sampa=False)
  File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/phonemize.py", line 172, in phonemize
    text, separator=separator, strip=strip, njobs=njobs)
  File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/backend/base.py", line 126, in phonemize
    text = self._punctuator.restore(text, punctuation_marks)
  File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/punctuation.py", line 146, in restore
    return cls._restore_aux(str2list(text), marks, 0)
  File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/punctuation.py", line 166, in _restore_aux
    [text[0] + m.mark + text[1]] + text[2:], marks[1:], n)
  File "/home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages/phonemizer/punctuation.py", line 166, in _restore_aux
    [text[0] + m.mark + text[1]] + text[2:], marks[1:], n)
IndexError: list index out of range
$ pip show phonemizer
Name: phonemizer
Version: 2.1
Summary: Simple text to phones converter for multiple languages
Home-page: https://github.com/bootphon/phonemizer
Author: Mathieu Bernard
Author-email: mathieu.a.bernard@inria.fr
License: GPL3
Location: /home/muksihs/miniconda3/envs/Cherokee-TTS/lib/python3.7/site-packages
Requires: segments, attrs, joblib
Required-by:
Hi, indeed you should upgrade your phonemizer version:
>>> from phonemizer import phonemize
>>> utt = "Hélas! . ni l'un ni l'autre ne ressemblait au sien."
>>> phonemize(utt, backend='espeak', language='fr-fr', preserve_punctuation=True)
'elas ! . ni lœ̃ ni lotʁ nə ʁəsɑ̃blɛt o sjɛ̃ .'
This is the version I have:
$ phonemize --version
phonemizer-2.2.2
available backends: espeak-ng-1.50, espeak-mbrola, festival-2.5.0, segments-2.1.3
I do not have an exhaustive list, but many double punctuation patterns break the phonemization. One example is
!'
This is with phonemizer 2.2 installed from pip.
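As a possible workaround for double punctuation patterns, the input can be normalized before phonemizing so that each run of punctuation collapses to its first mark. This is a rough sketch, not phonemizer's own behaviour: `collapse_punctuation` is a hypothetical helper, the set of marks is my own assumption, and I have not checked which versions it actually helps with. It is lossy by design, since the extra marks (and any whitespace between them) are dropped.

```python
import re

# Assumption: these are the characters to treat as punctuation here.
# This is NOT taken from phonemizer; adjust it to your data.
_MARKS = ";:,.!?'\""

# One mark followed by one or more further marks, optionally separated
# by whitespace (e.g. "! ." or "!'" or "...").
_RUN = re.compile(r"([{0}])(?:\s*[{0}])+".format(re.escape(_MARKS)))


def collapse_punctuation(text):
    """Keep only the first mark of each run of punctuation (hypothetical helper)."""
    return _RUN.sub(r"\1", text)


print(collapse_punctuation("Hélas! . ni l'un ni l'autre ne ressemblait au sien."))
print(collapse_punctuation("Quoi!'"))
```

A single apostrophe inside a word, as in `l'un`, is left alone because the pattern only matches two or more marks in a row.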