Closed RhyanJohnson closed 4 years ago
Thanks for reporting that, I'm investigating...
Thank you! I am as well. It seems to be coming from espeak first, as a result of
command = '{} -v{} {} -q -f {} {}'.format(self.espeak_exe(), self.language, self.ipa, data.name, self.sep)
line = subprocess.check_output(shlex.split(command, posix=False)).decode('utf8')
before we swap the espeak separator with the one provided to phonemize.
Perhaps I should be reporting this to espeak/espeak-ng instead?
Indeed!
$ echo "The lion and the tiger ran" | espeak-ng -x --ipa -q --sep=_
ð_ə l_ˈaɪə_n__ a_n_d ð_ə t_ˈaɪ_ɡ_ə ɹ_ˈa_n
OK, please report it to espeak-ng as well; for now I'll add a fix in the phonemizer code.
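For anyone who needs a stopgap in the meantime, here is a minimal sketch of the kind of post-processing that could clean up the espeak output. This is not the actual phonemizer fix: `strip_stray_separators` is a hypothetical helper name, and the sketch assumes words are delimited by single spaces and that the phone separator is not itself a space.

```python
def strip_stray_separators(phonemes, sep):
    """Remove empty phone slots left by doubled or dangling separators.

    Hypothetical helper, not part of phonemizer or espeak-ng.
    Assumes words are space-delimited and `sep` is the phone separator.
    """
    if not sep:
        return phonemes
    cleaned_words = []
    for word in phonemes.split(" "):
        # Splitting on the separator yields empty strings wherever two
        # separators are adjacent or one sits at the edge of the word;
        # dropping them and re-joining leaves exactly one separator
        # between phones and none at the word boundary.
        phones = [p for p in word.split(sep) if p]
        cleaned_words.append(sep.join(phones))
    return " ".join(cleaned_words)


raw = "ð_ə l_ˈaɪə_n__ a_n_d ð_ə t_ˈaɪ_ɡ_ə ɹ_ˈa_n"
print(strip_stray_separators(raw, "_"))
# ð_ə l_ˈaɪə_n a_n_d ð_ə t_ˈaɪ_ɡ_ə ɹ_ˈa_n
```

Working word by word keeps the spaces between words untouched, so only the separators inside and at the edges of each word are affected.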
Sounds good - thanks very much for your promptness!
Works for me - thanks for the quick fix! Happy to help (:
results in
with two separators attached to the end of the phonemized 'lion'.
As opposed to
resulting in
without trailing separators.
I noticed this around the following samples as well:
the hello but the
gives ð*ə h*ə*l*oʊ** b*ʌ*t ð*ə
Here there and everywhere
gives h*ɪɹ ð*ɛɹ** æ*n*d ɛ*v*ɹ*ɪ*w*ɛɹ
He was hungry and tired.
gives h*iː w*ʌ*z h*ʌ*ŋ*ɡ*ɹ*i** æ*n*d t*aɪɚ*d
He was hungry but tired.
gives h*iː w*ʌ*z h*ʌ*ŋ*ɡ*ɹ*i** b*ʌ*t t*aɪɚ*d
The tiger or the lion
gives ð*ə t*aɪ*ɡ*ɚ** ɔːɹ ð*ə l*aɪə*n
The lion or the tiger
gives ð*ə l*aɪə*n** ɔːɹ ð*ə t*aɪ*ɡ*ɚ
I noticed it around conjunctions like 'and', 'but', and 'or', but not always:
Lions and tigers and bears, oh my!
gives l*aɪə*n*z æ*n*d t*aɪ*ɡ*ɚ*z** æ*n*d b*ɛɹ*z oʊ m*aɪ
Lions and tigers run together
gives l*aɪə*n*z æ*n*d t*aɪ*ɡ*ɚ*z ɹ*ʌ*n t*ə*ɡ*ɛ*ð*ɚ
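Since the pattern is not consistent, a small checker may help when scanning more samples. The sketch below only inspects strings that have already been phonemized; `words_with_trailing_separator` is a hypothetical name, and it assumes `*` was passed as the phone separator, as in the samples above.

```python
def words_with_trailing_separator(phonemes, sep):
    """Return (index, word) pairs for words that end with the separator.

    Hypothetical helper for spotting the issue in existing output;
    it does not call espeak or phonemizer.
    """
    if not sep:
        return []
    return [
        (index, word)
        for index, word in enumerate(phonemes.split())
        if word.endswith(sep)
    ]


affected = "l*aɪə*n*z æ*n*d t*aɪ*ɡ*ɚ*z** æ*n*d b*ɛɹ*z oʊ m*aɪ"
clean = "l*aɪə*n*z æ*n*d t*aɪ*ɡ*ɚ*z ɹ*ʌ*n t*ə*ɡ*ɛ*ð*ɚ"

print(words_with_trailing_separator(affected, "*"))
print(words_with_trailing_separator(clean, "*"))
```

For the two "Lions and tigers" samples, the first call flags the word before the second 'and' and the second call returns an empty list.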