Open fquirin opened 2 years ago
> I was wondering where one can edit the rule that is responsible for this behavior.
Well - this is not a bug in txt2pho...
The programs in this repo serve very different purposes. txt2pho is responsible for converting text to phonemes: just plain text, no complex numbers, fractions, times, dates or currencies, since that is a completely different kind of problem. That latter problem is tackled by the preprocessor, preproc.
At first glance both problems seem to be quite easy to solve. As always with natural languages both become much harder the closer one looks at them.
Take your example
echo "Heute ist der 19.12.2021" | ./preproc data/PPRules/rules.lst data/hadifix.abk
gets translated to
Heute ist der 19 12 zwei tausend einundzwanzig
which means that the ordinals are not spoken correctly. OK, let's fix rules.lst
so we get
Heute ist der 19n 12n zwei tausend einundzwanzig
which is better but still incorrect since in German ordinals are declined in the sentence. So we need an algorithm which understands the different parts of language in a sentence and translates "19." into "19n", "19te" or "19ter" respectively.
> Besides that there is a slight problem with numbers at the end of a sentence, because they will always be spoken as ordinals:
Yes, that is basically a similar problem. The preprocessor must be able to understand the meaning of "40." at the end of the sentence.
At the moment no one has volunteered to write such an elaborate version of preproc...
PS: If you have text output coming from another program, it's rather easy to ensure the correctness of the spoken output:
Heute ist der 19te 12te 2021.
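A minimal sketch of that idea in Python, assuming the calling program already holds the date as a `datetime.date`. The helper name `spoken_date` is hypothetical, and the "te" suffix is simply copied from the example above; whether txt2pho speaks it correctly for every day and month is not confirmed in this thread.

```python
from datetime import date


def spoken_date(d: date) -> str:
    """Format a date as '19te 12te 2021', mirroring the example above."""
    # Day and month get the 'te' suffix from the example; the year stays a plain number.
    # Other declensions (e.g. after "am") are deliberately not handled here.
    return f"{d.day}te {d.month}te {d.year}"


print(f"Heute ist der {spoken_date(date(2021, 12, 19))}.")
```

The resulting sentence can then be piped to txt2pho as plain text, so no date parsing is needed on the TTS side.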
I'm aware of the complexity but I'm confused why 19.12.2021 becomes neunzehnte and that's it. The rest is removed completely! And I'm not even using preproc (see example above).
I'm catching some edge cases already in the assistant TTS preprocessor (e.g. 10:30 Uhr -> 10 Uhr 30), but as you know dates are extremely messy in German :grimacing: ... so I was hoping to get "neunzehn punkt zwölf punkt zweitausendeinundzwanzig" for now from txt2pho as in espeak for example.
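For illustration, the time rewrite mentioned above ("10:30 Uhr -> 10 Uhr 30") could look roughly like this in Python. This is only a sketch and not the actual SEPIA code: the regular expression, the function name `rewrite_times`, and the handling of full hours and leading zeros are all assumptions.

```python
import re

# Matches times such as "10:30 Uhr" (pattern is an assumption, not taken from SEPIA).
TIME_RE = re.compile(r"\b(\d{1,2}):(\d{2}) Uhr\b")


def _spoken_time(match: re.Match) -> str:
    hour = int(match.group(1))    # int() also drops a leading zero ("08" -> 8)
    minute = int(match.group(2))
    # Full hours are reduced to "10 Uhr"; otherwise the minutes follow "Uhr".
    return f"{hour} Uhr" if minute == 0 else f"{hour} Uhr {minute}"


def rewrite_times(text: str) -> str:
    """Rewrite 'HH:MM Uhr' into the spoken order 'HH Uhr MM'."""
    return TIME_RE.sub(_spoken_time, text)


print(rewrite_times("Der Termin ist um 10:30 Uhr"))
```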
> I'm aware of the complexity but I'm confused why 19.12.2021 becomes neunzehnte and that's it.
OK, let's focus on txt2pho...
echo "Heute ist der 19.12.2021"|./txt2pho | mbrola -e /usr/share/mbrola/de2/de2 - test.wav
becomes
Heute ist der 19 Punkt 12 Punkt 2021
or
_ 10 0 86
h 81 23 88 48 89 73 91 98 92
OY 121 15 94 31 96 48 98 64 100 81 101 98 102
t 83 14 104 39 104
@ 58 24 104 59 104 93 104
_ 41 39 103 88 103
I 46 33 103 76 102
s 69 13 102 42 101 71 101 100 100
t 70 29 98
d 48
e: 57 4 96 39 95 74 95
6 70 7 94 36 96 64 98 93 100
n 53 28 101 66 102
OY 109 2 103 20 103 39 103 57 103 75 103 94 103
n 56 23 102 59 102 95 102
t 66 5 101 35 100
s 63 17 100 49 99 81 99
e: 56 14 98 50 97 86 97
n 57 21 96 56 95 91 94
p 92 15 92 37 91
U 66 5 91 35 93 65 95 95 97
N 60 28 98 62 100 95 101
k 63 3 102 35 103
t 52 17 102
s 57 16 101 51 101 86 100
v 37 32 100 86 99
9 64 23 98 55 98 86 97
l 61 18 97 51 99 84 100
f 57 18 101 53 102 88 102
p 98 7 102 28 102 48 101
U 66 23 101 53 101 83 100
N 61 15 100 48 99 80 100
k 64 20 102
t 54 33 102
s 58 29 100 64 100 98 99
v 33 58 99
aI 119 5 98 22 97 39 97 55 96 72 95 89 95
t 68 15 98 44 99
aU 84 23 98 46 98 70 98 94 98
z 34 44 97
E 57 2 97 37 96 72 96
n 49 8 95 49 95 90 94
d 49 2 92
aI 100 6 92 26 91 46 90 66 90 86 89
n 20 30 88
U 51 12 88 51 87 90 87
n 52 29 86 67 86
t 46 37 85
s 46 13 85 57 85 100 85
v 15
a 75 7 85 33 85 60 84 87 84
n 48 21 83 62 83
t 39 5 81
s 49 6 81 47 80 88 79
I 43 33 79 79 78
C 63 17 77 49 77 81 76
s 60 13 76 47 76 80 76
_ 483 2 85 6 85 10 85 14 85 18 85 22 85 27 85 31 85 35 85 39 85
Ah sorry, there was a dot missing. Try this:
echo "Heute ist der 19.12.2021."|./txt2pho | mbrola -e /usr/share/mbrola/de2/de2 - test.wav
Yes - can see the problem now.
I'm wondering what the cause for this strange behaviour is.
I thought it was a rule defined somewhere since it actually transforms "19." to an ordinal. Maybe it fails to handle "2021." but then I'd expect to hear "12." at least.
Any files I could check for ordinal transformation?
> Yes - can see the problem now. I'm wondering what the cause for this strange behaviour is.
Did you find out anything new about this? I've seen there were some recent commits related to dates :slightly_smiling_face:
@GHPS I've integrated txt2pho in the latest SEPIA-Home release :slightly_smiling_face: . Here are instructions to install it.
I really like the voices but from time to time I find some strange artifacts (that don't appear in espeak or default MBROLA). For example if you ask SEPIA for the date you will get the answer "Heute ist der 12.05.2022" but what you hear is really weird: "Heute ist der zwölft punkt null fünf null zwei null zwei zwei punkt zweitausendzweiundzwanzig" :sweat_smile: :see_no_evil: .
My "speak" script looks like this (arguments: gender, voice, text):
echo "$3" | iconv -cs -f UTF-8 -t ISO-8859-1 | ./txt2pho "-$1" | mbrola /usr/share/mbrola/"$2"
> @GHPS I've integrated txt2pho in the latest SEPIA-Home release :slightly_smiling_face: . Here are instructions to install it.
Great - SEPIA is a very promising project. I'll link to the instructions in the readme of this project.
Concerning the pronunciation issue: The log files should give some insight into what is going wrong.
I'll take a deeper look into the code next week...
Great, thanks! :slightly_smiling_face: I'll try to fix the problem with dates in SEPIA's own TTS pre-processor in the meantime. It seems German dates have been a pain for TTS since the dawn of time =)
Hi @GHPS
I found another issue with the pronunciation, again related to "." after numbers 😢.
echo "Licht steht auf 70." | iconv -cs -f UTF-8 -t ISO-8859-1 | ./txt2pho -m | mbrola /usr/share/mbrola/de3/de3 - test.wav
The "70" will not be spoken at all. It works when I remove the "." at the end.
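A minimal Python sketch of that workaround, applied to the text before it is piped to txt2pho. The function name and the regular expression are assumptions. It only removes a dot that directly follows a digit at the very end of the input, so a genuine ordinal at the end of a sentence would be read as a plain number.

```python
import re

# A digit directly followed by the final '.' of the input, e.g. "70."
TRAILING_NUMBER_DOT = re.compile(r"(\d)\.\s*$")


def drop_final_dot_after_number(text: str) -> str:
    """Remove a sentence-final '.' that directly follows a number."""
    # Dots inside the text ("Der 70. Geburtstag", "19.12.2021") are left untouched.
    return TRAILING_NUMBER_DOT.sub(r"\1", text)


print(drop_final_dot_after_number("Licht steht auf 70."))
```

The same call turns "Heute ist der 19.12.2021." into the variant without the final dot, which is the form that was spoken as "19 Punkt 12 Punkt 2021" earlier in this thread.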
Thanks for the information.
> The "70" will not be spoken at all. It works when I remove the "." at the end.
That is in principle the same problem as discussed above: txt2pho converts a stream of text to phonemes, but has no concept of parts of speech or even complete sentences. In this context the character string "70." has no meaning since it is neither a word nor a valid German number. It is therefore ignored.
That is why the preprocessor is necessary. It uses a number of heuristics to decide whether "70." means "siebzigster" or "siebzig" at the end of the sentence. It even understands constructs like "70.000".
In short: Use preproc to convert numbers or whole sentences before sending the stream to txt2pho.
> In short: Use preproc to convert numbers or whole sentences before sending the stream to txt2pho
Unfortunately, the preprocessor has some weird behavior as well :-/ for example:
echo "Der 70. Geburtstag ist am 01.01.2023" | iconv -cs -f UTF-8 -t ISO-8859-1 | ./preproc -r data/preproc.rls -a data/preproc.abk
-> Der siebzigste Geburtstag ist am 01n 01n zwei tausend dreiundzwanzig
I initially removed the preprocessor because I'm doing my own processing first, but it may still be the better option compared to losing numbers completely 😅
I've been experimenting with txt2pho and MBROLA and noticed something odd:
The sentence
Heute ist der 19.12.2021
will abort after "19." (pronounced "neunzehnte") :-/.
I'm using this command:
echo "Heute ist der 19.12.2021" | iconv -cs -f UTF-8 -t ISO-8859-1 | ./txt2pho -m | mbrola /usr/share/mbrola/de2/de2 - test.wav
I was wondering where one can edit the rule that is responsible for this behavior.
Besides that there is a slight problem with numbers at the end of a sentence, because they will always be spoken as ordinals: "Er wurde Heute 40.". Though without context it is indeed unclear whether "Er wurde Heute vierzig" or "Er wurde Heute Vierzigster" is the right version ^^.
Btw I'm avoiding 'preproc' because it has its own set of issues :sweat_smile: :see_no_evil: