Closed: gitmikoy closed this issue 6 years ago
@gitmikoy
The simplest way is to train a new model on data that includes numbers and their pronunciations in the training dictionary. In that case, you should add as many examples of numbers and their pronunciations to your training dictionary as possible:
1 W AH N
2 T UW
35 TH ER T IY F AY V
796 S EH V AH N HH AH N D R AH D AH N D N AY N T IY S IH K S
...
But keep in mind that there is a limit on the decoding sequence length (by default, max_length=30). The longer the maximum sequence, the worse the decoding performance.
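To see why the length limit matters, count the phonemes in the 796 entry above. Even a three-digit number already produces a target sequence close to the default cap, so longer numbers quickly become impractical (a quick sanity check, not part of g2p-seq2seq itself):

```python
# Spelled-out numbers yield long phoneme sequences, so multi-digit
# dictionary entries approach the decoder's max_length cap (30 by default).
pron_796 = "S EH V AH N HH AH N D R AH D AH N D N AY N T IY S IH K S"
print(len(pron_796.split()))  # 24 phonemes for a 3-digit number
```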
A slightly more complicated, but correct, way to solve this problem: pre-process numbers before passing them to inference. You need to implement a module that converts all numbers (integers, ordinals, fractions) into their spelled-out forms in your language. With this approach, your model will be far more reliable and accurate. Also, you won't need to add every possible integer to your dictionary:
one W AH N
two T UW
thirty TH ER T IY
hundred HH AH N D R AH D
thousand TH AW Z AH N D
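Such a normalization module can be sketched as follows. This is an illustrative implementation for English integers below one million, not code from g2p-seq2seq; the function name `number_to_words` is my own, and a real module would also handle ordinals and fractions:

```python
# Minimal number-to-words normalizer sketch: converts an integer into its
# spelled-out English form, so only word entries are needed in the dictionary.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_to_words(n):
    """Spell out a non-negative integer below one million."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + (" " + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        head = ONES[hundreds] + " hundred"
        return head + (" and " + number_to_words(rest) if rest else "")
    if n < 1_000_000:
        thousands, rest = divmod(n, 1000)
        head = number_to_words(thousands) + " thousand"
        return head + (" " + number_to_words(rest) if rest else "")
    raise ValueError("numbers >= 1,000,000 not handled in this sketch")

print(number_to_words(796))  # seven hundred and ninety six
```

Run this over the input text before inference, and the model only ever sees words like "seven", "hundred", "ninety" that are already in the dictionary.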
With the first solution, the trained model will be less reliable, because numbers with completely different pronunciations may differ from each other by only one character:
25 T W EH N T IY F AY V
15 F IH F T IY N
How do I generate a dictionary from numbers? For example:
g2p-seq2seq --interactive --model_dir my/model
outputs: Invalid Symbol.