Thank you so much for the work you have done on your Tacotron implementation. I have a question, if I may.
I have a speech corpus with phone-level time alignments. For each audio sample, I have a file that looks like this:
0.471000 121 sil
0.618000 121 Z
0.666000 121 i
0.716750 121 n
0.852974 121 a:
0.910125 121 z
0.987444 121 a
1.070000 121 t
1.130000 121 u
1.182000 121 l
Which Tacotron implementation is best suited to exploiting this alignment information?
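In case it helps to see how the files could be consumed, here is a minimal sketch of reading one of them into per-phone segments and frame counts. It rests on assumptions: that each line is `<time> <code> <phone>`, that the timestamp marks the end of the segment (if it marks the start instead, the boundaries need to be shifted by one), and that the acoustic features use a 12.5 ms frame hop. The names `Segment`, `parse_alignment`, and `to_frame_counts` are only illustrative and do not come from any Tacotron codebase.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: float  # seconds
    end: float    # seconds
    phone: str


def parse_alignment(text: str, utterance_start: float = 0.0) -> List[Segment]:
    """Parse lines of the form '<time> <code> <phone>'.

    Assumption: <time> is the END of the segment, so each segment
    starts where the previous one ended.
    """
    segments = []
    prev_end = utterance_start
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        end = float(parts[0])
        segments.append(Segment(prev_end, end, parts[2]))
        prev_end = end
    return segments


def to_frame_counts(segments: List[Segment], hop_seconds: float = 0.0125) -> List[int]:
    """Convert segments to a number of frames per phone.

    Boundaries are rounded to frame indices before differencing, so the
    rounding errors do not accumulate over the utterance.
    hop_seconds is an assumed value; use the hop of your own features.
    """
    counts = []
    for seg in segments:
        start_frame = round(seg.start / hop_seconds)
        end_frame = round(seg.end / hop_seconds)
        counts.append(end_frame - start_frame)
    return counts


if __name__ == "__main__":
    sample = "0.471000 121 sil\n0.618000 121 Z\n0.666000 121 i\n"
    segs = parse_alignment(sample)
    for seg, n in zip(segs, to_frame_counts(segs)):
        print(seg.phone, seg.start, seg.end, n)
```

The per-phone frame counts are the form in which I would expect the alignments to be useful, for example as duration targets or as a constraint on the attention, but that depends on what the implementation supports.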