barronalex / Tacotron

Implementation of Google's Tacotron in TensorFlow

modify concate axis to 2 #17

Closed candlewill closed 7 years ago

candlewill commented 7 years ago

I think it's more reasonable to concatenate the highway input with the speaker embedding along the last axis. Is this right?
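
To illustrate the shape reasoning, here is a minimal sketch in plain Python. It assumes the highway input has shape `[batch, time, units]` and that the speaker embedding has already been tiled across time to `[batch, time, embed_dim]`. The helper `concat_shape` and the example sizes are hypothetical and are not taken from the repository.

```python
def concat_shape(a, b, axis):
    """Return the shape produced by concatenating shapes a and b along axis.

    Follows the usual concat rule: every dimension except `axis` must match.
    """
    if len(a) != len(b):
        raise ValueError("rank mismatch: %r vs %r" % (a, b))
    for i, (x, y) in enumerate(zip(a, b)):
        if i != axis and x != y:
            raise ValueError("dimension %d differs: %d vs %d" % (i, x, y))
    out = list(a)
    out[axis] = a[axis] + b[axis]
    return tuple(out)


# Hypothetical sizes: batch=32, time=50, highway units=128, embedding=16.
highway_input = (32, 50, 128)
speaker = (32, 50, 16)

# Axis 2 (the last axis) widens the feature dimension and keeps time intact.
last_axis_shape = concat_shape(highway_input, speaker, axis=2)
```

With these sizes, concatenating on axis 1 is rejected because the feature dimensions differ. If the embedding were instead projected to the same width as the highway layer, axis 1 would be accepted and would silently lengthen the time dimension rather than the feature dimension.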

barronalex commented 7 years ago

Ah yes -- thanks for catching that!

In addition, I'm still not sure what dimension the concatenated speaker embedding should have, since the Deep Voice 2 paper states "we use one site-specific embedding as an extra input to each highway layer at each timestep".

Do you think that means we project the 16-dim speaker embedding to a single value? Currently I have it projected to the same size as the highway layer, but that's pretty arbitrary.

candlewill commented 7 years ago

Yes, I think it's better to project the speaker embedding to a single tensor and then concatenate the highway layer input with this tensor at each layer.

From Fig. 3 of the Deep Voice 2 paper, we can see that there is just one FC layer (not 4) before the embedding is added to the highway layer input.
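
A rough sketch of that reading, in plain Python rather than TensorFlow: one fully connected layer projects the 16-dim speaker embedding to a site-specific embedding, which is then appended to the highway input at every timestep. All names and sizes here (`dense`, `concat_speaker`, `site_dim`, `units`) are hypothetical, and `site_dim = 1` only reflects the "single value" interpretation discussed above, which the thread leaves open.

```python
import random


def dense(vec, weights, bias):
    """Single fully connected layer; weights has shape [out_dim][in_dim]."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def concat_speaker(frames, site_embedding):
    """Append the same site embedding to every timestep's feature vector."""
    return [frame + site_embedding for frame in frames]


random.seed(0)
# Hypothetical sizes; site_dim is the open question in this thread.
embed_dim, site_dim, units, time_steps = 16, 1, 4, 3

speaker_embedding = [random.uniform(-1, 1) for _ in range(embed_dim)]
weights = [[random.uniform(-1, 1) for _ in range(embed_dim)]
           for _ in range(site_dim)]
bias = [0.0] * site_dim

# One FC layer maps the shared speaker embedding to the site embedding.
site_embedding = dense(speaker_embedding, weights, bias)

# One utterance laid out as [time][units]; concatenation is on the last axis.
frames = [[0.0] * units for _ in range(time_steps)]
highway_input = concat_speaker(frames, site_embedding)
```

Each timestep ends up with `units + site_dim` features. The sketch does not cover how a highway layer would handle the widened input, since the thread does not settle that.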