NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

Strange query input shape of multihead attention in STL #77

Closed elch10 closed 3 years ago

elch10 commented 3 years ago

While trying Mellotron with different hparams, I found a strange line in the STL module: https://github.com/NVIDIA/mellotron/blob/master/modules.py#L94

d_q = hp.token_embedding_size // 2

In the hparams file, ref_enc_gru_size is half of token_embedding_size, so everything works normally with the default hparams: https://github.com/NVIDIA/mellotron/blob/master/hparams.py#L97 But if you change ref_enc_gru_size to any other value, the script raises an error. The correct query dimension is:

d_q = hp.ref_enc_gru_size
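
To illustrate why the current formula only works by coincidence, here is a minimal plain-Python sketch. It assumes the STL query is the reference encoder's GRU output (so it has ref_enc_gru_size features) and that the query projection expects exactly d_q input features. The helper names and the numeric values (256, 128, 64) are hypothetical and chosen for illustration; they are not copied from the repository.

```python
from types import SimpleNamespace


def query_dim_current(hp):
    # Current formula in STL: tied to token_embedding_size, not to the GRU size.
    return hp.token_embedding_size // 2


def query_dim_proposed(hp):
    # Proposed formula: match the reference encoder's GRU output size.
    return hp.ref_enc_gru_size


def check_projection(input_dim, expected_in_features):
    """Mimic the shape check a linear projection performs on its input."""
    if input_dim != expected_in_features:
        raise ValueError(
            f"shape mismatch: input has {input_dim} features, "
            f"projection expects {expected_in_features}"
        )
    return True


# Illustrative hparams only (assumed values, not taken from hparams.py).
default_hp = SimpleNamespace(token_embedding_size=256, ref_enc_gru_size=128)
changed_hp = SimpleNamespace(token_embedding_size=256, ref_enc_gru_size=64)

# With the proposed formula, both configurations pass the shape check.
for hp in (default_hp, changed_hp):
    assert check_projection(hp.ref_enc_gru_size, query_dim_proposed(hp))

# The current formula passes only while the two hparams keep a 2:1 ratio.
assert check_projection(default_hp.ref_enc_gru_size, query_dim_current(default_hp))
try:
    check_projection(changed_hp.ref_enc_gru_size, query_dim_current(changed_hp))
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

In this sketch, `mismatch_raised` ends up `True` for the changed configuration, which mirrors the error described above when ref_enc_gru_size is no longer half of token_embedding_size.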

rafaelvalle commented 3 years ago

Good catch, thank you!