Rayhane-mamah / Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation
MIT License

Fix synthesis when synthesis batch size > 1 #402

Closed · TheButlah closed this 5 years ago

TheButlah commented 5 years ago

tacotron/synthesizer.py breaks if you use a synthesis batch size greater than 1. This is because np.clip() is called on the list of mel spectrograms, but it expects an array-like that it can convert to an ndarray. Since the spectrograms in the list have different lengths, NumPy can't stack them into a single array, so it throws an error.

Applying np.clip() to each mel spectrogram in the list individually prevents this issue, as in the sketch below.
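A minimal sketch of the change (the names `mels` and `T2_output_range` follow this thread's description of synthesizer.py, but treat both as assumptions rather than verified identifiers):

```python
import numpy as np

# Before: fails when `mels` is a list of spectrograms with different
# lengths, because np.clip() cannot convert the list to one ndarray.
# mels = np.clip(mels, T2_output_range[0], T2_output_range[1])

# After: clip each spectrogram individually, keeping the list structure.
mels = [np.clip(m, T2_output_range[0], T2_output_range[1]) for m in mels]
```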

TheButlah commented 5 years ago

Note that not having the ability to run with a synthesis batch size > 1 means it will take eons to generate ground-truth-aligned spectrograms for training other models. Let me know @Rayhane-mamah if you need any additional info from me to get this merged; I'm sure it will make a nice quality-of-life improvement for others!

Rayhane-mamah commented 5 years ago

Not a good idea. The attention mechanism is location sensitive, which means predictions will almost every time differ from what they should be because of padding.
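For illustration, a minimal NumPy sketch of the padding effect (illustrative only, not the repo's code): in a batch, shorter inputs are zero-padded to the longest sequence, and unmasked attention energies over the padded positions change the softmax alignment that location-sensitive attention feeds back into the next decoder step.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

energies = np.array([2.0, 1.0, 0.5])             # real encoder steps
padded = np.concatenate([energies, [0.0, 0.0]])  # zero-padded in a batch

print(softmax(energies))  # alignment at batch size 1
print(softmax(padded))    # padding steals probability mass

# Location-sensitive attention feeds the (now different) alignment back
# as location features, so the discrepancy compounds across decoder steps.
```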

TheButlah commented 5 years ago

This comes as a surprise to me, as neural network semantics usually make predictions independent of the batch size. Can you point me to where in the code this happens?

Is there a way to get around that limitation by changing the implementation, or is it a fundamental issue with LSA (location-sensitive attention)?

If you can give me a starting point or an idea of how to fix this, I can try to update my patch.
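For instance, if padding is the problem, one standard approach is to push the attention energies over padded encoder steps to -inf before the softmax. A hypothetical sketch, assuming access to the raw energies and per-sample input lengths (this is not the repo's actual code):

```python
import tensorflow as tf

def masked_alignments(energies, input_lengths):
    """Suppress attention over padded encoder steps before the softmax.

    energies:      float32 [batch, max_time] raw attention scores
    input_lengths: int32   [batch] true (unpadded) encoder lengths
    """
    mask = tf.sequence_mask(input_lengths, maxlen=tf.shape(energies)[1])
    neg_inf = tf.ones_like(energies) * -1e9  # large negative stand-in for -inf
    masked = tf.where(mask, energies, neg_inf)
    return tf.nn.softmax(masked, axis=-1)
```

Even with masking, location-sensitive attention feeds cumulative alignments back between steps, so batched results might still not match batch-size-1 synthesis exactly.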

TheButlah commented 5 years ago

Also, why does hparams.py have the tacotron_synthesis_batch_size parameter if it's not possible to do synthesis with a batch size greater than 1? If you have multiple GPUs (2 in my case), does that mean it's impossible to train because the synthesis batch size must be 1? Sorry for the confusion; LSA and how exactly it works is still a little over my head.

BTW, I trained with mask_encoder=True and mask_decoder=False.
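For reference, a sketch of how those settings would appear in hparams.py (the field names are taken from this thread; the values shown are the ones stated above and by the maintainer, and the repo's defaults may differ):

```python
# hparams.py (excerpt, per this thread)
mask_encoder = True    # mask attention over padded encoder steps
mask_decoder = False   # don't mask the decoder loss over padded frames
tacotron_synthesis_batch_size = 1  # batched synthesis unsupported per the maintainer
```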

TheButlah commented 5 years ago

Closing this pull request since there has been no response. I don't understand why my patch doesn't work, but I'll take @fatchord's word on this.

It would be nice to have an explanation for others who come across this problem in the future.