Closed TheButlah closed 5 years ago
Note that not having the ability to run with a synthesis batch size > 1 means it will take eons to generate ground-truth aligned spectrograms for training other models. Let me know @Rayhane-mamah if you need any additional info from me to get this merged; I'm sure it will be a nice quality-of-life improvement for others!
Not a good idea. The attention mechanism is location sensitive, which means the predictions will almost always differ from what they should be because of the padding.
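To illustrate the point above, here is a toy sketch (not the repo's actual attention code) of how padding can perturb attention weights when padded positions are not masked out: the softmax normalizes over the padded steps too, so even the weights on the real encoder steps change, and location-sensitive attention feeds those weights back into the next decoder step.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy attention energies over 5 real encoder steps.
energies = np.array([2.0, 1.0, 0.5, 0.2, 0.1])

# Batch of 1: softmax over only the real steps.
a_single = softmax(energies)

# Batched: the sequence is padded to length 8, and the padded
# positions contribute energy 0.0 unless explicitly masked.
padded = np.concatenate([energies, np.zeros(3)])
a_batched = softmax(padded)[:5]

# The weights on the real steps now differ, and since location-sensitive
# attention conditions on the previous alignments, the error compounds
# over decoder steps.
print(a_single)
print(a_batched)
```

This is also why `mask_encoder=True` matters: masking sets the energies at padded positions to a large negative value before the softmax, which is meant to neutralize exactly this effect.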
This comes as a surprise to me, as neural network semantics usually make the network's predictions independent of the batch size. Can you point me to the part of the code that causes this?
Is there a way to work around that limitation by changing the implementation, or is it a fundamental issue with LSA? If you can give me a starting point or an idea of how to fix this, I can try to update my patch.
Also, why does hparams.py have the tacotron_synthesis_batch_size parameter if it's not possible to do synthesis with a batch size greater than 1? If you have multiple GPUs (2 in my case), does that mean it's impossible to train because the synthesis batch size must be 1? Sorry for the confusion; LSA and how exactly it works is still a little over my head.

BTW, I trained with mask_encoder=True and mask_decoder=False.
Closing this pull request as there has been no response. I don't understand why my patch doesn't work, but I'll take @fatchord's word on this. It would be nice to explain it for others who come across this problem in the future.
For reference: tacotron/synthesizer.py breaks if you use a synthesis batch size greater than 1. This is because np.clip() is called on a list of mel spectrograms, but it expects an array-like that it can convert to an ndarray. It can't do that conversion here, so it throws an error. Applying np.clip() to each individual mel spectrogram in the list prevents this issue.