begeekmyfriend / tacotron2

Forked from NVIDIA/tacotron2 and merged with Rayhane-mamah/Tacotron-2
BSD 3-Clause "New" or "Revised" License

Questions about training time #5

Closed xw1324832579 closed 4 years ago

xw1324832579 commented 4 years ago

@begeekmyfriend Thank you for your implementation. Recently I have been training a multi-speaker TTS model with your code, and the training is very slow. Details: batch_size 32, iter_time: 3.292461. What training speed do you get? By the way, I didn't train the model in a Docker container. Would that significantly affect my training speed?

begeekmyfriend commented 4 years ago

The same as yours. I also wonder why the training is so slow on PyTorch compared with TensorFlow. I ran kernprof to generate a profile and found that the main bottleneck lies in decoder_lstm and location_layer, both of which use APIs provided by PyTorch. I still have no idea why. What is more, the bigger the batch size, the slower the training. That does not happen on TensorFlow 1.x.
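For reference, a minimal sketch of how such a profile might be collected with kernprof (the line_profiler CLI) by running `kernprof -l -v script.py`. The function and dimensions below are illustrative placeholders, not the repo's actual decoder code; kernprof injects the `profile` decorator at run time, so the fallback keeps the script runnable on its own.

```python
import torch

try:
    profile  # injected into builtins when run via `kernprof -l`
except NameError:
    def profile(func):  # no-op fallback when running without kernprof
        return func


@profile
def decoder_step(lstm_cell, x, state):
    # One decoding iteration; kernprof reports per-line hits and time spent here.
    h, c = lstm_cell(x, state)
    return h, (h, c)


if __name__ == "__main__":
    cell = torch.nn.LSTMCell(80, 1024)          # illustrative sizes
    x = torch.randn(32, 80)
    state = (torch.zeros(32, 1024), torch.zeros(32, 1024))
    for _ in range(100):
        out, state = decoder_step(cell, x, state)
```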

begeekmyfriend commented 4 years ago

Sorry, I misused LSTM in PyTorch. The mel-spectrogram decoding loop advances one hop per iteration rather than processing the whole sequence, so LSTMCell should be applied inside that loop instead of LSTM. With this change the training speed should be back to normal for you. https://github.com/begeekmyfriend/tacotron2/commit/d131f466c7626c0b7bfb98bbbcabfe38369aa6bb
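A minimal sketch of the difference (dimensions and step count are illustrative, not the repo's actual hyperparameters): the decoder emits one frame per iteration, so `nn.LSTMCell` is the right primitive inside the loop, whereas calling `nn.LSTM` there means paying full-sequence-module overhead on a length-1 input every step.

```python
import torch
import torch.nn as nn

batch, in_dim, hidden, steps = 32, 80, 1024, 200

# Inside the step-by-step decoding loop: LSTMCell consumes one frame at a time.
cell = nn.LSTMCell(in_dim, hidden)
h = torch.zeros(batch, hidden)
c = torch.zeros(batch, hidden)
frame = torch.zeros(batch, in_dim)          # e.g. the go-frame / previous mel frame
outputs = []
for _ in range(steps):
    h, c = cell(frame, (h, c))              # one hop per iteration
    frame = torch.randn(batch, in_dim)      # stand-in for the next predicted frame
    outputs.append(h)
decoded = torch.stack(outputs, dim=1)       # (batch, steps, hidden)

# The misuse described above: nn.LSTM called on a length-1 "sequence" every iteration.
lstm = nn.LSTM(in_dim, hidden, batch_first=True)
out, state = lstm(frame.unsqueeze(1))       # (batch, 1, hidden) -- wasteful per step
```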

begeekmyfriend commented 4 years ago

By the way, if we do want to use LSTM, we should apply it outside the decoding loop and feed it whole sequences. LSTM also uses less GPU memory that way.
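A short sketch of that alternative, assuming the whole sequence is available up front (shapes are illustrative): a single `nn.LSTM` call processes the entire padded batch at once instead of being invoked step by step.

```python
import torch
import torch.nn as nn

batch, steps, in_dim, hidden = 32, 200, 80, 1024
sequence = torch.randn(batch, steps, in_dim)     # whole padded sequence, known in advance

lstm = nn.LSTM(in_dim, hidden, batch_first=True)
outputs, (h_n, c_n) = lstm(sequence)             # one call over all time steps
print(outputs.shape)                             # torch.Size([32, 200, 1024])
```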

begeekmyfriend commented 4 years ago

[image attachment]