TensorSpeech / TensorFlowTTS

:stuck_out_tongue_closed_eyes: TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)
https://tensorspeech.github.io/TensorFlowTTS/
Apache License 2.0

config length of the melspectrogram input #63

Closed linhld0811 closed 4 years ago

linhld0811 commented 4 years ago

When I train the Tacotron2 model on my own data, some of the mel-spectrograms have more than 2000 timesteps, so I always get this error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [16,2000,80] vs. [16,2023,80]

I don't know where to set the timestep limit to a higher value to fit my data. Thanks.

azraelkuan commented 4 years ago

Check this: https://github.com/dathudeptrai/TensorflowTTS/blob/493dae65b8b9979b507d87ae8330f956fa5176df/tensorflow_tts/models/tacotron2.py#L723

linhld0811 commented 4 years ago

I changed that to another number, but I still get the same error with timestep = 2000.

dathudeptrai commented 4 years ago

@linhld0811 Try commenting out https://github.com/dathudeptrai/TensorflowTTS/blob/master/tensorflow_tts/models/tacotron2.py#L723, it could solve the problem. BTW, I think more than 2000 steps is too long. If you set "use_fixed_shapes=True", you may want to move those long utterances into the evaluation folder; otherwise you can set "use_fixed_shapes=False" to train with dynamic shapes. I'm not sure whether dynamic shapes are faster than fixed shapes: in my experiments fixed shapes were 2x faster than dynamic shapes on LJSpeech, but one user reported that dynamic shapes were faster than fixed shapes :D (see https://github.com/dathudeptrai/TensorflowTTS/issues/34#issuecomment-642309118).
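
For illustration only, the idea of keeping over-long utterances out of fixed-shape training could be sketched as below. This is a minimal sketch in plain Python, not TensorFlowTTS code: `split_by_length`, `pad_to_fixed`, and `MAX_MEL_FRAMES` are hypothetical names, the limit of 2000 frames and the 80 mel bins are taken from the error message in this thread, and zero-padding is an assumption rather than the library's confirmed behavior.

```python
# Hypothetical helper sketch; none of these names come from TensorFlowTTS.
MAX_MEL_FRAMES = 2000  # assumed fixed-shape limit, taken from the error message
N_MELS = 80            # number of mel bins, also taken from the error message


def split_by_length(frame_counts, max_frames=MAX_MEL_FRAMES):
    """Split utterance ids into those that fit the fixed shape and those that do not."""
    keep, too_long = [], []
    for utt_id, n_frames in frame_counts.items():
        if n_frames <= max_frames:
            keep.append(utt_id)
        else:
            too_long.append(utt_id)
    return keep, too_long


def pad_to_fixed(mel, max_frames=MAX_MEL_FRAMES, n_mels=N_MELS, pad_value=0.0):
    """Pad a list of frames (each a list of n_mels floats) to exactly max_frames."""
    if len(mel) > max_frames:
        raise ValueError(f"mel has {len(mel)} frames, limit is {max_frames}")
    padding = [[pad_value] * n_mels for _ in range(max_frames - len(mel))]
    return mel + padding


# Toy data: one utterance that fits and one that is 23 frames too long.
mels = {
    "short": [[0.0] * N_MELS for _ in range(1500)],
    "long": [[0.0] * N_MELS for _ in range(2023)],
}
frame_counts = {utt_id: len(mel) for utt_id, mel in mels.items()}

keep, too_long = split_by_length(frame_counts)
padded = pad_to_fixed(mels["short"])
```

Here `keep` would feed fixed-shape training, while the ids in `too_long` are the candidates to move to the evaluation set (or to train on with dynamic shapes instead).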