Rayhane-mamah / Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation
MIT License
2.28k stars, 905 forks

Mixed-precision training error #439

Closed: superhg2012 closed this issue 5 years ago

superhg2012 commented 5 years ago

System info:

GPU type: Tesla T4
NVIDIA driver version: 418.87.01
CUDA version: 10.1.243
cuDNN version: 7.6.3
Python version: 3.7.4
TensorFlow version: 1.14.0
Operating system: Ubuntu 16.04.6 LTS (GNU/Linux 4.4.0-142-generic x86_64)

Hi, I am training the TensorFlow version of the Tacotron-2 model with the mixed-precision training APIs, as shown below:

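import tensorflow as tf  # TensorFlow 1.14

opt = tf.train.AdamOptimizer()
# Wrap the optimizer and enable the automatic float16 graph rewrite
opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
train_op = opt.minimize(loss)  # `loss` is the Tacotron training loss tensor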
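Per the TensorFlow documentation, `enable_mixed_precision_graph_rewrite` does two things: it enables the graph rewrite that casts selected ops to float16, and it wraps the optimizer in a loss-scaling optimizer (dynamic loss scaling by default). Loss scaling exists because small gradient values underflow to zero in float16. The sketch below illustrates that effect with only the standard library (`struct`'s `'e'` half-precision format); the gradient value and the static scale of 2**15 are illustrative assumptions, not what TensorFlow picks at runtime.

```python
import struct


def to_float16(x):
    """Round a Python float to IEEE 754 half precision and back via struct's 'e' format."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


grad = 1e-8           # illustrative small gradient value (assumption)
loss_scale = 2.0**15  # illustrative static scale; the TF wrapper defaults to a dynamic one

unscaled = to_float16(grad)             # far below float16's smallest subnormal: rounds to 0.0
scaled = to_float16(grad * loss_scale)  # ~3.3e-4 is representable in float16
recovered = scaled / loss_scale         # unscale in higher precision before the weight update

print(unscaled, scaled, recovered)
```

Dynamic scaling follows the same idea but adjusts `loss_scale` during training, lowering it when scaled gradients overflow.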

After some training iterations, an error occurs when running validation. The detailed error message is:

Exiting due to exception: 2 root error(s) found.
  (0) Invalid argument: TensorArray dtype is float but Op is trying to write dtype half.
     [[node Tacotron_model/inference/encoder_LSTM/bidirectional_rnn/fw/fw/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3 (defined at /home/yichao.li/lite-tacotron2/tacotron/models/modules.py:225) ]]
     [[strided_slice_51/_7343]]
  (1) Invalid argument: TensorArray dtype is float but Op is trying to write dtype half.
     [[node Tacotron_model/inference/encoder_LSTM/bidirectional_rnn/fw/fw/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3 (defined at /home/yichao.li/lite-tacotron2/tacotron/models/modules.py:225) ]]
0 successful operations.
0 derived errors ignored.
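The message means a TensorArray was created with dtype `float` (TensorFlow's printed name for float32) while the op writing into it produces `half` (float16), i.e. the rewrite changed one side of the write but not the other. When logs like this get long, a small helper can pull out the dtype pair, the failing node, and the source line. The helper `summarize_errors` below is hypothetical and relies only on the log format shown above.

```python
import re

# Excerpt of root error (0) from the log above, split across string literals.
LOG = (
    "(0) Invalid argument: TensorArray dtype is float but Op is trying to write dtype half. "
    "[[node Tacotron_model/inference/encoder_LSTM/bidirectional_rnn/fw/fw/TensorArrayUnstack/"
    "TensorArrayScatter/TensorArrayScatterV3 (defined at "
    "/home/yichao.li/lite-tacotron2/tacotron/models/modules.py:225) ]]"
)

# TensorFlow prints DT_FLOAT as "float", DT_HALF as "half", DT_DOUBLE as "double".
TF_DTYPE_NAMES = {"float": "float32", "half": "float16", "double": "float64"}

DTYPE_RE = re.compile(r"TensorArray dtype is (\w+) but Op is trying to write dtype (\w+)")
NODE_RE = re.compile(r"\[\[node (\S+) \(defined at (\S+):(\d+)\) \]\]")


def summarize_errors(log):
    """Pair each dtype complaint with the graph node and source line that raised it."""
    summaries = []
    for (array_dtype, write_dtype), (node, path, line) in zip(
        DTYPE_RE.findall(log), NODE_RE.findall(log)
    ):
        summaries.append({
            "array_dtype": TF_DTYPE_NAMES.get(array_dtype, array_dtype),
            "write_dtype": TF_DTYPE_NAMES.get(write_dtype, write_dtype),
            "node": node,
            "source": f"{path}:{line}",
        })
    return summaries


for s in summarize_errors(LOG):
    print(s["array_dtype"], "<-", s["write_dtype"], "at", s["node"], "(" + s["source"] + ")")
```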

Do I need to turn off automatic mixed precision at evaluation time? Could you help clarify this, and how can I fix it?

venuswu commented 4 years ago

Does AMP speed up your training? My training runs at the same speed when I enable AMP. @superhg2012
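One way to make the "same speed" comparison concrete is to measure steps per second for the same training step with and without the rewrite, excluding the first few steps, which typically include graph optimization and cuDNN autotuning. The sketch below assumes a hypothetical `run_step` callable (for example, a lambda wrapping one `sess.run(train_op)`); the warm-up and step counts are arbitrary choices.

```python
import time


def steps_per_second(run_step, num_steps=50, warmup=5):
    """Time a hypothetical `run_step()` callable, excluding warm-up steps."""
    for _ in range(warmup):
        run_step()  # first runs may include graph rewrites and autotuning
    start = time.perf_counter()
    for _ in range(num_steps):
        run_step()
    elapsed = time.perf_counter() - start
    return num_steps / elapsed


def speedup(fp32_rate, amp_rate):
    """Ratio > 1.0 means the mixed-precision run is faster."""
    return amp_rate / fp32_rate
```

Running `steps_per_second` once in a process without `enable_mixed_precision_graph_rewrite` and once with it, then passing both rates to `speedup`, gives a single number that is easy to compare across setups.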