NVIDIA / DeepLearningExamples

State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

TensorFlow Tacotron2 mixed precision training error #279

Closed: superhg2012 closed this issue 4 years ago

superhg2012 commented 5 years ago

System info:

GPU Type: Tesla T4
NVIDIA Driver Version: 418.87.01
CUDA Version: 10.1.243
cuDNN Version: 7.6.3
Python Version (if applicable): 3.7.4
TensorFlow Version (if applicable): 1.14.0
Operating System + Version: Ubuntu 16.04.6 LTS (GNU/Linux 4.4.0-142-generic x86_64)

Hi, I am training the TensorFlow version of the Tacotron2 model with mixed precision training. After some training iterations, an error occurs during validation. The detailed error info is below:

Exiting due to exception: 2 root error(s) found.
  (0) Invalid argument: TensorArray dtype is float but Op is trying to write dtype half.
     [[node Tacotron_model/inference/encoder_LSTM/bidirectional_rnn/fw/fw/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3 (defined at /home/yichao.li/lite-tacotron2/tacotron/models/modules.py:225) ]]
     [[strided_slice_51/_7343]]
  (1) Invalid argument: TensorArray dtype is float but Op is trying to write dtype half.
     [[node Tacotron_model/inference/encoder_LSTM/bidirectional_rnn/fw/fw/TensorArrayUnstack/TensorArrayScatter/TensorArrayScatterV3 (defined at /home/yichao.li/lite-tacotron2/tacotron/models/modules.py:225) ]]
0 successful operations.
0 derived errors ignored.

Do I need to turn off automatic mixed precision at evaluation time? Could you help clarify this, and how can I fix it?
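For anyone reading the trace above, here is a minimal sketch of how the key facts can be pulled out of the message: the dtype the TensorArray expects, the dtype being written, and the node and source line where it happens. The helper `summarize_dtype_mismatch` is hypothetical (it is not part of TensorFlow) and uses only the Python standard library. The regular expressions assume the message format shown in this report. In TensorFlow's naming, `float` is float32 and `half` is float16.

```python
import re

# First root error from the report above, copied verbatim.
ERROR_TEXT = (
    "Invalid argument: TensorArray dtype is float but Op is trying to write "
    "dtype half. [[node Tacotron_model/inference/encoder_LSTM/"
    "bidirectional_rnn/fw/fw/TensorArrayUnstack/TensorArrayScatter/"
    "TensorArrayScatterV3 (defined at /home/yichao.li/lite-tacotron2/"
    "tacotron/models/modules.py:225) ]]"
)


def summarize_dtype_mismatch(text):
    """Hypothetical helper: extract dtypes, node name, and source location.

    Returns None if the text does not look like the error in this report.
    """
    dtypes = re.search(
        r"TensorArray dtype is (\w+) but Op is trying to write dtype (\w+)",
        text,
    )
    node = re.search(r"\[\[node (\S+) \(defined at ([^)]+):(\d+)\)", text)
    if dtypes is None or node is None:
        return None
    return {
        "array_dtype": dtypes.group(1),    # dtype the TensorArray was built with
        "written_dtype": dtypes.group(2),  # dtype of the tensor being written
        "node": node.group(1),             # graph node that raised the error
        "file": node.group(2),             # file where the node was defined
        "line": int(node.group(3)),        # line number in that file
    }


summary = summarize_dtype_mismatch(ERROR_TEXT)
for key, value in summary.items():
    print(f"{key}: {value}")
```

Read this way, the trace says a float16 tensor is being scattered into a float32 TensorArray inside the forward direction of the encoder's bidirectional RNN, defined at line 225 of `modules.py`. That suggests, without confirming, that the encoder input is half precision on the validation path while the RNN was built for float32.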

BiaoLiu2017 commented 4 years ago

Why was this issue closed? I ran into the same problem today. Did anyone solve it?