Inference issue with HiFi-GAN: gibberish output

abaddon-moriarty commented 2 years ago

Hi,

I have been trying to listen to my French model, but I still cannot get good output when running inference with Tacotron 2 and HiFi-GAN; I get gibberish instead.

When I run inference, the following line appears. I think it points to what is causing the problem, but I don't know what to do with it (full log at the end):

2022-01-03 09:22:50.374907: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "Tesla T4" frequency: 1590 num_cores: 40 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 6042681344 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }

I saw that the following issue was similar, but no ideas were given there on how to fix it or what the problem was. Does anybody have a clearer view on this?
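The PredictCost() line is emitted at W (warning) level by TensorFlow's C++ grappler cost estimator, as its "W tensorflow/core/grappler/..." prefix shows. If the goal is only to keep such C++-side messages out of the inference log while debugging, TensorFlow reads the TF_CPP_MIN_LOG_LEVEL environment variable. A minimal sketch, assuming it runs at the top of the inference script; hiding log lines does not change what the models compute, so this is a readability aid and not a fix for the gibberish.

```python
import os

# Must be set before TensorFlow is imported: "0" shows everything,
# "1" hides INFO, "2" hides INFO and WARNING, "3" also hides ERROR.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

# import tensorflow as tf  # import afterwards so the setting takes effect
```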

2.6.0

2022-01-03 09:22:28.766510: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-01-03 09:22:31.464977: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 5762 MB memory: -> device: 0, name: Tesla T4, pci bus id: 0000:00:1b.0, compute capability: 7.5
2022-01-03 09:22:31.466136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 13803 MB memory: -> device: 1, name: Tesla T4, pci bus id: 0000:00:1c.0, compute capability: 7.5
2022-01-03 09:22:31.467262: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 13803 MB memory: -> device: 2, name: Tesla T4, pci bus id: 0000:00:1d.0, compute capability: 7.5
2022-01-03 09:22:31.468359: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 13803 MB memory: -> device: 3, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5
2022-01-03 09:22:35.208964: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2022-01-03 09:22:36.672638: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8101
2022-01-03 09:22:37.517342: W tensorflow/stream_executor/gpu/asm_compiler.cc:113] *** WARNING *** You are using ptxas 9.1.108, which is older than 9.2.88. ptxas 9.x before 9.2.88 is known to miscompile XLA code, leading to incorrect results or invalid-address errors.

You do not need to update to CUDA 9.2.88; cherry-picking the ptxas binary is sufficient.
2022-01-03 09:22:37.519005: W tensorflow/stream_executor/gpu/asm_compiler.cc:231] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 7.5
2022-01-03 09:22:37.519018: W tensorflow/stream_executor/gpu/asm_compiler.cc:234] Used ptxas at ptxas
2022-01-03 09:22:37.519084: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Unimplemented: ptxas ptxas too old. Falling back to the driver to compile.
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
hifigan_conf loaded

/usr/local/TensorFlowTTS/examples/hifigan/exp/train.siwis.phoneme.hifigan.v1/checkpoints/generator- + 3520000.h5

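Model: "tf_hifi_gan_generator"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
sequential (Sequential)      (None, None, 1)           13926017  
=================================================================
Total params: 13,926,017
Trainable params: 13,926,017
Non-trainable params: 0
_________________________________________________________________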

2022-01-03 09:22:50.374907: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "Tesla T4" frequency: 1590 num_cores: 40 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 6042681344 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
Model: "tacotron2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
encoder (TFTacotronEncoder)  multiple                  8218112   
_________________________________________________________________
decoder_cell (TFTacotronDeco multiple                  18246402  
_________________________________________________________________
post_net (TFTacotronPostnet) multiple                  5460480   
_________________________________________________________________
residual_projection (Dense)  multiple                  41040     
=================================================================
Total params: 31,966,034
Trainable params: 31,955,794
Non-trainable params: 10,240
_________________________________________________________________

[131, 139, 106, 103, 25, 139, 129, 10, 96, 145, 10, 106, 139, 107, 10, 89, 114, 25, 75, 10, 106, 139, 10, 97, 145, 19, 10, 80, 25, 143, 144, 6, 10, 143, 144, 99, 10, 75, 10, 97, 139, 106, 25, 84, 10, 107, 108, 129, 99, 25, 84, 10, 97, 145, 19, 10, 113, 100, 96, 100, 83, 25, 141, 129, 10, 103, 137, 144, 83, 25, 137, 144, 10, 107, 108, 107, 10, 97, 84, 19, 10, 113, 75, 96, 25, 137, 144, 106, 10, 107, 116, 10, 75, 118, 10, 139, 144, 107, 84, 129, 25, 139, 10, 75, 10, 103, 75, 129, 97, 25, 84, 106, 92, 99, 25, 143, 144, 147]
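The list above is the sequence of input IDs passed to Tacotron 2. One thing that may be worth ruling out (it is not raised in this thread) is a mismatch between the symbol table used during training and the one used at inference: decoding the IDs back to symbols should reproduce the intended French phonemes. This is a minimal sketch assuming the preprocessing step saved a mapper JSON containing a "symbol_to_id" dictionary; the file layout and key name are assumptions to check against your own preprocessing output, and both helper names are hypothetical.

```python
import json


def load_id_to_symbol(mapper_path):
    """Build an id -> symbol table from a mapper JSON (assumed to hold a 'symbol_to_id' dict)."""
    with open(mapper_path, encoding="utf-8") as handle:
        symbol_to_id = json.load(handle)["symbol_to_id"]
    return {int(idx): symbol for symbol, idx in symbol_to_id.items()}


def decode_ids(ids, id_to_symbol):
    """Map input IDs back to symbols; unknown IDs are rendered as '<?id>' so they stand out."""
    return [id_to_symbol.get(idx, f"<?{idx}>") for idx in ids]
```

For example, decode_ids(ids, load_id_to_symbol("path/to/mapper.json")), where the path is a placeholder for wherever the preprocessing step wrote its mapper. Any "<?id>" entries, or phonemes that do not match the input text, would point at the text front end rather than at HiFi-GAN.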

Thank you

ZDisket commented 2 years ago

@abaddon-moriarty Looks like a CUDA/cuDNN issue. Which versions are you using? Run nvidia-smi and post the output, and follow this tutorial. That ptxas error in general may be problematic.

abaddon-moriarty commented 2 years ago

Hi @ZDisket, thank you for the reply.

This is the output I get from the nvidia-smi command:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1B.0 Off |                    0 |
| N/A   73C    P0    76W /  70W |   8044MiB / 15109MiB |     90%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:00:1C.0 Off |                    0 |
| N/A   28C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            On   | 00000000:00:1D.0 Off |                    0 |
| N/A   28C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   27C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     34257      C   python                           8041MiB |
+-----------------------------------------------------------------------------+
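nvidia-smi can also report the same per-GPU information in machine-readable form through its --query-gpu and --format=csv,noheader,nounits options, which is easier to paste into an issue or compare between machines. A minimal sketch; the particular selection of fields and the helper names are choices made for this example, not something from the thread.

```python
import csv
import io
import subprocess

# Field names accepted by `nvidia-smi --query-gpu`; this selection is illustrative.
FIELDS = ["index", "name", "driver_version", "memory.used", "memory.total", "utilization.gpu"]


def parse_gpu_csv(text, fields=FIELDS):
    """Turn `--format=csv,noheader,nounits` output into one dict per GPU."""
    rows = csv.reader(io.StringIO(text.strip()), skipinitialspace=True)
    return [dict(zip(fields, row)) for row in rows if row]


def query_gpus(fields=FIELDS):
    """Run nvidia-smi and return the parsed rows (requires the NVIDIA driver)."""
    output = subprocess.run(
        ["nvidia-smi", f"--query-gpu={','.join(fields)}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_csv(output, fields)
```

Values come back as strings (memory in MiB, utilization in percent) because nounits strips the unit suffixes but not the text form.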

abaddon-moriarty commented 2 years ago

I have tried following the tutorial you linked, @ZDisket, but I get stuck at this command:

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

The file exists but does not contain the MAJOR and MINOR version details. I did run the following checks, though, and CUDA and cuDNN seem to be installed correctly:

check libcuda
        libcudart.so.11.0 -> libcudart.so.11.2.72
        libcudart.so.9.1 -> libcudart.so.9.1.85
        libcuda.so.1 -> libcuda.so.460.27.04
libcuda is installed

check libcudart
        libcudart.so.11.0 -> libcudart.so.11.2.72
        libcudart.so.9.1 -> libcudart.so.9.1.85
libcudart is installed

check libcudnn
        libcudnn_ops_infer.so.8 -> libcudnn_ops_infer.so.8.1.1
        libcudnn_ops_train.so.8 -> libcudnn_ops_train.so.8.1.1
        libcudnn.so.8 -> libcudnn.so.8.1.1
        libcudnn_cnn_infer.so.8 -> libcudnn_cnn_infer.so.8.1.1
        libcudnn_cnn_train.so.8 -> libcudnn_cnn_train.so.8.1.1
        libcudnn_adv_infer.so.8 -> libcudnn_adv_infer.so.8.1.1
        libcudnn_adv_train.so.8 -> libcudnn_adv_train.so.8.1.1
libcudnn is installed
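The tutorial's grep targets cudnn.h, but from cuDNN 8 onward the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines live in cudnn_version.h, which is consistent with the 8.1.1 libraries listed above and with cudnn.h lacking them. The shell equivalent would be cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2. Below is a small Python sketch that checks both headers; /usr/local/cuda/include comes from the comment above, while /usr/include as a second location is an assumption about where a package manager might have installed the headers.

```python
import re
from pathlib import Path

# cuDNN 8 keeps the version macros in cudnn_version.h; older releases used cudnn.h.
HEADER_NAMES = ("cudnn_version.h", "cudnn.h")
# Candidate include directories (the second one is an assumption; adjust as needed).
INCLUDE_DIRS = ("/usr/local/cuda/include", "/usr/include")


def parse_cudnn_version(header_text):
    """Return (major, minor, patchlevel) from the #define lines, or None if any is missing."""
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        if match is None:
            return None
        values.append(int(match.group(1)))
    return tuple(values)


def find_cudnn_version(include_dirs=INCLUDE_DIRS):
    """Return (header_path, version) for the first header that defines the version, else (None, None)."""
    for directory in include_dirs:
        for header in HEADER_NAMES:
            path = Path(directory) / header
            if path.is_file():
                version = parse_cudnn_version(path.read_text(errors="ignore"))
                if version is not None:
                    return str(path), version
    return None, None
```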

ZDisket commented 2 years ago

@abaddon-moriarty The issue might be caused by the multi-GPU setup. Whatever script you use to run inference, try running it with CUDA_VISIBLE_DEVICES=0 prepended to the command.
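In shell form the suggestion is CUDA_VISIBLE_DEVICES=0 followed by the usual python command. The same restriction can be applied from Python as long as the variable is set before TensorFlow initializes CUDA; the safest point is before tensorflow is imported. A minimal sketch that launches a child process with only one GPU visible; the helper names and the script name in the usage comment are hypothetical.

```python
import os
import subprocess
import sys


def single_gpu_env(gpu_index=0, base_env=None):
    """Copy the environment and expose only one GPU to the child process."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def run_on_single_gpu(script_args, gpu_index=0):
    """Equivalent to `CUDA_VISIBLE_DEVICES=<gpu_index> python <script_args...>`."""
    return subprocess.run([sys.executable, *script_args], env=single_gpu_env(gpu_index), check=True)


# Hypothetical usage (script name is illustrative):
#   run_on_single_gpu(["inference.py"], gpu_index=0)
# Inside a script, the in-process equivalent is to set the variable before importing TensorFlow:
#   os.environ["CUDA_VISIBLE_DEVICES"] = "0"
#   import tensorflow as tf
```

Copying the environment rather than mutating os.environ keeps the restriction scoped to the child process.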

abaddon-moriarty commented 2 years ago

@ZDisket I ran synthesis with this prepended to the command. The same error arises when using only one GPU:


2.6.0

2022-01-05 11:28:17.593128: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-01-05 11:28:18.182107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13803 MB memory:  -> device: 0, name: Tesla T4, pci bus id: 0000:00:1b.0, compute capability: 7.5
2022-01-05 11:28:21.817079: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2022-01-05 11:28:22.905813: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8101
2022-01-05 11:28:23.443034: W tensorflow/stream_executor/gpu/asm_compiler.cc:113] *** WARNING *** You are using ptxas 9.1.108, which is older than 9.2.88. ptxas 9.x before 9.2.88 is known to miscompile XLA code, leading to incorrect results or invalid-address errors.

You do not need to update to CUDA 9.2.88; cherry-picking the ptxas binary is sufficient.
2022-01-05 11:28:23.444757: W tensorflow/stream_executor/gpu/asm_compiler.cc:231] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 7.5
2022-01-05 11:28:23.444774: W tensorflow/stream_executor/gpu/asm_compiler.cc:234] Used ptxas at ptxas
2022-01-05 11:28:23.444828: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Unimplemented: ptxas ptxas too old. Falling back to the driver to compile.
Relying on driver to perform ptx compilation. 
Modify $PATH to customize ptxas location.
This message will be only logged once.
hifigan_conf loaded

/usr/local/TensorFlowTTS/examples/hifigan/exp/train.siwis.phoneme.hifigan.v1/checkpoints/generator- + 3520000.h5

Model: "tf_hifi_gan_generator"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
sequential (Sequential)      (None, None, 1)           13926017  
=================================================================
Total params: 13,926,017
Trainable params: 13,926,017
Non-trainable params: 0
_________________________________________________________________
2022-01-05 11:28:32.305403: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "Tesla T4" frequency: 1590 num_cores: 40 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14474280960 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
Model: "tacotron2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
encoder (TFTacotronEncoder)  multiple                  8218112   
_________________________________________________________________
decoder_cell (TFTacotronDeco multiple                  18246402  
_________________________________________________________________
post_net (TFTacotronPostnet) multiple                  5460480   
_________________________________________________________________
residual_projection (Dense)  multiple                  41040     
=================================================================
Total params: 31,966,034
Trainable params: 31,955,794
Non-trainable params: 10,240
_________________________________________________________________
[131, 139, 106, 103, 25, 139, 129, 10, 96, 145, 10, 106, 139, 107, 10, 89, 114, 25, 75, 10, 106, 139, 10, 97, 145, 19, 10, 80, 25, 143, 144, 6, 10, 143, 144, 99, 10, 75, 10, 97, 139, 106, 25, 84, 10, 107, 108, 129, 99, 25, 84, 10, 97, 145, 19, 10, 113, 100, 96, 100, 83, 25, 141, 129, 10, 103, 137, 144, 83, 25, 137, 144, 10, 107, 108, 107, 10, 97, 84, 19, 10, 113, 75, 96, 25, 137, 144, 106, 10, 107, 116, 10, 75, 118, 10, 139, 144, 107, 84, 129, 25, 139, 10, 75, 10, 103, 75, 129, 97, 25, 84, 106, 92, 99, 25, 143, 144, 147]
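The ptxas warnings in this log say TensorFlow used whichever ptxas comes first on $PATH ("Used ptxas at ptxas"), found it older than 9.2.88, and suggest modifying $PATH. A minimal sketch for checking which ptxas is picked up and whether its version is below that threshold. It assumes ptxas --version prints a "V<major>.<minor>.<patch>" token, as CUDA tools typically do; the helper names and the /usr/local/cuda-11.2/bin directory in the comment are hypothetical.

```python
import re
import shutil
import subprocess

# Threshold quoted in the TensorFlow warning above.
MIN_PTXAS = (9, 2, 88)


def parse_ptxas_version(text):
    """Extract (major, minor, patch) from `ptxas --version` output, e.g. '... V9.1.108'."""
    match = re.search(r"V(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def ptxas_is_too_old(version):
    """True when the version is below the 9.2.88 threshold from the log."""
    return version < MIN_PTXAS


def inspect_ptxas():
    """Return (path, version) for the first ptxas on PATH, or (None, None) if none is found."""
    path = shutil.which("ptxas")
    if path is None:
        return None, None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    return path, parse_ptxas_version(result.stdout)


if __name__ == "__main__":
    found_path, found_version = inspect_ptxas()
    print("ptxas on PATH:", found_path, "version:", found_version)
    if found_version is not None and ptxas_is_too_old(found_version):
        # Hypothetical fix: put a newer toolkit's bin directory first, e.g.
        #   export PATH=/usr/local/cuda-11.2/bin:$PATH
        print("ptxas is older than 9.2.88; put a newer CUDA bin directory first on PATH")
```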

abaddon-moriarty commented 2 years ago

Quick update: we managed to fix the ptxas issue. The main issue, gibberish output together with the Error in PredictCost() warning, is still there.
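The thread does not establish whether the gibberish comes from Tacotron 2 or from HiFi-GAN. One common way to narrow this down (not something tried here) is to inspect the Tacotron 2 attention alignment: a usable alignment is roughly diagonal, while attention that jumps backwards, stays diffuse, or never reaches the end of the input typically produces unintelligible speech whatever vocoder follows. A minimal NumPy sketch of a few such heuristics, assuming an alignment matrix of shape (decoder_steps, encoder_steps); the axis order of the alignment returned by TensorFlowTTS inference is an assumption here, so transpose it if rows turn out to be input positions. The function name and metrics are illustrative, and no thresholds are implied.

```python
import numpy as np


def alignment_diagnostics(alignment):
    """Heuristics for an attention matrix of shape (decoder_steps, encoder_steps)."""
    alignment = np.asarray(alignment, dtype=np.float64)
    _, encoder_steps = alignment.shape
    focus = alignment.argmax(axis=1)  # most-attended input position for each output frame
    jumps = np.diff(focus)
    return {
        # Share of decoder steps whose focus does not move backwards.
        "monotonic_ratio": float((jumps >= 0).mean()) if jumps.size else 1.0,
        # Share of input positions that are ever the main focus.
        "coverage": len(np.unique(focus)) / encoder_steps,
        # Average peak weight: near 1.0 is sharp, near 1 / encoder_steps is diffuse.
        "mean_peak": float(alignment.max(axis=1).mean()),
        # Whether attention finishes on the last input position.
        "reaches_end": bool(focus[-1] == encoder_steps - 1),
    }
```

If these numbers look healthy, HiFi-GAN becomes the more likely suspect; if they do not, the problem is upstream of the vocoder.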

BojanSof commented 2 years ago

@abaddon-moriarty have you resolved the problem of gibberish output when doing inference?

TensorSpeech / TensorFlowTTS

inference issue HiFi-GAN -gibberish output #728
