Closed. abaddon-moriarty closed 2 years ago
@abaddon-moriarty Looks like a CUDA/cuDNN issue. Which versions are you using? Run nvidia-smi and post the output, and follow this tutorial. That ptxas error in general may be problematic.
Hi @ZDisket, thank you for the reply.
This is the output I get from the nvidia-smi command:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1B.0 Off |                    0 |
| N/A   73C    P0    76W /  70W |   8044MiB / 15109MiB |     90%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            On   | 00000000:00:1C.0 Off |                    0 |
| N/A   28C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla T4            On   | 00000000:00:1D.0 Off |                    0 |
| N/A   28C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   27C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     34257      C   python                           8041MiB |
+-----------------------------------------------------------------------------+
I tried following the tutorial you linked, @ZDisket, but I get stuck at this command: cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
The file exists but does not contain the MAJOR and MINOR version details.
However, I did run the following checks, and CUDA and cuDNN seem to be installed correctly:
check libcuda
libcudart.so.11.0 -> libcudart.so.11.2.72
libcudart.so.9.1 -> libcudart.so.9.1.85
libcuda.so.1 -> libcuda.so.460.27.04
libcuda is installed
check libcudart
libcudart.so.11.0 -> libcudart.so.11.2.72
libcudart.so.9.1 -> libcudart.so.9.1.85
libcudart is installed
check libcudnn
libcudnn_ops_infer.so.8 -> libcudnn_ops_infer.so.8.1.1
libcudnn_ops_train.so.8 -> libcudnn_ops_train.so.8.1.1
libcudnn.so.8 -> libcudnn.so.8.1.1
libcudnn_cnn_infer.so.8 -> libcudnn_cnn_infer.so.8.1.1
libcudnn_cnn_train.so.8 -> libcudnn_cnn_train.so.8.1.1
libcudnn_adv_infer.so.8 -> libcudnn_adv_infer.so.8.1.1
libcudnn_adv_train.so.8 -> libcudnn_adv_train.so.8.1.1
libcudnn is installed
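A note on the cudnn.h check above: starting with cuDNN 8, the version macros are defined in cudnn_version.h rather than cudnn.h, which would explain why the grep finds nothing even though libcudnn 8.1.1 is installed. Below is a minimal Python sketch of the same check. The helper name `parse_cudnn_version` is hypothetical, and the sample text only mimics the relevant header lines; it is not copied from this machine.

```python
import re

def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from cuDNN header text, or None if absent."""
    values = {}
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Look for a line such as "#define CUDNN_MAJOR 8".
        match = re.search(r"^\s*#define\s+" + name + r"\s+(\d+)",
                          header_text, re.MULTILINE)
        if match is None:
            return None
        values[name] = int(match.group(1))
    return (values["CUDNN_MAJOR"], values["CUDNN_MINOR"],
            values["CUDNN_PATCHLEVEL"])

# Sample text mimicking the version lines of a cuDNN 8.1.1 header (assumed).
sample = """
#define CUDNN_MAJOR 8
#define CUDNN_MINOR 1
#define CUDNN_PATCHLEVEL 1
"""
print(parse_cudnn_version(sample))  # prints (8, 1, 1)
```

On a real system, the function could be called on the contents of /usr/local/cuda/include/cudnn_version.h, assuming the header is installed at that path.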
@abaddon-moriarty The issue might be caused by the multi-GPU setup.
Whatever script you use to run inference, try running it with CUDA_VISIBLE_DEVICES=0 prepended
to the command.
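The same restriction can also be applied from inside a Python script, as long as the variable is set before TensorFlow initializes CUDA. A minimal sketch follows; `restrict_to_gpu` is a hypothetical helper name, not part of TensorFlowTTS.

```python
import os

def restrict_to_gpu(index):
    """Expose only one GPU to CUDA. Call this before importing TensorFlow."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)

restrict_to_gpu(0)

# TensorFlow should be imported only after the variable is set, for example:
# import tensorflow as tf

print(os.environ["CUDA_VISIBLE_DEVICES"])  # prints 0
```

With this in place, the process sees a single device, which TensorFlow reports as GPU:0 regardless of its physical index.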
@ZDisket I ran synthesis with this added to the command. The same error arises when using only one GPU.
2.6.0
2022-01-05 11:28:17.593128: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-01-05 11:28:18.182107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13803 MB memory: -> device: 0, name: Tesla T4, pci bus id: 0000:00:1b.0, compute capability: 7.5
2022-01-05 11:28:21.817079: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2022-01-05 11:28:22.905813: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8101
2022-01-05 11:28:23.443034: W tensorflow/stream_executor/gpu/asm_compiler.cc:113] *** WARNING *** You are using ptxas 9.1.108, which is older than 9.2.88. ptxas 9.x before 9.2.88 is known to miscompile XLA code, leading to incorrect results or invalid-address errors.
You do not need to update to CUDA 9.2.88; cherry-picking the ptxas binary is sufficient.
2022-01-05 11:28:23.444757: W tensorflow/stream_executor/gpu/asm_compiler.cc:231] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 7.5
2022-01-05 11:28:23.444774: W tensorflow/stream_executor/gpu/asm_compiler.cc:234] Used ptxas at ptxas
2022-01-05 11:28:23.444828: W tensorflow/stream_executor/gpu/redzone_allocator.cc:314] Unimplemented: ptxas ptxas too old. Falling back to the driver to compile.
Relying on driver to perform ptx compilation.
Modify $PATH to customize ptxas location.
This message will be only logged once.
hifigan_conf loaded
/usr/local/TensorFlowTTS/examples/hifigan/exp/train.siwis.phoneme.hifigan.v1/checkpoints/generator- + 3520000.h5
Model: "tf_hifi_gan_generator"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
sequential (Sequential) (None, None, 1) 13926017
=================================================================
Total params: 13,926,017
Trainable params: 13,926,017
Non-trainable params: 0
_________________________________________________________________
2022-01-05 11:28:32.305403: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "Tesla T4" frequency: 1590 num_cores: 40 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14474280960 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
Model: "tacotron2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
encoder (TFTacotronEncoder) multiple 8218112
_________________________________________________________________
decoder_cell (TFTacotronDeco multiple 18246402
_________________________________________________________________
post_net (TFTacotronPostnet) multiple 5460480
_________________________________________________________________
residual_projection (Dense) multiple 41040
=================================================================
Total params: 31,966,034
Trainable params: 31,955,794
Non-trainable params: 10,240
_________________________________________________________________
[131, 139, 106, 103, 25, 139, 129, 10, 96, 145, 10, 106, 139, 107, 10, 89, 114, 25, 75, 10, 106, 139, 10, 97, 145, 19, 10, 80, 25, 143, 144, 6, 10, 143, 144, 99, 10, 75, 10, 97, 139, 106, 25, 84, 10, 107, 108, 129, 99, 25, 84, 10, 97, 145, 19, 10, 113, 100, 96, 100, 83, 25, 141, 129, 10, 103, 137, 144, 83, 25, 137, 144, 10, 107, 108, 107, 10, 97, 84, 19, 10, 113, 75, 96, 25, 137, 144, 106, 10, 107, 116, 10, 75, 118, 10, 139, 144, 107, 84, 129, 25, 139, 10, 75, 10, 103, 75, 129, 97, 25, 84, 106, 92, 99, 25, 143, 144, 147]
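The warnings in the log above report ptxas 9.1.108, while the driver reports CUDA 11.2 and the libcudart listing shows both 9.1 and 11.2 installed. One plausible reading is that an older toolkit's ptxas comes first on $PATH, which the log itself hints at ("Modify $PATH to customize ptxas location"). The sketch below shows one way to check which ptxas is picked up. The helper names are hypothetical, and the sample line is an assumption about the shape of the `ptxas --version` output.

```python
import re
import shutil
import subprocess

def parse_ptxas_version(version_output):
    """Extract (major, minor, patch) from `ptxas --version` text, or None."""
    match = re.search(r"V(\d+)\.(\d+)\.(\d+)", version_output)
    return tuple(int(part) for part in match.groups()) if match else None

def ptxas_on_path():
    """Return (path, version) for the first ptxas on PATH, or (None, None)."""
    path = shutil.which("ptxas")
    if path is None:
        return None, None
    result = subprocess.run([path, "--version"],
                            capture_output=True, text=True)
    return path, parse_ptxas_version(result.stdout)

# Assumed shape of the last line printed by `ptxas --version`.
sample = "Cuda compilation tools, release 9.1, V9.1.108"
version = parse_ptxas_version(sample)

# The log warns about ptxas 9.x releases older than 9.2.88.
print(version, version < (9, 2, 88))  # prints (9, 1, 108) True
```

Calling `ptxas_on_path()` on the affected machine would show whether the binary found first belongs to the CUDA 9.1 or the CUDA 11.2 installation.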
Quick update: we managed to fix the ptxas issue. There is still the main issue of gibberish output, apparently from the Error in PredictCost() warning.
@abaddon-moriarty Have you resolved the problem of gibberish output when doing inference?
Hi,
I have been trying to listen to my French model, but I still cannot get good output when running inference with Tacotron 2 and HiFi-GAN; I get gibberish instead.
When I run inference, the following line appears. I think it refers to what is causing the problem, but I don't know what to do with it. The full log is at the end.
2022-01-03 09:22:50.374907: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:689] Error in PredictCost() for the op: op: "Softmax" attr { key: "T" value { type: DT_FLOAT } } inputs { dtype: DT_FLOAT shape { unknown_rank: true } } device { type: "GPU" vendor: "NVIDIA" model: "Tesla T4" frequency: 1590 num_cores: 40 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11020" } environment { key: "cudnn" value: "8100" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 6042681344 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { unknown_rank: true } }
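For reading lines like the one above: in TensorFlow's C++ log format, the letter after the timestamp is the severity (I for info, W for warning, E for error, F for fatal), followed by the source file and line number. The PredictCost() line is logged at W level from the grappler cost estimator. That may mean it is not itself the cause of the bad audio, but this sketch cannot confirm it. The parser below is a hypothetical helper for separating warnings from errors in a pasted log, and the sample message is truncated for brevity.

```python
import re

# Matches lines such as:
# "2022-01-05 11:28:23.444757: W tensorflow/.../asm_compiler.cc:231] message"
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"(?P<severity>[IWEF]) (?P<source>\S+):(?P<line>\d+)\] (?P<message>.*)$"
)

def parse_tf_log_line(line):
    """Return a dict of log fields, or None if the line is not a log header."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None

# Message shortened here; the real line continues with the op description.
sample = ("2022-01-03 09:22:50.374907: W tensorflow/core/grappler/costs/"
          "op_level_cost_estimator.cc:689] Error in PredictCost() for the op")

fields = parse_tf_log_line(sample)
print(fields["severity"], fields["source"], fields["line"])
```

Continuation lines without a timestamp, such as "Relying on driver to perform ptx compilation.", return None and can be attached to the preceding entry.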
I saw that the following issue was similar, but no ideas were given on how to correct it or what the problem was. Does anybody have a clearer view on this?
Thank you