Can you share a repro for this? It would be good if you could share the ONNX model.
I'm running the demo BERT model from:
https://github.com/NVIDIA/TensorRT/tree/main/demo/BERT
@zerollzeng
@nvpohanh Is this a known issue? If not, I can try to reproduce it further on T4. Thanks!
This issue can be reproduced on T4 and A10 using trt 8.5.x.
Could you upgrade to CUDA 11.4 or later? I remember that CUDA 11.3 had some NVRTC issues that might have caused this.
I upgraded CUDA to 11.7, but this error still occurs. @nvpohanh
@ttyio Have we seen any demoBERT failures like this on T4?
@mdztravelling Could you also make sure that your system has NVRTC installed? It should have been installed as part of the CUDA toolkit. Also, could you share the full log? Thanks
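A minimal way to sanity-check that NVRTC is present and loadable, assuming a standard Linux CUDA toolkit layout with libnvrtc.so visible to the dynamic loader (e.g. /usr/local/cuda/lib64 on LD_LIBRARY_PATH); this is only a sketch, not part of the demoBERT scripts:

# Hypothetical check: load NVRTC via ctypes and print its version.
import ctypes

try:
    nvrtc = ctypes.CDLL("libnvrtc.so")
    major, minor = ctypes.c_int(), ctypes.c_int()
    # nvrtcVersion(int* major, int* minor) returns 0 (NVRTC_SUCCESS) on success.
    if nvrtc.nvrtcVersion(ctypes.byref(major), ctypes.byref(minor)) == 0:
        print(f"NVRTC loaded, version {major.value}.{minor.value}")
    else:
        print("NVRTC loaded but nvrtcVersion reported an error")
except OSError as exc:
    print(f"Could not load libnvrtc.so: {exc}")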
The full log (on A10) is here: https://github.com/mdztravelling/albert_trt_plugin/blob/master/log.txt
NVRTC is installed in cuda/lib64:
/usr/local/cuda-11.7/lib64/libnvrtc-builtins.so
/usr/local/cuda-11.7/lib64/libnvrtc-builtins.so.11.7
/usr/local/cuda-11.7/lib64/libnvrtc-builtins.so.11.7.99
/usr/local/cuda-11.7/lib64/libnvrtc-builtins_static.a
/usr/local/cuda-11.7/lib64/libnvrtc.so
/usr/local/cuda-11.7/lib64/libnvrtc.so.11.2
/usr/local/cuda-11.7/lib64/libnvrtc.so.11.7.99
/usr/local/cuda-11.7/lib64/libnvrtc_static.a
I have not seen this error before; it is possibly an environment issue. @mdztravelling, could you try the Docker environment following https://github.com/NVIDIA/TensorRT#setting-up-the-build-environment? Thanks!
It looks like TRT cannot find any FP16 conv+gelu kernels:
[02/20/2023-20:39:39] [TRT] [V] *************** Autotuning format combination: Half(384,96,1:8,96,96) -> Half(1536,384,1:8,384,384) ***************
[02/20/2023-20:39:39] [TRT] [V] --------------- Timing Runner: (Unnamed Layer* 8) [Fully Connected] + PWN(PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 9) [Constant], (Unnamed Layer* 14) [ElementWise]), (Unnamed Layer* 10) [Constant] + (Unnamed Layer* 15) [ElementWise]), (Unnamed Layer* 16) [ElementWise]), (Unnamed Layer* 11) [Constant] + (Unnamed Layer* 17) [ElementWise] + (Unnamed Layer* 18) [Activation]), (Unnamed Layer* 12) [Constant] + (Unnamed Layer* 19) [ElementWise] + (Unnamed Layer* 13) [Constant] + (Unnamed Layer* 20) [ElementWise]), (Unnamed Layer* 21) [ElementWise]) (CudaDepthwiseConvolution)
[02/20/2023-20:39:39] [TRT] [V] CudaDepthwiseConvolution has no valid tactics for this config, skipping
[02/20/2023-20:39:39] [TRT] [V] --------------- Timing Runner: (Unnamed Layer* 8) [Fully Connected] + PWN(PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 9) [Constant], (Unnamed Layer* 14) [ElementWise]), (Unnamed Layer* 10) [Constant] + (Unnamed Layer* 15) [ElementWise]), (Unnamed Layer* 16) [ElementWise]), (Unnamed Layer* 11) [Constant] + (Unnamed Layer* 17) [ElementWise] + (Unnamed Layer* 18) [Activation]), (Unnamed Layer* 12) [Constant] + (Unnamed Layer* 19) [ElementWise] + (Unnamed Layer* 13) [Constant] + (Unnamed Layer* 20) [ElementWise]), (Unnamed Layer* 21) [ElementWise]) (CublasConvolution)
[02/20/2023-20:39:39] [TRT] [V] CublasConvolution has no valid tactics for this config, skipping
[02/20/2023-20:39:39] [TRT] [V] --------------- Timing Runner: (Unnamed Layer* 8) [Fully Connected] + PWN(PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 9) [Constant], (Unnamed Layer* 14) [ElementWise]), (Unnamed Layer* 10) [Constant] + (Unnamed Layer* 15) [ElementWise]), (Unnamed Layer* 16) [ElementWise]), (Unnamed Layer* 11) [Constant] + (Unnamed Layer* 17) [ElementWise] + (Unnamed Layer* 18) [Activation]), (Unnamed Layer* 12) [Constant] + (Unnamed Layer* 19) [ElementWise] + (Unnamed Layer* 13) [Constant] + (Unnamed Layer* 20) [ElementWise]), (Unnamed Layer* 21) [ElementWise]) (CaskGemmConvolution)
[02/20/2023-20:39:39] [TRT] [V] CaskGemmConvolution has no valid tactics for this config, skipping
[02/20/2023-20:39:39] [TRT] [V] --------------- Timing Runner: (Unnamed Layer* 8) [Fully Connected] + PWN(PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 9) [Constant], (Unnamed Layer* 14) [ElementWise]), (Unnamed Layer* 10) [Constant] + (Unnamed Layer* 15) [ElementWise]), (Unnamed Layer* 16) [ElementWise]), (Unnamed Layer* 11) [Constant] + (Unnamed Layer* 17) [ElementWise] + (Unnamed Layer* 18) [Activation]), (Unnamed Layer* 12) [Constant] + (Unnamed Layer* 19) [ElementWise] + (Unnamed Layer* 13) [Constant] + (Unnamed Layer* 20) [ElementWise]), (Unnamed Layer* 21) [ElementWise]) (CaskFlattenConvolution)
[02/20/2023-20:39:39] [TRT] [V] CaskFlattenConvolution has no valid tactics for this config, skipping
[02/20/2023-20:39:39] [TRT] [V] --------------- Timing Runner: (Unnamed Layer* 8) [Fully Connected] + PWN(PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 9) [Constant], (Unnamed Layer* 14) [ElementWise]), (Unnamed Layer* 10) [Constant] + (Unnamed Layer* 15) [ElementWise]), (Unnamed Layer* 16) [ElementWise]), (Unnamed Layer* 11) [Constant] + (Unnamed Layer* 17) [ElementWise] + (Unnamed Layer* 18) [Activation]), (Unnamed Layer* 12) [Constant] + (Unnamed Layer* 19) [ElementWise] + (Unnamed Layer* 13) [Constant] + (Unnamed Layer* 20) [ElementWise]), (Unnamed Layer* 21) [ElementWise]) (CaskConvolution)
[02/20/2023-20:39:39] [TRT] [V] CaskConvolution has no valid tactics for this config, skipping
This is weird and unexpected. @mdztravelling Could you share your full builder.py command so that @zerollzeng can try to repro this? Thanks
Thanks for reminding me to check builder.py. I found this difference between
tactic_source = 1 << int(trt.TacticSource.CUBLAS) | 1 << int(trt.TacticSource.CUBLAS_LT)
and
tactic_source = builder_config.get_tactic_sources() & ~(1 << int(trt.TacticSource.CUDNN)).
In version 8.5.x, the error no longer occurs after replacing
tactic_source = builder_config.get_tactic_sources() & ~(1 << int(trt.TacticSource.CUDNN))
with
tactic_source = 1 << int(trt.TacticSource.CUBLAS) | 1 << int(trt.TacticSource.CUBLAS_LT).
The relevant part of builder.py is as follows:
...
# speed up the engine build for trt major version >= 8
# 1. disable cudnn tactic
# 2. load global timing cache
if trt_version[0] >= 8:
    #tactic_source = 1 << int(trt.TacticSource.CUBLAS) | 1 << int(trt.TacticSource.CUBLAS_LT)  # trt 8.0 or earlier
    tactic_source = builder_config.get_tactic_sources() & ~(1 << int(trt.TacticSource.CUDNN))  # use this command in trt 8.5.x
    builder_config.set_tactic_sources(tactic_source)
...
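For reference, a minimal sketch showing the two tactic-source settings discussed above side by side, assuming the standard TensorRT 8.x Python API and an existing builder_config; the helper name and flag are hypothetical, not part of the demo's builder.py:

import tensorrt as trt  # assumes the TensorRT 8.x Python bindings

def choose_tactic_sources(builder_config, use_explicit_cublas):
    # Hypothetical helper illustrating both settings mentioned in this thread.
    if use_explicit_cublas:
        # Enable only cuBLAS and cuBLASLt (the line that avoided the error here).
        sources = (1 << int(trt.TacticSource.CUBLAS)) | (1 << int(trt.TacticSource.CUBLAS_LT))
    else:
        # Start from the current sources and mask out cuDNN (the 8.5.x variant above).
        sources = builder_config.get_tactic_sources() & ~(1 << int(trt.TacticSource.CUDNN))
    builder_config.set_tactic_sources(sources)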
This error has been solved. Thanks to @nvpohanh @zerollzeng @ttyio.
Cool that the issue is solved, closing
Description
I encountered the following error when building the bert-base model with TRT 8.5.3 using builder.py on a T4 GPU. Is this a bug in 8.5.3 on T4? The errors in FP16:
The errors in INT8:
Environment
TensorRT Version: 8.5.3
NVIDIA GPU: T4
NVIDIA Driver Version: 465.19.01
CUDA Version: 11.3
CUDNN Version: 8.5
Operating System: CentOS 8.2
Python Version (if applicable): 3.9
Tensorflow Version (if applicable): 2.6
PyTorch Version (if applicable): 1.10
Baremetal or Container (if so, version):
Relevant Files
Steps To Reproduce