NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Error Code 10: Internal Error (Could not find any implementation for node (Unnamed Layer* 8) [Convolution] error occurred when building the bert-base model using trt8.5.3 #2681

Closed · mdztravelling closed this issue 1 year ago

mdztravelling commented 1 year ago

Description

I encountered the following error when building the bert-base model with TRT 8.5.3 using builder.py on a T4 GPU. Is this a bug in 8.5.3 on T4?
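For context, the two precisions below were requested through the builder config; a minimal sketch of the relevant flags (demoBERT's builder.py wires these to its --fp16/--int8 options, so the names below are generic):

```python
import tensorrt as trt

# Verbose logging is what produces the per-tactic timing lines quoted below.
logger = trt.Logger(trt.Logger.VERBOSE)
builder = trt.Builder(logger)
config = builder.create_builder_config()

config.set_flag(trt.BuilderFlag.FP16)    # FP16 build path
# config.set_flag(trt.BuilderFlag.INT8)  # INT8 build path (additionally needs a calibrator)
```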

The errors in FP16:

[02/14/2023-20:48:57] [TRT] [V] CaskFlattenConvolution has no valid tactics for this config, skipping
[02/14/2023-20:48:57] [TRT] [V] --------------- Timing Runner: (Unnamed Layer* 8) [Fully Connected] + PWN(PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 9) [Constant], (Unnamed Layer* 14) [ElementWise]), (Unnamed Layer* 10) [Constant] + (Unnamed Layer* 15) [ElementWise]), (Unnamed Layer* 16) [ElementWise]), (Unnamed Layer* 11) [Constant] + (Unnamed Layer* 17) [ElementWise] + (Unnamed Layer* 18) [Activation]), (Unnamed Layer* 12) [Constant] + (Unnamed Layer* 19) [ElementWise] + (Unnamed Layer* 13) [Constant] + (Unnamed Layer* 20) [ElementWise]), (Unnamed Layer* 21) [ElementWise]) (CaskConvolution)
[02/14/2023-20:48:57] [TRT] [V] CaskConvolution has no valid tactics for this config, skipping
[02/14/2023-20:48:57] [TRT] [V] Deleting timing cache: 266 entries, served 2730 hits since creation.
[02/14/2023-20:48:57] [TRT] [E] 10: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node (Unnamed Layer* 8) [Fully Connected] + PWN(PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 9) [Constant], (Unnamed Layer* 14) [ElementWise]), (Unnamed Layer* 10) [Constant] + (Unnamed Layer* 15) [ElementWise]), (Unnamed Layer* 16) [ElementWise]), (Unnamed Layer* 11) [Constant] + (Unnamed Layer* 17) [ElementWise] + (Unnamed Layer* 18) [Activation]), (Unnamed Layer* 12) [Constant] + (Unnamed Layer* 
[02/14/2023-20:48:57] [TRT] [I] build engine in 16.323 Sec

The errors in INT8:

[02/14/2023-20:37:04] [TRT] [V] --------------- Timing Runner: (Unnamed Layer* 8) [Convolution] + PWN(PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 9) [Constant], (Unnamed Layer* 14) [ElementWise]), (Unnamed Layer* 10) [Constant] + (Unnamed Layer* 15) [ElementWise]), (Unnamed Layer* 16) [ElementWise]), (Unnamed Layer* 11) [Constant] + (Unnamed Layer* 17) [ElementWise] + (Unnamed Layer* 18) [Activation]), (Unnamed Layer* 12) [Constant] + (Unnamed Layer* 19) [ElementWise] + (Unnamed Layer* 13) [Constant] + (Unnamed Layer* 20) [ElementWise]), (Unnamed Layer* 21) [ElementWise]) (CaskConvolution)
[02/14/2023-20:37:04] [TRT] [V] CaskConvolution has no valid tactics for this config, skipping
[02/14/2023-20:37:04] [TRT] [V] Deleting timing cache: 100 entries, served 1976 hits since creation.
[02/14/2023-20:37:04] [TRT] [E] 10: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node (Unnamed Layer* 8) [Convolution] + PWN(PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 9) [Constant], (Unnamed Layer* 14) [ElementWise]), (Unnamed Layer* 10) [Constant] + (Unnamed Layer* 15) [ElementWise]), (Unnamed Layer* 16) [ElementWise]), (Unnamed Layer* 11) [Constant] + (Unnamed Layer* 17) [ElementWise] + (Unnamed Layer* 18) [Activation]), (Unnamed Layer* 12) [Constant] + (Unnamed Layer* 19) 
[02/14/2023-20:37:04] [TRT] [I] build engine in 16.901 Sec

Environment

TensorRT Version: 8.5.3
NVIDIA GPU: T4
NVIDIA Driver Version: 465.19.01
CUDA Version: 11.3
CUDNN Version: 8.5
Operating System: CentOS 8.2
Python Version (if applicable): 3.9
Tensorflow Version (if applicable): 2.6
PyTorch Version (if applicable): 1.10
Baremetal or Container (if so, version):

Relevant Files

Steps To Reproduce

zerollzeng commented 1 year ago

Can you share a repro for this? It would be good if you could share the ONNX model.

mdztravelling commented 1 year ago

> Can you share a repro for this? It would be good if you could share the ONNX model.

I am running the demo model from https://github.com/NVIDIA/TensorRT/tree/main/demo/BERT @zerollzeng

zerollzeng commented 1 year ago

@nvpohanh Is this a known issue? If not, I can try to reproduce it on T4. Thanks!

mdztravelling commented 1 year ago

> @nvpohanh Is this a known issue? If not, I can try to reproduce it on T4. Thanks!

This issue can be reproduced on both T4 and A10 with TRT 8.5.x.

nvpohanh commented 1 year ago

Could you upgrade to CUDA 11.4 or later?

I remember that CUDA 11.3 had some NVRTC issues that might have caused this.
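A quick way to confirm which toolkit version is actually in use (a trivial sketch):

```python
import subprocess

# Print the CUDA toolkit version that nvcc reports; note this can differ
# from the driver version shown by nvidia-smi.
print(subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout)
```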

mdztravelling commented 1 year ago

> Could you upgrade to CUDA 11.4 or later?
>
> I remember that CUDA 11.3 had some NVRTC issues that might have caused this.

I upgraded CUDA to 11.7, but the error still occurs. @nvpohanh

nvpohanh commented 1 year ago

@ttyio Have we seen any demoBERT failures like this on T4?

@mdztravelling Could you also make sure that your system has NVRTC installed? It should have been installed as part of the CUDA toolkit. Also, could you share the full log? Thanks

mdztravelling commented 1 year ago

> @ttyio Have we seen any demoBERT failures like this on T4?
>
> @mdztravelling Could you also make sure that your system has NVRTC installed? It should have been installed as part of the CUDA toolkit. Also, could you share the full log? Thanks

The full log (captured on an A10) is here: https://github.com/mdztravelling/albert_trt_plugin/blob/master/log.txt. NVRTC is installed under cuda/lib64:

/usr/local/cuda-11.7/lib64/libnvrtc-builtins.so
/usr/local/cuda-11.7/lib64/libnvrtc-builtins.so.11.7
/usr/local/cuda-11.7/lib64/libnvrtc-builtins.so.11.7.99
/usr/local/cuda-11.7/lib64/libnvrtc-builtins_static.a
/usr/local/cuda-11.7/lib64/libnvrtc.so
/usr/local/cuda-11.7/lib64/libnvrtc.so.11.2
/usr/local/cuda-11.7/lib64/libnvrtc.so.11.7.99
/usr/local/cuda-11.7/lib64/libnvrtc_static.a
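To double-check which NVRTC the process actually loads at runtime, a minimal sketch (assumes `libnvrtc.so` resolves through the loader path; `nvrtcVersion` is the standard NVRTC version query):

```python
import ctypes

# Load the NVRTC shared library and query its version via nvrtcVersion().
nvrtc = ctypes.CDLL("libnvrtc.so")
major, minor = ctypes.c_int(), ctypes.c_int()
nvrtc.nvrtcVersion(ctypes.byref(major), ctypes.byref(minor))
print(f"NVRTC {major.value}.{minor.value}")
```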

ttyio commented 1 year ago

I have not seen this error before; possibly an environment issue. @mdztravelling, could you try the Docker environment following https://github.com/NVIDIA/TensorRT#setting-up-the-build-environment? Thanks!

nvpohanh commented 1 year ago

It looks like TRT cannot find any FP16 conv+gelu kernels:

[02/20/2023-20:39:39] [TRT] [V] *************** Autotuning format combination: Half(384,96,1:8,96,96) -> Half(1536,384,1:8,384,384) ***************
[02/20/2023-20:39:39] [TRT] [V] --------------- Timing Runner: (Unnamed Layer* 8) [Fully Connected] + PWN(PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 9) [Constant], (Unnamed Layer* 14) [ElementWise]), (Unnamed Layer* 10) [Constant] + (Unnamed Layer* 15) [ElementWise]), (Unnamed Layer* 16) [ElementWise]), (Unnamed Layer* 11) [Constant] + (Unnamed Layer* 17) [ElementWise] + (Unnamed Layer* 18) [Activation]), (Unnamed Layer* 12) [Constant] + (Unnamed Layer* 19) [ElementWise] + (Unnamed Layer* 13) [Constant] + (Unnamed Layer* 20) [ElementWise]), (Unnamed Layer* 21) [ElementWise]) (CudaDepthwiseConvolution)
[02/20/2023-20:39:39] [TRT] [V] CudaDepthwiseConvolution has no valid tactics for this config, skipping
[02/20/2023-20:39:39] [TRT] [V] --------------- Timing Runner: (Unnamed Layer* 8) [Fully Connected] + PWN(PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 9) [Constant], (Unnamed Layer* 14) [ElementWise]), (Unnamed Layer* 10) [Constant] + (Unnamed Layer* 15) [ElementWise]), (Unnamed Layer* 16) [ElementWise]), (Unnamed Layer* 11) [Constant] + (Unnamed Layer* 17) [ElementWise] + (Unnamed Layer* 18) [Activation]), (Unnamed Layer* 12) [Constant] + (Unnamed Layer* 19) [ElementWise] + (Unnamed Layer* 13) [Constant] + (Unnamed Layer* 20) [ElementWise]), (Unnamed Layer* 21) [ElementWise]) (CublasConvolution)
[02/20/2023-20:39:39] [TRT] [V] CublasConvolution has no valid tactics for this config, skipping
[02/20/2023-20:39:39] [TRT] [V] --------------- Timing Runner: (Unnamed Layer* 8) [Fully Connected] + PWN(PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 9) [Constant], (Unnamed Layer* 14) [ElementWise]), (Unnamed Layer* 10) [Constant] + (Unnamed Layer* 15) [ElementWise]), (Unnamed Layer* 16) [ElementWise]), (Unnamed Layer* 11) [Constant] + (Unnamed Layer* 17) [ElementWise] + (Unnamed Layer* 18) [Activation]), (Unnamed Layer* 12) [Constant] + (Unnamed Layer* 19) [ElementWise] + (Unnamed Layer* 13) [Constant] + (Unnamed Layer* 20) [ElementWise]), (Unnamed Layer* 21) [ElementWise]) (CaskGemmConvolution)
[02/20/2023-20:39:39] [TRT] [V] CaskGemmConvolution has no valid tactics for this config, skipping
[02/20/2023-20:39:39] [TRT] [V] --------------- Timing Runner: (Unnamed Layer* 8) [Fully Connected] + PWN(PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 9) [Constant], (Unnamed Layer* 14) [ElementWise]), (Unnamed Layer* 10) [Constant] + (Unnamed Layer* 15) [ElementWise]), (Unnamed Layer* 16) [ElementWise]), (Unnamed Layer* 11) [Constant] + (Unnamed Layer* 17) [ElementWise] + (Unnamed Layer* 18) [Activation]), (Unnamed Layer* 12) [Constant] + (Unnamed Layer* 19) [ElementWise] + (Unnamed Layer* 13) [Constant] + (Unnamed Layer* 20) [ElementWise]), (Unnamed Layer* 21) [ElementWise]) (CaskFlattenConvolution)
[02/20/2023-20:39:39] [TRT] [V] CaskFlattenConvolution has no valid tactics for this config, skipping
[02/20/2023-20:39:39] [TRT] [V] --------------- Timing Runner: (Unnamed Layer* 8) [Fully Connected] + PWN(PWN(PWN(PWN(PWN(PWN((Unnamed Layer* 9) [Constant], (Unnamed Layer* 14) [ElementWise]), (Unnamed Layer* 10) [Constant] + (Unnamed Layer* 15) [ElementWise]), (Unnamed Layer* 16) [ElementWise]), (Unnamed Layer* 11) [Constant] + (Unnamed Layer* 17) [ElementWise] + (Unnamed Layer* 18) [Activation]), (Unnamed Layer* 12) [Constant] + (Unnamed Layer* 19) [ElementWise] + (Unnamed Layer* 13) [Constant] + (Unnamed Layer* 20) [ElementWise]), (Unnamed Layer* 21) [ElementWise]) (CaskConvolution)
[02/20/2023-20:39:39] [TRT] [V] CaskConvolution has no valid tactics for this config, skipping

And this is weird and unexpected. @mdztravelling Could you share your full `builder.py ...` command so that @zerollzeng can try to reproduce this? Thanks

mdztravelling commented 1 year ago

> It looks like TRT cannot find any FP16 conv+gelu kernels: [...]
>
> And this is weird and unexpected. @mdztravelling Could you share your full `builder.py ...` command so that @zerollzeng can try to reproduce this? Thanks

Thanks for reminding me to check builder.py. I found this difference: `tactic_source = 1 << int(trt.TacticSource.CUBLAS) | 1 << int(trt.TacticSource.CUBLAS_LT)` versus `tactic_source = builder_config.get_tactic_sources() & ~(1 << int(trt.TacticSource.CUDNN))`.

In version 8.5.x the error no longer occurs after replacing `tactic_source = 1 << int(trt.TacticSource.CUBLAS) | 1 << int(trt.TacticSource.CUBLAS_LT)` with `tactic_source = builder_config.get_tactic_sources() & ~(1 << int(trt.TacticSource.CUDNN))`.

The relevant part of builder.py is as follows; a short sketch of why the two masks differ follows the snippet.

```python
...
# speed up the engine build for trt major version >= 8
# 1. disable cudnn tactic
# 2. load global timing cache
if trt_version[0] >= 8:
    # tactic_source = 1 << int(trt.TacticSource.CUBLAS) | 1 << int(trt.TacticSource.CUBLAS_LT)  # trt 8.0 or earlier
    tactic_source = builder_config.get_tactic_sources() & ~(1 << int(trt.TacticSource.CUDNN))   # use this in trt 8.5.x
    builder_config.set_tactic_sources(tactic_source)
...
```
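The difference matters: the trt 8.0-era line overwrites the whole tactic-source bitmask so that only the cuBLAS bits remain set, while the 8.5-style line starts from the default mask and clears just the cuDNN bit, leaving every other default source enabled. A minimal sketch of the two masks (assuming the standard TensorRT Python API; variable names are mine):

```python
import tensorrt as trt

builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
config = builder.create_builder_config()

# trt 8.0-era mask: ONLY cuBLAS/cuBLASLt are enabled; every other default
# tactic source is silently dropped, which can leave some fused layers with
# no usable implementation ("Error Code 10").
old_mask = (1 << int(trt.TacticSource.CUBLAS)) | (1 << int(trt.TacticSource.CUBLAS_LT))

# trt 8.5-style mask: keep the defaults, clear only the cuDNN bit.
new_mask = config.get_tactic_sources() & ~(1 << int(trt.TacticSource.CUDNN))

config.set_tactic_sources(new_mask)
```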

This error has been solved. Thanks to @nvpohanh, @zerollzeng, and @ttyio.

ttyio commented 1 year ago

Cool that the issue is solved, closing