Closed — ywfwyht closed this issue 1 year ago
Please provide your environment and end to end steps to reproduce your issue.
Now I get a CUDA runtime error: CUBLAS_STATUS_EXECUTION_FAILED
Building TRT engine....
[04/04/2023-07:23:56] [TRT] [V] Applying generic optimizations to the graph for inference.
[04/04/2023-07:23:56] [TRT] [V] Original: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After dead-layer removal: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After Myelin optimization: 1 layers
[04/04/2023-07:23:56] [TRT] [V] Applying ScaleNodes fusions.
[04/04/2023-07:23:56] [TRT] [V] After scale fusion: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After dupe layer removal: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After final dead-layer removal: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After tensor merging: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After vertical fusions: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After dupe layer removal: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After final dead-layer removal: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After tensor merging: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After slice removal: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After concat removal: 1 layers
[04/04/2023-07:23:56] [TRT] [V] Trying to split Reshape and strided tensor
[04/04/2023-07:23:56] [TRT] [V] Graph construction and optimization completed in 0.114169 seconds.
[04/04/2023-07:23:56] [TRT] [V] Using cublasLt as a tactic source
[04/04/2023-07:23:56] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +68, GPU +8, now: CPU 4739, GPU 3703 (MiB)
[04/04/2023-07:23:56] [TRT] [V] Using cuDNN as a tactic source
[04/04/2023-07:23:56] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 4739, GPU 3713 (MiB)
[04/04/2023-07:23:56] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[04/04/2023-07:23:56] [TRT] [V] Constructing optimization profile number 0 [1/1].
[04/04/2023-07:23:56] [TRT] [V] Reserving memory for host IO tensors. Host: 0 bytes
[04/04/2023-07:23:56] [TRT] [V] =============== Computing reformatting costs
[04/04/2023-07:23:56] [TRT] [V] =============== Computing reformatting costs
[04/04/2023-07:23:56] [TRT] [V] =============== Computing costs for
[04/04/2023-07:23:56] [TRT] [V] *************** Autotuning format combination: Float(442368,147456,384,1) -> Float(442368,768,1) ***************
[04/04/2023-07:23:56] [TRT] [V] Formats and tactics selection completed in 0.0815964 seconds.
[04/04/2023-07:23:56] [TRT] [V] After reformat layers: 1 layers
[04/04/2023-07:23:56] [TRT] [V] Pre-optimized block assignment.
[04/04/2023-07:23:56] [TRT] [V] Block size 8589934592
[04/04/2023-07:23:56] [TRT] [V] Total Activation Memory: 8589934592
[04/04/2023-07:23:56] [TRT] [I] Detected 1 inputs and 1 output network tensors.
[04/04/2023-07:23:56] [TRT] [V] Layer: (Unnamed Layer* 0) [PluginV2DynamicExt] Host Persistent: 112 Device Persistent: 0 Scratch Memory: 0
[04/04/2023-07:23:56] [TRT] [I] Total Host Persistent Memory: 112
[04/04/2023-07:23:56] [TRT] [I] Total Device Persistent Memory: 0
[04/04/2023-07:23:56] [TRT] [I] Total Scratch Memory: 0
[04/04/2023-07:23:56] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 0 MiB
[04/04/2023-07:23:56] [TRT] [V] Optimized block assignment.
[04/04/2023-07:23:56] [TRT] [I] Total Activation Memory: 0
[04/04/2023-07:23:57] [TRT] [V] Disabling unused tactic source: EDGE_MASK_CONVOLUTIONS
[04/04/2023-07:23:57] [TRT] [V] Using cublasLt as a tactic source
[04/04/2023-07:23:57] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 4827, GPU 4229 (MiB)
[04/04/2023-07:23:57] [TRT] [V] Using cuDNN as a tactic source
[04/04/2023-07:23:57] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 4827, GPU 4237 (MiB)
[04/04/2023-07:23:57] [TRT] [V] Engine generation completed in 1.53489 seconds.
[04/04/2023-07:23:57] [TRT] [V] Engine Layer Information:
Layer(PluginV2): (Unnamed Layer* 0) [PluginV2DynamicExt], Tactic: 0x0000000000000000, input_img[Float(-2,3,384,384)] -> vision_transformer_output[Float(-2,576,768)]
[04/04/2023-07:23:57] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[04/04/2023-07:23:57] [TRT] [V] Using cublasLt as a tactic source
[04/04/2023-07:23:57] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 4826, GPU 3955 (MiB)
[04/04/2023-07:23:57] [TRT] [V] Using cuDNN as a tactic source
[04/04/2023-07:23:57] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 4826, GPU 3963 (MiB)
[04/04/2023-07:23:57] [TRT] [V] Total per-runner device persistent memory is 0
[04/04/2023-07:23:57] [TRT] [V] Total per-runner host persistent memory is 112
[04/04/2023-07:23:57] [TRT] [V] Allocated activation device memory of size 0
[04/04/2023-07:23:57] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
plugin output_shape: (2, 576, 768)
18517345600
19696338240x306000000
terminate called after throwing an instance of 'std::runtime_error'
what(): [FT][ERROR] CUDA runtime error: CUBLAS_STATUS_EXECUTION_FAILED /home/work_dir/K-Lane/cpp_lane_det/transformerplugin/src/transformer/utils/cublasMMWrapper.cc:324
Aborted (core dumped)
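A common trigger for CUBLAS_STATUS_EXECUTION_FAILED inside a plugin's GEMM wrapper is feeding the engine an input whose shape or batch size does not match the binding (the binding above is input_img[Float(-2,3,384,384)], where the negative entry is a dynamic dimension). A minimal, hypothetical sanity check one could run before inference — the helper name is illustrative, not from the K-Lane repo:

```python
# Hypothetical helper (not part of the K-Lane code): check a concrete host
# input shape against an engine binding shape, where negative entries
# (e.g. the -2 reported in the log) denote dynamic dimensions that must be
# resolved to a positive size at runtime.
def shape_matches(binding_shape, input_shape):
    """Return True if input_shape is compatible with binding_shape."""
    if len(binding_shape) != len(input_shape):
        return False
    for expected, actual in zip(binding_shape, input_shape):
        if expected >= 0 and expected != actual:
            return False  # static dimension must match exactly
        if expected < 0 and actual <= 0:
            return False  # dynamic dimension must be a concrete positive size
    return True

# Binding reported in the log: input_img[Float(-2,3,384,384)]
print(shape_matches((-2, 3, 384, 384), (2, 3, 384, 384)))  # True
print(shape_matches((-2, 3, 384, 384), (2, 3, 224, 224)))  # False
```

If the shape checks out, the next things to rule out are a dtype mismatch (FP16 engine fed FP32 buffers, or vice versa) and an insufficient device workspace for the cuBLAS call.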
Please provide end to end steps to reproduce your issue.
Today, I fixed it.
How did you fix it? Could you please tell me?
Hi.
An error occurs when I run the 'infer_visiontransformer_plugin.py' file. The detailed error information is as above, and my environment is as follows.
OS: Ubuntu 18.04
GPU: 3080 Ti
Driver Version: 515
Python Version: 3.8
CUDA Version: 11.6
cuDNN Version: 8.4.1
PyTorch Version: 1.12
TensorRT Version: 8.4.3.1