NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT
Apache License 2.0

CUDA error: an illegal memory access was encountered #532

Closed ywfwyht closed 1 year ago

ywfwyht commented 1 year ago

Hi.

An error occurs when I run the infer_visiontransformer_plugin.py script. The detailed error output is as follows.

 File "utils/infer_visiontransformer_plugin.py", line 154, in run_trt_plugin
    torch.cuda.synchronize()
  File "/opt/conda/lib/python3.8/site-packages/torch/cuda/__init__.py", line 496, in synchronize
    return torch._C._cuda_synchronize()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

[03/31/2023-01:15:50] [TRT] [E] 1: [cudaResources.cpp::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[03/31/2023-01:15:50] [TRT] [E] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[03/31/2023-01:15:50] [TRT] [E] 1: [cudaResources.cpp::~ScopedCudaEvent::24] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
Error on file ViTPlugin.cpp line 159: CUDNN_STATUS_INTERNAL_ERROR
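
For what it's worth, the message above suggests rerunning with CUDA_LAUNCH_BLOCKING=1 so that kernel launches are synchronous and the illegal access is reported at the launch that caused it rather than at a later torch.cuda.synchronize(). A minimal sketch of that (my own, not part of this report; the variable has to be set before CUDA is initialized):

```python
# Sketch only: force synchronous kernel launches so the stack trace points at
# the real offender. CUDA_LAUNCH_BLOCKING must be set before the CUDA context
# is created, so set it before importing torch (or at least before the first
# CUDA call).
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch

# ... run the plugin / model as usual; with blocking launches the
# "illegal memory access" error is raised by the failing kernel launch itself.
```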

OS: Ubuntu 18.04
GPU: RTX 3080 Ti
Driver version: 515
Python version: 3.8
CUDA version: 11.6
cuDNN version: 8.4.1
PyTorch version: 1.12
TensorRT version: 8.4.3.1

byshiue commented 1 year ago

Please provide your environment and end to end steps to reproduce your issue.

ywfwyht commented 1 year ago

Now I get a different error: CUDA runtime error: CUBLAS_STATUS_EXECUTION_FAILED. The full output is below.

Building TRT engine....
[04/04/2023-07:23:56] [TRT] [V] Applying generic optimizations to the graph for inference.
[04/04/2023-07:23:56] [TRT] [V] Original: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After dead-layer removal: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After Myelin optimization: 1 layers
[04/04/2023-07:23:56] [TRT] [V] Applying ScaleNodes fusions.
[04/04/2023-07:23:56] [TRT] [V] After scale fusion: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After dupe layer removal: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After final dead-layer removal: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After tensor merging: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After vertical fusions: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After dupe layer removal: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After final dead-layer removal: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After tensor merging: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After slice removal: 1 layers
[04/04/2023-07:23:56] [TRT] [V] After concat removal: 1 layers
[04/04/2023-07:23:56] [TRT] [V] Trying to split Reshape and strided tensor
[04/04/2023-07:23:56] [TRT] [V] Graph construction and optimization completed in 0.114169 seconds.
[04/04/2023-07:23:56] [TRT] [V] Using cublasLt as a tactic source
[04/04/2023-07:23:56] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +68, GPU +8, now: CPU 4739, GPU 3703 (MiB)
[04/04/2023-07:23:56] [TRT] [V] Using cuDNN as a tactic source
[04/04/2023-07:23:56] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 4739, GPU 3713 (MiB)
[04/04/2023-07:23:56] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[04/04/2023-07:23:56] [TRT] [V] Constructing optimization profile number 0 [1/1].
[04/04/2023-07:23:56] [TRT] [V] Reserving memory for host IO tensors. Host: 0 bytes
[04/04/2023-07:23:56] [TRT] [V] =============== Computing reformatting costs
[04/04/2023-07:23:56] [TRT] [V] =============== Computing reformatting costs
[04/04/2023-07:23:56] [TRT] [V] =============== Computing costs for 
[04/04/2023-07:23:56] [TRT] [V] *************** Autotuning format combination: Float(442368,147456,384,1) -> Float(442368,768,1) ***************
[04/04/2023-07:23:56] [TRT] [V] Formats and tactics selection completed in 0.0815964 seconds.
[04/04/2023-07:23:56] [TRT] [V] After reformat layers: 1 layers
[04/04/2023-07:23:56] [TRT] [V] Pre-optimized block assignment.
[04/04/2023-07:23:56] [TRT] [V] Block size 8589934592
[04/04/2023-07:23:56] [TRT] [V] Total Activation Memory: 8589934592
[04/04/2023-07:23:56] [TRT] [I] Detected 1 inputs and 1 output network tensors.
[04/04/2023-07:23:56] [TRT] [V] Layer: (Unnamed Layer* 0) [PluginV2DynamicExt] Host Persistent: 112 Device Persistent: 0 Scratch Memory: 0
[04/04/2023-07:23:56] [TRT] [I] Total Host Persistent Memory: 112
[04/04/2023-07:23:56] [TRT] [I] Total Device Persistent Memory: 0
[04/04/2023-07:23:56] [TRT] [I] Total Scratch Memory: 0
[04/04/2023-07:23:56] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 0 MiB
[04/04/2023-07:23:56] [TRT] [V] Optimized block assignment.
[04/04/2023-07:23:56] [TRT] [I] Total Activation Memory: 0
[04/04/2023-07:23:57] [TRT] [V] Disabling unused tactic source: EDGE_MASK_CONVOLUTIONS
[04/04/2023-07:23:57] [TRT] [V] Using cublasLt as a tactic source
[04/04/2023-07:23:57] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 4827, GPU 4229 (MiB)
[04/04/2023-07:23:57] [TRT] [V] Using cuDNN as a tactic source
[04/04/2023-07:23:57] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 4827, GPU 4237 (MiB)
[04/04/2023-07:23:57] [TRT] [V] Engine generation completed in 1.53489 seconds.
[04/04/2023-07:23:57] [TRT] [V] Engine Layer Information:
Layer(PluginV2): (Unnamed Layer* 0) [PluginV2DynamicExt], Tactic: 0x0000000000000000, input_img[Float(-2,3,384,384)] -> vision_transformer_output[Float(-2,576,768)]
[04/04/2023-07:23:57] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[04/04/2023-07:23:57] [TRT] [V] Using cublasLt as a tactic source
[04/04/2023-07:23:57] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 4826, GPU 3955 (MiB)
[04/04/2023-07:23:57] [TRT] [V] Using cuDNN as a tactic source
[04/04/2023-07:23:57] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 4826, GPU 3963 (MiB)
[04/04/2023-07:23:57] [TRT] [V] Total per-runner device persistent memory is 0
[04/04/2023-07:23:57] [TRT] [V] Total per-runner host persistent memory is 112
[04/04/2023-07:23:57] [TRT] [V] Allocated activation device memory of size 0
[04/04/2023-07:23:57] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
plugin output_shape:  (2, 576, 768)
18517345600
19696338240x306000000
terminate called after throwing an instance of 'std::runtime_error'
  what():  [FT][ERROR] CUDA runtime error: CUBLAS_STATUS_EXECUTION_FAILED /home/work_dir/K-Lane/cpp_lane_det/transformerplugin/src/transformer/utils/cublasMMWrapper.cc:324 

Aborted (core dumped)
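
As a side note, CUBLAS_STATUS_EXECUTION_FAILED from cublasMMWrapper.cc means a GEMM failed to execute on the device, which can point either at a broken cuBLAS/driver setup or at inputs that don't match what the plugin was built for. A quick way to rule out the environment is to push a GEMM of roughly the same size through cuBLAS via PyTorch; this is a sketch of mine, not part of the original report (the 2 x 576 x 768 shape is taken from the plugin output printed above):

```python
# Sketch: sanity-check cuBLAS on this GPU/driver with an FP32 GEMM of roughly
# the same size as the ViT projection (batch 2, 576 tokens, hidden size 768).
import torch

a = torch.randn(2 * 576, 768, device="cuda")
b = torch.randn(768, 768, device="cuda")
c = a @ b                      # dispatched to cuBLAS on the GPU
torch.cuda.synchronize()       # surface any asynchronous CUDA error here
print(c.shape)                 # expected: torch.Size([1152, 768])
```

If this succeeds, the failure is more likely in the plugin path itself (for example, a mismatch between the batch size or image size the engine was built with and the tensors passed at runtime), and running the script under compute-sanitizer can help localize the faulty access.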
byshiue commented 1 year ago

Please provide end to end steps to reproduce your issue.

ywfwyht commented 1 year ago

> Please provide end to end steps to reproduce your issue.

Today, I fixed it.

ydldarling commented 11 months ago

> Please provide end to end steps to reproduce your issue.

> Today, I fixed it.

How did you fix it? Could you please tell me?