brilliant-soilder closed this issue 2 years ago
When I convert a .onnx model to a .engine, it fails with:

```
[10/28/2022-14:21:04] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[10/28/2022-14:23:32] [E] Error[2]: [ltWrapper.cpp::setupHeuristic::327] Error Code 2: Internal Error (Assertion cublasStatus == CUBLAS_STATUS_SUCCESS failed. )
[10/28/2022-14:23:32] [E] Error[2]: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
```

Full trtexec output:
```
./trtexec --onnx=1_4_512.onnx --saveEngine=1_4_512.engine --workspace=2048 --fp16 --device=0
&&&& RUNNING TensorRT.trtexec [TensorRT v8203] # ./trtexec --onnx=1_4_512.onnx --saveEngine=1_4_512.engine --workspace=2048 --fp16 --device=0
[10/28/2022-14:14:33] [I] === Model Options ===
[10/28/2022-14:14:33] [I] Format: ONNX
[10/28/2022-14:14:33] [I] Model: 1_4_512.onnx
[10/28/2022-14:14:33] [I] Output:
[10/28/2022-14:14:33] [I] === Build Options ===
[10/28/2022-14:14:33] [I] Max batch: explicit batch
[10/28/2022-14:14:33] [I] Workspace: 2048 MiB
[10/28/2022-14:14:33] [I] minTiming: 1
[10/28/2022-14:14:33] [I] avgTiming: 8
[10/28/2022-14:14:33] [I] Precision: FP32+FP16
[10/28/2022-14:14:33] [I] Calibration:
[10/28/2022-14:14:33] [I] Refit: Disabled
[10/28/2022-14:14:33] [I] Sparsity: Disabled
[10/28/2022-14:14:33] [I] Safe mode: Disabled
[10/28/2022-14:14:33] [I] DirectIO mode: Disabled
[10/28/2022-14:14:33] [I] Restricted mode: Disabled
[10/28/2022-14:14:33] [I] Save engine: 1_4_512.engine
[10/28/2022-14:14:33] [I] Load engine:
[10/28/2022-14:14:33] [I] Profiling verbosity: 0
[10/28/2022-14:14:33] [I] Tactic sources: Using default tactic sources
[10/28/2022-14:14:33] [I] timingCacheMode: local
[10/28/2022-14:14:33] [I] timingCacheFile:
[10/28/2022-14:14:33] [I] Input(s)s format: fp32:CHW
[10/28/2022-14:14:33] [I] Output(s)s format: fp32:CHW
[10/28/2022-14:14:33] [I] Input build shapes: model
[10/28/2022-14:14:33] [I] Input calibration shapes: model
[10/28/2022-14:14:33] [I] === System Options ===
[10/28/2022-14:14:33] [I] Device: 0
[10/28/2022-14:14:33] [I] DLACore:
[10/28/2022-14:14:33] [I] Plugins:
[10/28/2022-14:14:33] [I] === Inference Options ===
[10/28/2022-14:14:33] [I] Batch: Explicit
[10/28/2022-14:14:33] [I] Input inference shapes: model
[10/28/2022-14:14:33] [I] Iterations: 10
[10/28/2022-14:14:33] [I] Duration: 3s (+ 200ms warm up)
[10/28/2022-14:14:33] [I] Sleep time: 0ms
[10/28/2022-14:14:33] [I] Idle time: 0ms
[10/28/2022-14:14:33] [I] Streams: 1
[10/28/2022-14:14:33] [I] ExposeDMA: Disabled
[10/28/2022-14:14:33] [I] Data transfers: Enabled
[10/28/2022-14:14:33] [I] Spin-wait: Disabled
[10/28/2022-14:14:33] [I] Multithreading: Disabled
[10/28/2022-14:14:33] [I] CUDA Graph: Disabled
[10/28/2022-14:14:33] [I] Separate profiling: Disabled
[10/28/2022-14:14:33] [I] Time Deserialize: Disabled
[10/28/2022-14:14:33] [I] Time Refit: Disabled
[10/28/2022-14:14:33] [I] Skip inference: Disabled
[10/28/2022-14:14:33] [I] Inputs:
[10/28/2022-14:14:33] [I] === Reporting Options ===
[10/28/2022-14:14:33] [I] Verbose: Disabled
[10/28/2022-14:14:33] [I] Averages: 10 inferences
[10/28/2022-14:14:33] [I] Percentile: 99
[10/28/2022-14:14:33] [I] Dump refittable layers: Disabled
[10/28/2022-14:14:33] [I] Dump output: Disabled
[10/28/2022-14:14:33] [I] Profile: Disabled
[10/28/2022-14:14:33] [I] Export timing to JSON file:
[10/28/2022-14:14:33] [I] Export output to JSON file:
[10/28/2022-14:14:33] [I] Export profile to JSON file:
[10/28/2022-14:14:33] [I]
[10/28/2022-14:14:55] [I] === Device Information ===
[10/28/2022-14:14:55] [I] Selected Device: NVIDIA GeForce RTX 3090
[10/28/2022-14:14:55] [I] Compute Capability: 8.6
[10/28/2022-14:14:55] [I] SMs: 82
[10/28/2022-14:14:55] [I] Compute Clock Rate: 1.695 GHz
[10/28/2022-14:14:55] [I] Device Global Memory: 24268 MiB
[10/28/2022-14:14:55] [I] Shared Memory per SM: 100 KiB
[10/28/2022-14:14:55] [I] Memory Bus Width: 384 bits (ECC disabled)
[10/28/2022-14:14:55] [I] Memory Clock Rate: 9.751 GHz
[10/28/2022-14:14:55] [I]
[10/28/2022-14:14:55] [I] TensorRT version: 8.2.3
[10/28/2022-14:17:01] [I] [TRT] [MemUsageChange] Init CUDA: CPU +176, GPU +0, now: CPU 180, GPU 379 (MiB)
[10/28/2022-14:17:03] [I] [TRT] [MemUsageSnapshot] Begin constructing builder kernel library: CPU 180 MiB, GPU 379 MiB
[10/28/2022-14:17:03] [I] [TRT] [MemUsageSnapshot] End constructing builder kernel library: CPU 201 MiB, GPU 379 MiB
[10/28/2022-14:17:03] [I] Start parsing network model
[10/28/2022-14:17:03] [I] [TRT] ----------------------------------------------------------------
[10/28/2022-14:17:03] [I] [TRT] Input filename:   1_4_512.onnx
[10/28/2022-14:17:03] [I] [TRT] ONNX IR version:  0.0.6
[10/28/2022-14:17:03] [I] [TRT] Opset version:    11
[10/28/2022-14:17:03] [I] [TRT] Producer name:    pytorch
[10/28/2022-14:17:03] [I] [TRT] Producer version: 1.8
[10/28/2022-14:17:03] [I] [TRT] Domain:
[10/28/2022-14:17:03] [I] [TRT] Model version:    0
[10/28/2022-14:17:03] [I] [TRT] Doc string:
[10/28/2022-14:17:03] [I] [TRT] ----------------------------------------------------------------
[10/28/2022-14:17:03] [W] [TRT] onnx2trt_utils.cpp:366: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[10/28/2022-14:17:03] [W] [TRT] Tensor DataType is determined at build time for tensors not marked as input or output.
[10/28/2022-14:17:03] [I] Finish parsing network model
[10/28/2022-14:19:41] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 10.2.3 but loaded cuBLAS/cuBLAS LT 10.2.2
[10/28/2022-14:19:41] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +102, GPU +52, now: CPU 326, GPU 431 (MiB)
[10/28/2022-14:19:45] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +651, GPU +266, now: CPU 977, GPU 697 (MiB)
[10/28/2022-14:19:45] [W] [TRT] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.1.1
[10/28/2022-14:19:45] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[10/28/2022-14:21:04] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[10/28/2022-14:23:32] [E] Error[2]: [ltWrapper.cpp::setupHeuristic::327] Error Code 2: Internal Error (Assertion cublasStatus == CUBLAS_STATUS_SUCCESS failed. )
[10/28/2022-14:23:32] [E] Error[2]: [builder.cpp::buildSerializedNetwork::609] Error Code 2: Internal Error (Assertion enginePtr != nullptr failed. )
[10/28/2022-14:23:32] [E] Engine could not be created from network
[10/28/2022-14:23:32] [E] Building engine failed
[10/28/2022-14:23:32] [E] Failed to create engine from model.
[10/28/2022-14:23:32] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8203] # ./trtexec --onnx=1_4_512.onnx --saveEngine=1_4_512.engine --workspace=2048 --fp16 --device=0
```
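One thing the log above surfaces, just before the cuBLAS assertion fails, is a pair of linked-vs-loaded version mismatch warnings (cuBLAS/cuBLAS LT 10.2.3 vs 10.2.2, cuDNN 8.2.1 vs 8.1.1). The following is a small sketch, not part of the original report, for pulling such warnings out of a saved trtexec log so they are easy to spot; the function name and sample log text are illustrative, not a real API:

```python
import re

# Matches trtexec warnings of the form:
#   [W] [TRT] TensorRT was linked against <lib> <X.Y.Z> but loaded <lib> <A.B.C>
MISMATCH_RE = re.compile(
    r"TensorRT was linked against (?P<lib>.+?) (?P<linked>[\d.]+) "
    r"but loaded (?P=lib) (?P<loaded>[\d.]+)"
)

def find_version_mismatches(log_text):
    """Return (library, linked_version, loaded_version) tuples found in a trtexec log."""
    return [
        (m.group("lib"), m.group("linked"), m.group("loaded"))
        for m in MISMATCH_RE.finditer(log_text)
    ]

# Sample lines copied from the log above
log = """\
[10/28/2022-14:19:41] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 10.2.3 but loaded cuBLAS/cuBLAS LT 10.2.2
[10/28/2022-14:19:45] [W] [TRT] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.1.1
"""
print(find_version_mismatches(log))
```

A mismatch reported here does not by itself prove it caused the assertion, but aligning the loaded cuBLAS/cuDNN with the versions TensorRT was linked against is a reasonable first thing to check.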
Closed as duplicate.