Unsupported SM: 0x601 failure of TensorRT 10.0.1 on 1080ti

ZanderFoster commented 4 months ago

Description

UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ..\aten\src\ATen\native\cudnn\Conv_v8.cpp:919.)
return F.conv2d(input, weight, bias, self.stride,
Model summary (fused): 168 layers, 3006233 parameters, 0 gradients, 8.1 GFLOPs

PyTorch: starting from 'Models\Realtime\v8n_v2.pt' with input shape (1, 3, 416, 416) BCHW and output shape(s) (1, 7, 3549) (5.9 MB)

ONNX: starting export with onnx 1.16.0 opset 17...
ONNX: simplifying with onnxsim 0.4.36...
ONNX: export success ✅ 1.1s, saved as 'Models\Realtime\v8n_v2.onnx' (11.6 MB)

TensorRT: starting export with TensorRT 10.0.1...
[04/25/2024-15:52:53] [TRT] [I] [MemUsageChange] Init CUDA: CPU +3, GPU +0, now: CPU 12063, GPU 1801 (MiB)
[04/25/2024-15:52:54] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +214, GPU +0, now: CPU 12464, GPU 1801 (MiB)
[04/25/2024-15:52:54] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[04/25/2024-15:52:54] [TRT] [I] ----------------------------------------------------------------
[04/25/2024-15:52:54] [TRT] [I] Input filename: Models\Realtime\v8n_v2.onnx
[04/25/2024-15:52:54] [TRT] [I] ONNX IR version: 0.0.8
[04/25/2024-15:52:54] [TRT] [I] Opset version: 17
[04/25/2024-15:52:54] [TRT] [I] Producer name: pytorch
[04/25/2024-15:52:54] [TRT] [I] Producer version: 2.3.0
[04/25/2024-15:52:54] [TRT] [I] Domain:
[04/25/2024-15:52:54] [TRT] [I] Model version: 0
[04/25/2024-15:52:54] [TRT] [I] Doc string:
[04/25/2024-15:52:54] [TRT] [I] ----------------------------------------------------------------
TensorRT: input "images" with shape(1, 3, 416, 416) DataType.FLOAT
TensorRT: output "output0" with shape(1, 7, 3549) DataType.FLOAT
TensorRT: building FP32 engine as Models\Realtime\v8n_v2.engine
[04/25/2024-15:52:54] [TRT] [I] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
[04/25/2024-15:52:54] [TRT] [I] BuilderFlag::kTF32 is set but hardware does not support TF32. Disabling TF32.
[04/25/2024-15:52:54] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[04/25/2024-15:52:54] [TRT] [E] 1: Unsupported SM: 0x601
[04/25/2024-15:52:54] [TRT] [E] 1: [caskUtils.cpp::nvinfer1::rt::task::trtSmToCask::193] Error Code 1: Internal Error (Unsupported SM: 0x601)

line 722, in export_engine
    with build(network, config) as engine, open(f, "wb") as t:
TypeError: 'NoneType' object does not support the context manager protocol
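
The TypeError above is a secondary symptom: the build call returned None after the "Unsupported SM" error, and None was then used in a `with` statement. A minimal sketch of a guard that surfaces the real failure instead, assuming the build callable returns either a bytes-like serialized engine or None. The names `save_engine`, `failing_build`, and `working_build` are hypothetical stand-ins for illustration and are not part of TensorRT or the exporter in the traceback.

```python
def save_engine(build, network, config, path):
    """Call build(network, config) and write the result to path.

    Assumption: build returns a bytes-like serialized engine on success
    and None on failure.
    """
    engine = build(network, config)
    if engine is None:
        # Fail here with a clear message instead of a confusing TypeError later.
        raise RuntimeError(
            "TensorRT engine build failed; check the [TRT] [E] lines in the log"
        )
    with open(path, "wb") as f:
        f.write(engine)


# Hypothetical stand-ins for a builder that fails and one that succeeds.
def failing_build(network, config):
    return None


def working_build(network, config):
    return b"serialized-engine-bytes"


try:
    save_engine(failing_build, None, None, "unused.engine")
except RuntimeError as err:
    print(f"build error: {err}")
```

With a guard like this, the log ends on the build failure itself, so the "Unsupported SM" message is what stands out rather than the context-manager error.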

TensorRT Version: 10.0.1

NVIDIA GPU: GTX 1080 Ti

NVIDIA Driver Version: whatever installs with cuda

CUDA Version: 11.8

CUDNN Version: 8.5.0

Operating System: Windows

Python Version (if applicable): 3.11

lix19937 commented 4 months ago

[04/25/2024-15:52:54] [TRT] [E] 1: Unsupported SM: 0x601
[04/25/2024-15:52:54] [TRT] [E] 1: [caskUtils.cpp::nvinfer1::rt::task::trtSmToCask::193] Error Code 1: Internal Error (Unsupported SM: 0x601)

'Unsupported SM' means that TensorRT 10.0.1 does not support the GTX 1080 Ti's SM 6.1 (Pascal architecture), which is what the 0x601 in the log refers to. You may downgrade TensorRT to version 9.1.0 or 8.5.
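
To check this before starting a build, here is a rough sketch that decodes the SM code from the log and compares it with a minimum compute capability. It assumes the code packs the major version in the high byte and the minor version in the low byte, which matches reading 0x601 as SM 6.1. The helper names and the `HYPOTHETICAL_MINIMUM` value are placeholders; take the real minimum from the support matrix of the TensorRT version you install.

```python
def decode_sm(code: int) -> tuple[int, int]:
    """Split an SM code such as 0x601 into (major, minor).

    Assumption: high byte is the major version, low byte is the minor version.
    """
    return code >> 8, code & 0xFF


def is_supported(capability, minimum) -> bool:
    """Return True if a (major, minor) capability meets the minimum."""
    return tuple(capability) >= tuple(minimum)


# Placeholder value: look up the real minimum in the TensorRT support matrix.
HYPOTHETICAL_MINIMUM = (7, 0)

major, minor = decode_sm(0x601)
print(f"SM {major}.{minor}")  # prints "SM 6.1"
print(is_supported((major, minor), HYPOTHETICAL_MINIMUM))  # prints "False"
```

On a machine with PyTorch and a CUDA device, `torch.cuda.get_device_capability()` returns the same kind of (major, minor) tuple and could be passed to `is_supported` directly.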

There was an up to 28% performance regression compared to TensorRT 8.5 on Transformer networks in FP16 precision on NVIDIA Volta GPUs, and up to 85% performance regression on NVIDIA Pascal GPUs. Disabling the kDISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805 preview flag was a workaround. This issue has been fixed.

ref
https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html#rel-8-6-1

zerollzeng commented 4 months ago

Checking internally.

lschaupp commented 2 months ago

Any update on this?

NVIDIA / TensorRT

Unsupported SM: 0x601 failure of TensorRT 10.0.1 on 1080ti #3826
