ceccocats / tkDNN

Deep neural network library and toolkit to do high performance inference on NVIDIA Jetson platforms
GNU General Public License v2.0
718 stars 208 forks

3080Ti could not build cuda engine #264

Closed lswgh closed 2 years ago

lswgh commented 2 years ago

Hi, I built the demo with CUDA 10.2, cuDNN 7.6.5 and TensorRT 7.0.0.11 on an NVIDIA 3080 Ti. When I convert the YOLOv3 weights to .rt, the following error appears:

[image: budl]

So I converted the weights on my 1080 Ti, where it works well. But when I copy the .rt file to the 3080 Ti and run ./demo yolo3_fp16.rt ../demo/yolo_test.mp4 y, I get:

[image: run]

perseusdg commented 2 years ago

I doubt a 3080 Ti supports CUDA 10.2; you might want to try CUDA 11.1+. See https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
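
For reference, a minimal sketch of the check behind this suggestion. The table lists the first CUDA toolkit release that can compile natively for each compute capability (the 1080 Ti is sm_61, the 3080 Ti is sm_86). The names MIN_CUDA_FOR_SM, parse_version and toolkit_supports are hypothetical helpers for illustration, not part of tkDNN or CUDA.

```python
# First CUDA toolkit release with native support for each compute capability.
# Only a few architectures are listed; extend the table as needed.
MIN_CUDA_FOR_SM = {
    "6.1": (8, 0),   # Pascal, e.g. GTX 1080 Ti
    "7.0": (9, 0),   # Volta
    "7.5": (10, 0),  # Turing
    "8.0": (11, 0),  # Ampere (A100)
    "8.6": (11, 1),  # Ampere, e.g. RTX 3080 Ti
}


def parse_version(text):
    """Turn '11.1' or '11.1.105' into a comparable (major, minor) tuple."""
    return tuple(int(part) for part in text.split(".")[:2])


def toolkit_supports(sm, cuda_version):
    """Return True if the given CUDA toolkit can target the given sm natively."""
    return parse_version(cuda_version) >= MIN_CUDA_FOR_SM[sm]


if __name__ == "__main__":
    for sm, cuda in [("6.1", "10.2"), ("8.6", "10.2"), ("8.6", "11.1")]:
        print(f"sm_{sm.replace('.', '')} with CUDA {cuda}: {toolkit_supports(sm, cuda)}")
```

Under these assumptions, CUDA 10.2 is fine for the 1080 Ti but too old for the 3080 Ti, which matches the behaviour reported above.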

lswgh commented 2 years ago

Thanks for your reply. I tried to build the demo with TensorRT 7.2 + CUDA 11.2 + cuDNN 8.1, and with TensorRT 8.0 + CUDA 11.2 + cuDNN 8.1, but both failed:

[image: tensort8-cuda11]

I suspect that tkDNN does not support CUDA 11.1+.

perseusdg commented 2 years ago

tkDNN doesn't support TensorRT 8+. I have used tkDNN with TensorRT 7.2 + CUDA 11.1 + cuDNN 8.1.1. (If you want to use TensorRT 7.2 with CUDA 11.2, you still need CUDA 11.1, because NVRTC from CUDA 11.1 is still a requirement of TensorRT 7.2.) See https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-7.html#rel_7-2-3
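
A quick way to see whether the CUDA 11.1 NVRTC library is visible to the dynamic loader is to try loading it by name. This is only a sketch: it assumes a Linux system and that the library is named libnvrtc.so.11.1, which you should verify against your own CUDA install. The helper name shared_library_loadable is hypothetical.

```python
import ctypes

# Assumed soname of NVRTC as shipped with CUDA 11.1 on Linux; verify locally.
NVRTC_11_1 = "libnvrtc.so.11.1"


def shared_library_loadable(name):
    """Return True if the dynamic loader can find and load the named library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    if shared_library_loadable(NVRTC_11_1):
        print(f"{NVRTC_11_1} found on the loader path")
    else:
        print(f"{NVRTC_11_1} not found; check LD_LIBRARY_PATH and the CUDA 11.1 install")
```

If the library is reported missing while CUDA 11.2 is installed, that is consistent with the requirement described above.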

lswgh commented 2 years ago

Thanks. When I build with TensorRT 7.2, CUDA 11.1 and cuDNN 8.1, it works. But compared to the 1080 Ti (about 900 MB for FP32), memory usage on the 3080 Ti is higher (about 2.9 GB for FP32 and 2.5 GB for FP16).