ceccocats / tkDNN

Deep neural network library and toolkit to do high-performance inference on NVIDIA Jetson platforms
GNU General Public License v2.0
718 stars · 209 forks

tkDNN is slower than TensorRT #225

Closed · bobbilichandu closed this issue 2 years ago

bobbilichandu commented 3 years ago

Using TensorRT 7.1.3.4 with batch size 8, yolov4csp takes up to 220 ms on 1280x1280 images on a Tesla T4 GPU.

But with tkDNN, using the same TensorRT version (7.1.3.4), batch size 8 and image size 1280x1280, it takes up to 450 ms on the same Tesla T4 GPU. tkDNN also occupies 6 GB of GPU memory, whereas naive TRT uses nearly 3 GB. Is there a reason for this, or am I doing something wrong?
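To put the two reported numbers on a per-image basis, here is a small arithmetic sketch. It assumes that 220 ms and 450 ms are each the wall-clock latency of one full batch of 8 images. The report does not say this explicitly, and the names `REPORTED_MS`, `per_image_ms` and `throughput_fps` are hypothetical helpers introduced only for illustration.

```python
# Figures reported in the issue: Tesla T4, TensorRT 7.1.3.4, batch size 8, 1280x1280 input.
# Assumption (not stated in the report): each value is the latency of one whole batch.
BATCH_SIZE = 8
REPORTED_MS = {"naive TRT": 220.0, "tkDNN": 450.0}


def per_image_ms(batch_ms: float, batch_size: int) -> float:
    """Average latency attributed to a single image in the batch."""
    return batch_ms / batch_size


def throughput_fps(batch_ms: float, batch_size: int) -> float:
    """Images per second if batches are processed back to back."""
    return batch_size / (batch_ms / 1000.0)


if __name__ == "__main__":
    for name, ms in REPORTED_MS.items():
        print(f"{name}: {per_image_ms(ms, BATCH_SIZE):.2f} ms/image, "
              f"{throughput_fps(ms, BATCH_SIZE):.1f} images/s")
    # Relative slowdown of tkDNN versus the plain TensorRT run.
    ratio = REPORTED_MS["tkDNN"] / REPORTED_MS["naive TRT"]
    print(f"tkDNN / naive TRT latency ratio: {ratio:.2f}x")
```

Under that assumption, the gap is roughly 27.5 ms versus about 56 ms per image, a bit over 2x. This matches the roughly 2x difference in reported GPU memory (6 GB versus nearly 3 GB).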

ceccocats commented 3 years ago

Hi, can you elaborate? What do you mean by naive TRT? How can I reproduce your issue?

mive93 commented 2 years ago

Closing for inactivity. Feel free to reopen.