ceccocats / tkDNN

Deep neural network library and toolkit to do high-performance inference on NVIDIA Jetson platforms
GNU General Public License v2.0

Added TensorRT8 support #270

Closed TheExDeus closed 2 years ago

TheExDeus commented 2 years ago

I tested on a Xavier NX running JetPack 4.6, and at least the YOLOv4 network runs fine. I haven't tested all of them.

In the future I might do some more refactoring and cleanup, but I'm not sure whether that would diverge too much from what the maintainers are doing here. I think some performance could be gained from a cleanup and from dropping some older compatibility (TensorRT 6, or maybe even 7, should probably be the minimum supported version now).

Also this project really needs a style guide :D

LangArthur commented 2 years ago

Hi!

I'm currently using your fork and got a tiny problem with it.

I'm using the static version of the library. When the static lib is linked to an executable (for example, the demo or test executables included with tkDNN), I get the following error:

```
/usr/bin/ld: libkernels.a(kernels_generated_static_init.cu.o): in function `tk::dnn::YoloRTCreator::YoloRTCreator()':
/path/to/file/tkDNN/include/tkDNN/pluginsRT/YoloRT.h:188: undefined reference to `vtable for tk::dnn::YoloRTCreator'
```

In case it's useful: I'm using GCC 9.3.0.

I found a workaround: remove the `= default` specification from the YoloRTCreator constructor and define it as an empty constructor in yoloContainer.cpp instead. I also added src/yoloContainer.cpp to the compilation of the kernels target. Maybe it should be like this in the library? Or does a better alternative exist?

TheExDeus commented 2 years ago

Thanks for the info! Sadly, I didn't test the static lib. I think your change will do fine. In general I don't think having this global object is a good idea, and there usually isn't a need for one. So the fact that specifically the yolo layer is implemented like this is already questionable, but I don't plan to change that, as then I might as well rewrite most of this from scratch.

mive93 commented 2 years ago

TensorRT8 is now supported on the tensorrt8 branch. Every model and data type works properly.

Thank you @TheExDeus for your work; I have tested and considered it. However, I was already working with @perseusdg to support TRT8, and his implementation was more complete. Still, some of your choices helped us complete the porting properly. So, thank you very much :)