ceccocats / tkDNN

Deep neural network library and toolkit to do high performance inference on NVIDIA Jetson platforms
GNU General Public License v2.0
718 stars, 209 forks

The current version has a huge performance gap compared with the previous version #226

Closed: dongxuanlb closed this issue 3 years ago

dongxuanlb commented 3 years ago

Xavier, yolov4, fp16
At version adac8576b0faf515ad3f459b1f50fd16cef6d64d: 33 fps
At version a638592fc74668471e87ac930b4695ce99dc7d43: only 10 fps

ceccocats commented 3 years ago

Thank you, it was a compilation issue introduced by @perseusdg during the development of #218, in commit 56feb54377c0678e42077fabbf84ce2fc138f4c5.

I tested on my PC and on Xavier and it looks OK now; tell me if the issue persists.

dongxuanlb commented 3 years ago

Hi, I'm still getting only 15 fps.

dongxuan@xavier:~/workspace/tkDNN/build$ ./demo yolo4_fp16.rt ../demo/yolo_test.mp4 y
detection
yolo4_fp16.rt
New NetworkRT (TensorRT v7.13)
Float16 support: 1
Int8 support: 1
DLAs: 2
create execution context
Input/outputs numbers: 4
input index = 0 -> output index = 3
Data dim: 1 3 416 416 1
Data dim: 1 255 13 13 1
RtBuffer 0 dim: Data dim: 1 3 416 416 1
RtBuffer 1 dim: Data dim: 1 255 52 52 1
RtBuffer 2 dim: Data dim: 1 255 26 26 1
RtBuffer 3 dim: Data dim: 1 255 13 13 1
camera started
^Crequest gateway stop
detection end

Time stats:
Min: 59.4205 ms
Max: 216.611 ms
Avg: 64.0227 ms 15.6195 FPS
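For anyone comparing the numbers in this thread, here is a minimal sketch of how the per-frame latency and FPS figures relate. It assumes the reported FPS is simply 1000 divided by the average latency in milliseconds, which is consistent with the stats above but is not taken from the tkDNN source. The function names are hypothetical and only for illustration.

```python
# Sketch only: converts between average per-frame latency (ms) and FPS,
# assuming FPS = 1000 / average latency in ms. Names are hypothetical.

def fps_from_latency_ms(avg_ms: float) -> float:
    """Frames per second implied by an average per-frame latency in ms."""
    if avg_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / avg_ms


def latency_ms_from_fps(fps: float) -> float:
    """Average per-frame latency in ms implied by a frames-per-second figure."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    # Average latency reported in the log above.
    print(f"{fps_from_latency_ms(64.0227):.4f} FPS")

    # Per-frame budgets implied by the two FPS figures from the first comment.
    for fps in (33.0, 10.0):
        print(f"{fps:.0f} fps -> {latency_ms_from_fps(fps):.2f} ms per frame")
```

Under that assumption, the 33 fps reported for the older commit corresponds to roughly 30 ms per frame, so the 64 ms average here is still about twice the older per-frame time.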