ceccocats / tkDNN

Deep neural network library and toolkit to do high performance inference on NVIDIA Jetson platforms
GNU General Public License v2.0

tkDNN windows 10 support #218

Closed perseusdg closed 3 years ago

perseusdg commented 3 years ago

Added support for Windows 10. Tested the final code on Windows using MSVC 16.7 and on Linux using GCC 9.3.

ceccocats commented 3 years ago

Hi, nice work! I have tested on an NVIDIA Xavier and everything seems to work nicely. Can you also run the tests on Windows? You need to download the COCO test dataset (for INT8 calibration):

bash scripts/download_validation.sh COCO

And run this test script:

bash ./scripts/test_all_tests.sh

with line 45 uncommented to check all the inference precisions:

modes=( 1 2 3 ) # FP32, FP16 and INT8

On FP16 and INT8, a TENSORRT ERROR means that the result is not strictly the same as the ground truth, which is normal with lower precision.

This is the output on Xavier:

```
$ bash ./scripts/test_all_tests.sh
rm: cannot remove 'results.log': No such file or directory
rm: cannot remove '*rt': No such file or directory
Test FP32 Batch 2
mnist OK
batched mnist OK
imuodom OK
yolo4 OK
batched yolo4 OK
yolo4x OK
batched yolo4x OK
yolo4_berkeley OK
batched yolo4_berkeley OK
yolo4tiny OK
batched yolo4tiny OK
yolo3 OK
batched yolo3 OK
yolo3_berkeley OK
batched yolo3_berkeley OK
yolo3_coco4 OK
batched yolo3_coco4 OK
yolo3_flir OK
batched yolo3_flir OK
yolo3_512 OK
batched yolo3_512 OK
yolo3tiny OK
batched yolo3tiny OK
yolo3tiny_512 OK
batched yolo3tiny_512 OK
yolo2 OK
batched yolo2 OK
yolo2_voc OK
batched yolo2_voc OK
csresnext50-panet-spp OK
batched csresnext50-panet-spp OK
resnet101_cnet OK
batched resnet101_cnet OK
dla34_cnet OK
batched dla34_cnet OK
mobilenetv2ssd OK
batched mobilenetv2ssd OK
mobilenetv2ssd512 OK
batched mobilenetv2ssd512 OK
bdd-mobilenetv2ssd OK
batched bdd-mobilenetv2ssd OK
Test FP16 Batch 2
mnist OK
batched mnist OK
imuodom OK
yolo4 TENSORRT ERROR
batched yolo4 OK
yolo4x TENSORRT ERROR
batched yolo4x OK
yolo4_berkeley TENSORRT ERROR
batched yolo4_berkeley OK
yolo4tiny TENSORRT ERROR
batched yolo4tiny OK
yolo3 TENSORRT ERROR
batched yolo3 OK
yolo3_berkeley TENSORRT ERROR
batched yolo3_berkeley OK
yolo3_coco4 OK
batched yolo3_coco4 OK
yolo3_flir TENSORRT ERROR
batched yolo3_flir OK
yolo3_512 TENSORRT ERROR
batched yolo3_512 OK
yolo3tiny TENSORRT ERROR
batched yolo3tiny OK
yolo3tiny_512 TENSORRT ERROR
batched yolo3tiny_512 OK
yolo2 TENSORRT ERROR
batched yolo2 OK
yolo2_voc TENSORRT ERROR
batched yolo2_voc OK
csresnext50-panet-spp TENSORRT ERROR
batched csresnext50-panet-spp OK
resnet101_cnet TENSORRT ERROR
batched resnet101_cnet OK
dla34_cnet TENSORRT ERROR
batched dla34_cnet OK
mobilenetv2ssd CUDNN vs TENSORRT ERROR
batched mobilenetv2ssd OK
mobilenetv2ssd512 TENSORRT ERROR
batched mobilenetv2ssd512 OK
bdd-mobilenetv2ssd CUDNN vs TENSORRT ERROR
batched bdd-mobilenetv2ssd OK
Test INT8 Batch 2
mnist OK
batched mnist OK
imuodom OK
yolo4 TENSORRT ERROR
batched yolo4 OK
yolo4x TENSORRT ERROR
batched yolo4x OK
yolo4_berkeley TENSORRT ERROR
batched yolo4_berkeley OK
yolo4tiny TENSORRT ERROR
batched yolo4tiny OK
yolo3 TENSORRT ERROR
batched yolo3 OK
yolo3_berkeley TENSORRT ERROR
batched yolo3_berkeley OK
yolo3_coco4 TENSORRT ERROR
batched yolo3_coco4 OK
yolo3_flir TENSORRT ERROR
batched yolo3_flir OK
yolo3_512 TENSORRT ERROR
batched yolo3_512 OK
yolo3tiny TENSORRT ERROR
batched yolo3tiny OK
yolo3tiny_512 TENSORRT ERROR
batched yolo3tiny_512 OK
yolo2 TENSORRT ERROR
batched yolo2 OK
yolo2_voc TENSORRT ERROR
batched yolo2_voc OK
csresnext50-panet-spp TENSORRT ERROR
batched csresnext50-panet-spp OK
resnet101_cnet TENSORRT ERROR
batched resnet101_cnet OK
dla34_cnet TENSORRT ERROR
batched dla34_cnet OK
mobilenetv2ssd FATAL ERROR
batched mobilenetv2ssd OK
mobilenetv2ssd512 TENSORRT ERROR
batched mobilenetv2ssd512 OK
bdd-mobilenetv2ssd FATAL ERROR
batched bdd-mobilenetv2ssd OK
If errors occured, check logfile results.log
```

I have to check the FATAL ERROR on mobilenet INT8; I think it is also broken on master.

perseusdg commented 3 years ago

I ran test_mobilenetv2ssd on both the master branch and the pull request. I believe the FATAL ERROR is a result of the differences between ground truth and TRT, and between TRT and cuDNN. I have attached the result below.

====== CUDNN inference ======
Data dim: 1 3 300 300 1
Data dim: 1 3000 1 4 1

===== TENSORRT inference ====
Data dim: 1 3 300 300 1
Data dim: 1 3000 1 4 1

==== RESNET CHECK RESULTS ===
CUDNN vs correct
 | OK ~0.02
 | OK ~0.02
TRT   vs correct

 | [ 0 ]: 1.13652 0.26275
 | [ 1 ]: -0.723999 -0.604409
 | [ 2 ]: -2.61438 -2.63829
 | [ 3 ]: 0.786822 1.02715
 | [ 4 ]: -2.5647 -2.2925
 | [ 5 ]: -0.331258 -0.129529
 | [ 6 ]: -0.678444 -0.582798
 | [ 7 ]: 0.362422 0.243937
 | [ 8 ]: 5.70219 5.92325
 | Wrongs: 125 ~0.02

 | [ 0 ]: 0.285774 0.212167
 | [ 2 ]: -0.58573 -0.744845
 | [ 3 ]: -0.320327 -0.396461
 | [ 4 ]: 0.276414 0.207184
 | [ 6 ]: -0.805042 -0.957715
 | [ 7 ]: -0.55411 -0.630006
 | [ 8 ]: 0.263108 0.195007
 | [ 10 ]: -0.837736 -1.00465
 | [ 11 ]: 1.39087 1.32298
 | Wrongs: 18 ~0.02
CUDNN vs TRT    

 | [ 0 ]: 0.26275 1.13652
 | [ 1 ]: -0.604407 -0.723999
 | [ 2 ]: -2.63829 -2.61438
 | [ 3 ]: 1.02715 0.786822
 | [ 4 ]: -2.2925 -2.5647
 | [ 5 ]: -0.129528 -0.331258
 | [ 6 ]: -0.582797 -0.678444
 | [ 7 ]: 0.243937 0.362422
 | [ 8 ]: 5.92325 5.70219
 | Wrongs: 125 ~0.02

 | [ 0 ]: 0.212167 0.285774
 | [ 2 ]: -0.744845 -0.58573
 | [ 3 ]: -0.396461 -0.320327
 | [ 4 ]: 0.207184 0.276414
 | [ 6 ]: -0.957715 -0.805042
 | [ 7 ]: -0.630006 -0.55411
 | [ 8 ]: 0.195007 0.263108
 | [ 10 ]: -1.00465 -0.837736
 | [ 11 ]: 1.32298 1.39087
 | Wrongs: 18 ~0.02
---------------------------------------------------
Confidence CUDNN
0.977479 0.986367 0.97799 0.971273 0.974232 0.966501 0.971578 0.978977 0.974027 0.966054 0.970724 0.965549 0.971925 0.977542 0.9728 0.96913 0.968801 0.970229 0.973139 0.977897 0.973712 0.972364 0.969004 0.973687 0.97271 0.977654 0.973859 0.972518 0.969997 0.974779 0.971348 0.97871 0.974254 0.969607 0.970592 0.973098 0.968411 0.974884 0.970043 0.965388 0.96499 0.969276 0.972695 0.979673 0.974703 0.970112 0.970413 0.972816 0.97409 0.98164 0.974896 0.972365 0.970575 0.973586 0.972991 0.979948 0.974937 0.970246 0.970642 0.972096 0.967579 0.974887 0.970754 0.964808 
Locations CUDNN
0.895468 1.02586 -3.26465 -1.48142 0.571279 0.503577 -1.00306 -0.205038 1.10684 0.168612 -6.14164 -1.78341 0.977233 1.70472 -3.54303 -2.92774 1.22893 0.0161056 -6.67807 -1.96537 0.632564 1.90412 -3.56912 -3.54311 0.355613 1.31305 -1.31578 -1.24236 0.740068 0.9646 -0.640562 -0.77711 0.988481 0.883645 -2.83893 -0.885611 -0.0565745 1.88677 -3.79773 -3.65738 1.09986 0.707278 -3.49885 -0.932392 -0.522438 1.9762 -3.57368 -4.03056 0.246989 1.50757 -0.316974 -1.59678 0.309205 1.23061 -0.0445376 -1.21936 0.477111 1.56142 -0.727183 -0.659404 0.112813 1.7245 -2.30625 -4.02627 
---------------------------------------------------
Confidence tensorRT
0.975285 0.985188 0.976798 0.967267 0.97316 0.962462 0.970714 0.97868 0.97349 0.964915 0.970325 0.964382 0.970723 0.9759 0.971986 0.966283 0.968603 0.96712 0.973128 0.977473 0.974416 0.970923 0.970486 0.971926 0.97122 0.97548 0.972259 0.969485 0.968933 0.971035 0.971303 0.977997 0.973513 0.968757 0.96987 0.970581 0.971378 0.977987 0.972374 0.968555 0.96753 0.970285 0.976098 0.983278 0.977583 0.973809 0.973688 0.975522 0.977194 0.984369 0.977863 0.975222 0.97441 0.976219 0.976756 0.982267 0.977701 0.973338 0.974006 0.974096 0.97184 0.978243 0.974404 0.968246 
Locations tensorRT
0.777498 1.04938 -3.02812 -1.38129 0.356407 0.499822 -1.15978 -0.138548 0.885423 0.423159 -5.3693 -1.81432 0.86543 1.63186 -3.15014 -2.73491 0.963911 0.195935 -5.65607 -1.83503 0.617409 1.79146 -3.1489 -3.38831 0.181724 1.29949 -1.74323 -1.32329 0.553508 0.916589 -0.672828 -0.59451 0.897496 0.805475 -3.33593 -1.02396 -0.33048 1.88618 -3.68143 -3.40749 1.01278 0.548814 -3.77655 -1.00644 -0.707318 2.00011 -3.4136 -3.77147 0.336892 1.38483 -0.523918 -1.35259 0.304683 1.10269 -0.145261 -0.930751 0.495511 1.32969 -1.04086 -0.517759 0.196518 1.75169 -2.70762 -3.82106 
---------------------------------------------------
CUDNN vs TRT    

 | [ 342 ]: 0.963264 0.983403
 | [ 347 ]: 0.953026 0.974363
 | [ 2940 ]: 0.0357078 0.107297
 | [ 2943 ]: 0.412983 0.694689
 | [ 26940 ]: 0.829936 0.708185
 | [ 26941 ]: 0.869318 0.797475
 | [ 26943 ]: 0.506431 0.244212
 | [ 26994 ]: 0.826695 0.757312
 | [ 26995 ]: 0.811752 0.73976
 | Wrongs: 21 ~0.02

 | [ 0 ]: 0.895468 0.777498
 | [ 1 ]: 1.02586 1.04938
 | [ 2 ]: -3.26465 -3.02812
 | [ 3 ]: -1.48142 -1.38129
 | [ 4 ]: 0.571279 0.356407
 | [ 6 ]: -1.00306 -1.15978
 | [ 7 ]: -0.205038 -0.138548
 | [ 8 ]: 1.10684 0.885423
 | [ 9 ]: 0.168612 0.423159
 | Wrongs: 9800 ~0.02
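
The `Wrongs: N ~0.02` lines in this log are consistent with an element-wise comparison against an absolute tolerance of 0.02. The following is a minimal Python sketch of such a check, not tkDNN's actual code: the helper name `count_wrongs` and the exact comparison rule (including treating NaN as a mismatch) are assumptions. The inputs are the first eight `Locations` values copied from the CUDNN and tensorRT dumps above.

```python
import math


def count_wrongs(data, reference, eps=0.02):
    """Return the indices where data and reference differ by more than eps.

    Hypothetical helper: assumes an absolute tolerance and counts NaN as wrong.
    """
    if len(data) != len(reference):
        raise ValueError("length mismatch")
    wrong = []
    for i, (d, r) in enumerate(zip(data, reference)):
        if math.isnan(d) or math.isnan(r) or abs(d - r) > eps:
            wrong.append(i)
    return wrong


# First eight values of "Locations CUDNN" and "Locations tensorRT" from the log.
cudnn_locations = [0.895468, 1.02586, -3.26465, -1.48142,
                   0.571279, 0.503577, -1.00306, -0.205038]
trt_locations = [0.777498, 1.04938, -3.02812, -1.38129,
                 0.356407, 0.499822, -1.15978, -0.138548]

wrong = count_wrongs(cudnn_locations, trt_locations)
for i in wrong:
    print(f" | [ {i} ]: {cudnn_locations[i]} {trt_locations[i]}")
print(f" | Wrongs: {len(wrong)} ~0.02")
```

Under this rule, index 5 is the only one of the eight within tolerance (0.503577 vs 0.499822), which matches the jump from `[ 4 ]` to `[ 6 ]` in the last CUDNN vs TRT block of the log.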
perseusdg commented 3 years ago

Also, I tested this on Windows before creating the pull request and had the same issue with mobilenet running in INT8 mode. I assumed it was fine, given the difference between the result and the ground truth in INT8 mode. I am unable to test FP16 as my GPU doesn't support it.
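
Why an INT8 result can drift from the FP32 ground truth can be illustrated with a quantization round trip. This is only a sketch under stated assumptions: it uses symmetric per-tensor quantization with the scale taken from the largest absolute value, whereas TensorRT derives its scales through calibration, so this is not its exact procedure. The function names are hypothetical, and the sample values are the first four `Locations CUDNN` entries from the log above.

```python
def quantize_int8(values, scale):
    """Map floats to integer codes in [-127, 127] using one symmetric scale."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(codes, scale):
    """Map integer codes back to floats."""
    return [c * scale for c in codes]


values = [0.895468, 1.02586, -3.26465, -1.48142]

# Assumed scale choice: the largest magnitude maps to code 127.
scale = max(abs(v) for v in values) / 127

codes = quantize_int8(values, scale)
restored = dequantize(codes, scale)
errors = [abs(v - r) for v, r in zip(values, restored)]

for v, c, r, e in zip(values, codes, restored, errors):
    print(f"{v:>10.6f} -> code {c:>4d} -> {r:>10.6f} (abs error {e:.6f})")
print(f"quantization step: {scale:.6f}")
```

Each value is recovered only to within half a quantization step (about 0.013 for these four values), and such errors can accumulate across layers, so deviations beyond a 0.02 tolerance are plausible in INT8 mode.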

ceccocats commented 3 years ago

Hi, I'm compiling against an old TensorRT version and I get an error on this line: https://github.com/ceccocats/tkDNN/blob/a638592fc74668471e87ac930b4695ce99dc7d43/src/NetworkRT.cpp#L143

Is the shared pointer necessary? Without it, it works fine on Linux.

perseusdg commented 3 years ago

No, the shared pointer isn't necessary. I thought I had removed all instances of the shared pointers and unique pointers that I had created; I guess I must have missed this one place. Should I create a new pull request to undo this shared pointer?

ceccocats commented 3 years ago

> No, the shared pointer isn't necessary. I thought I had removed all instances of the shared pointers and unique pointers that I had created; I guess I must have missed this one place. Should I create a new pull request to undo this shared pointer?

Fixed in ba8199a03088b7c8e36066a1636a1237ab316cec