Livox-SDK / livox_detection

Livox open source detection algorithm
Apache License 2.0
270 stars 56 forks

inference on tensorrt #6

Closed: zttbx closed this issue 3 years ago

zttbx commented 3 years ago

I am trying to make a C++ version of the inference, but after I converted the model to ONNX and ran it with TensorRT on a 2080 Ti, the inference time of each layer is as follows:

{Cast} 0.301ms
Conv/Conv2D5 0.138ms
Conv/Conv2D 0.707ms
ReduceMean9 132.021ms
Sub11 0.218ms
(Unnamed Layer* 10) [ElementWise] + Redu 123.283ms
ReduceProd30:0[Constant] 0.001ms
Cast17 0.004ms
Div18 0.004ms
PWN((Unnamed Layer 24) [ElementWise], P 0.235ms
Conv/BatchNorm/Const:0 + (Unnamed Layer 0.221ms
Conv_1/Conv2D 0.936ms
ReduceMean23 121.545ms
Sub25 0.217ms
(Unnamed Layer 37) [ElementWise] + Redu 120.613ms
Cast31 0.004ms
Div32 0.004ms
PWN((Unnamed Layer 51) [ElementWise], P 0.221ms
Conv/BatchNorm/Const:0_1 + (Unnamed Laye 0.224ms
MaxPool2D/MaxPool 0.140ms
Conv_2/Conv2D 0.045ms
ReduceMean39 29.410ms
Sub41 0.028ms
(Unnamed Layer 65) [ElementWise] + Redu 29.799ms
ReduceProd74:0[Constant] 0.001ms
Cast47 0.004ms
Div__48 0.004ms
PWN((Unnamed Layer 79) [ElementWise], P 0.031ms
Conv_2/BatchNorm/Const:0 + (Unnamed Laye 0.032ms
Conv_3/Conv2D 0.163ms
ReduceMean53 29.614ms
Sub55 0.055ms
(Unnamed Layer 92) [ElementWise] + Redu 30.101ms
Cast61 0.004ms
Div62 0.004ms
PWN((Unnamed Layer 106) [ElementWise], 0.057ms
Conv/BatchNorm/Const:0_4 + (Unnamed Laye 0.059ms
add 0.083ms
Conv_4/Conv2D 0.467ms
ReduceMean67 29.748ms
Sub69 0.112ms
(Unnamed Layer 120) [ElementWise] + Red 29.792ms
Cast75 0.003ms
Div76 0.004ms
PWN((Unnamed Layer 134) [ElementWise], 0.359ms
Conv_4/BatchNorm/Const:0 + (Unnamed Laye 0.114ms
MaxPool2D_1/MaxPool 0.072ms
Conv_5/Conv2D 0.031ms
ReduceMean83 7.426ms
Sub85 0.013ms
(Unnamed Layer 148) [ElementWise] + Red 7.421ms
ReduceProd146:0[Constant] 0.002ms
Cast91 0.004ms
Div__92 0.004ms
PWN((Unnamed Layer 162) [ElementWise], 0.018ms
Conv/BatchNorm/Const:0_7 + (Unnamed Laye 0.013ms
Conv_6/Conv2D 0.147ms
ReduceMean97 7.433ms
Sub99 0.028ms
(Unnamed Layer 175) [ElementWise] + Red 7.622ms
Cast105 0.004ms
Div106 0.004ms
PWN((Unnamed Layer 189) [ElementWise], 0.030ms
Conv_4/BatchNorm/Const:0_9 + (Unnamed La 0.032ms
add_1 0.043ms
Conv_7/Conv2D 0.031ms
ReduceMean111 7.541ms
Sub113 0.014ms
(Unnamed Layer 203) [ElementWise] + Red 7.431ms
Cast119 0.004ms
Div120 0.004ms
PWN((Unnamed Layer 217) [ElementWise], 0.018ms
Conv/BatchNorm/Const:0_11 + (Unnamed Lay 0.012ms
Conv_8/Conv2D 0.150ms
ReduceMean125 7.323ms
Sub127 0.027ms
(Unnamed Layer 230) [ElementWise] + Red 7.427ms
Cast133 0.003ms
Div134 0.004ms
PWN((Unnamed Layer 244) [ElementWise], 0.030ms
Conv_4/BatchNorm/Const:0_13 + (Unnamed L 0.031ms
add_2 0.044ms
Conv_9/Conv2D 0.466ms
ReduceMean139 7.607ms
Sub141 0.055ms
(Unnamed Layer 258) [ElementWise] + Red 7.927ms
Cast147 0.003ms
Div148 0.004ms
PWN((Unnamed Layer 272) [ElementWise], 0.058ms
Conv_9/BatchNorm/Const:0 + (Unnamed Laye 0.059ms
MaxPool2D_2/MaxPool 0.040ms
Conv_10/Conv2D 0.031ms
ReduceMean155 1.785ms
Sub157 0.009ms
(Unnamed Layer 286) [ElementWise] + Red 1.743ms
ReduceProd559:0[Constant] 0.001ms
Cast163 0.004ms
Div__164 0.004ms
PWN((Unnamed Layer 300) [ElementWise], 0.010ms
Conv_4/BatchNorm/Const:0_16 + (Unnamed L 0.007ms
Conv_11/Conv2D 0.124ms
ReduceMean169 1.807ms
Sub171 0.013ms
(Unnamed Layer 313) [ElementWise] + Red 1.863ms
Cast177 0.003ms
Div178 0.004ms
PWN((Unnamed Layer 327) [ElementWise], 0.019ms
Conv_9/BatchNorm/Const:0_18 + (Unnamed L 0.014ms
add_3 0.023ms
Conv_12/Conv2D 0.032ms
ReduceMean183 1.743ms
Sub185 0.007ms
(Unnamed Layer 341) [ElementWise] + Red 1.743ms
Cast191 0.003ms
Div192 0.004ms
PWN((Unnamed Layer 355) [ElementWise], 0.010ms
Conv_4/BatchNorm/Const:0_20 + (Unnamed L 0.006ms
Conv_13/Conv2D 0.124ms
ReduceMean197 1.848ms
Sub199 0.013ms
(Unnamed Layer 368) [ElementWise] + Red 1.891ms
Cast205 0.003ms
Div206 0.004ms
PWN((Unnamed Layer 382) [ElementWise], 0.020ms
Conv_9/BatchNorm/Const:0_22 + (Unnamed L 0.013ms
add_4 0.023ms
Conv_14/Conv2D 0.032ms
ReduceMean211 1.830ms
Sub213 0.008ms
(Unnamed Layer 396) [ElementWise] + Red 1.743ms
Cast219 0.004ms
Div220 0.003ms
PWN((Unnamed Layer 410) [ElementWise], 0.011ms
Conv_4/BatchNorm/Const:0_24 + (Unnamed L 0.007ms
Conv_15/Conv2D 0.124ms
ReduceMean225 1.930ms
Sub227 0.013ms
(Unnamed Layer 423) [ElementWise] + Red 1.937ms
Cast233 0.004ms
Div234 0.003ms
PWN((Unnamed Layer 437) [ElementWise], 0.020ms
Conv_9/BatchNorm/Const:0_26 + (Unnamed L 0.013ms
add_5 0.023ms
Conv_16/Conv2D 0.032ms
ReduceMean239 1.745ms
Sub241 0.008ms
(Unnamed Layer 451) [ElementWise] + Red 1.743ms
Cast247 0.004ms
Div248 0.004ms
PWN((Unnamed Layer 465) [ElementWise], 0.011ms
Conv_4/BatchNorm/Const:0_28 + (Unnamed L 0.007ms
Conv_17/Conv2D 0.124ms
ReduceMean253 1.807ms
Sub255 0.013ms
(Unnamed Layer 478) [ElementWise] + Red 1.811ms
Cast261 0.003ms
Div262 0.004ms
PWN((Unnamed Layer 492) [ElementWise], 0.019ms
Conv_9/BatchNorm/Const:0_30 + (Unnamed L 0.013ms
add_6 0.024ms
Conv_18/Conv2D 0.442ms
ReduceMean267 1.945ms
Sub269 0.027ms
(Unnamed Layer 506) [ElementWise] + Red 1.910ms
Cast275 0.003ms
Div276 0.004ms
PWN((Unnamed Layer 520) [ElementWise], 0.034ms
Conv_18/BatchNorm/Const:0 + (Unnamed Lay 0.031ms
MaxPool2D_3/MaxPool 0.019ms
Conv_19/Conv2D 0.037ms
ReduceMean283 0.472ms
Sub285 0.006ms
(Unnamed Layer 534) [ElementWise] + Red 0.515ms
ReduceProd500:0[Constant] 0.001ms
Cast291 0.004ms
Div__292 0.004ms
PWN((Unnamed Layer 548) [ElementWise], 0.007ms
Conv_9/BatchNorm/Const:0_33 + (Unnamed L 0.006ms
Conv_20/Conv2D 0.116ms
ReduceMean297 0.509ms
Sub299 0.008ms
(Unnamed Layer 561) [ElementWise] + Red 0.472ms
Cast305 0.003ms
Div306 0.003ms
PWN((Unnamed Layer 575) [ElementWise], 0.011ms
Conv_18/BatchNorm/Const:0_35 + (Unnamed 0.007ms
add_7 0.010ms
Conv_21/Conv2D 0.037ms
ReduceMean311 0.472ms
Sub313 0.005ms
(Unnamed Layer 589) [ElementWise] + Red 0.526ms
Cast319 0.004ms
Div320 0.004ms
PWN((Unnamed Layer 603) [ElementWise], 0.007ms
Conv_9/BatchNorm/Const:0_37 + (Unnamed L 0.005ms
Conv_22/Conv2D 0.116ms
ReduceMean325 0.509ms
Sub327 0.007ms
(Unnamed Layer 616) [ElementWise] + Red 0.471ms
Cast333 0.003ms
Div334 0.004ms
PWN((Unnamed Layer 630) [ElementWise], 0.011ms
Conv_18/BatchNorm/Const:0_39 + (Unnamed 0.006ms
add_8 0.010ms
Conv_23/Conv2D 0.037ms
ReduceMean339 0.471ms
Sub341 0.006ms
(Unnamed Layer 644) [ElementWise] + Red 0.473ms
Cast347 0.003ms
Div348 0.004ms
PWN((Unnamed Layer 658) [ElementWise], 0.007ms
Conv_9/BatchNorm/Const:0_41 + (Unnamed L 0.005ms
Conv_24/Conv2D 0.116ms
ReduceMean353 0.510ms
Sub355 0.008ms
(Unnamed Layer 671) [ElementWise] + Red 0.472ms
Cast361 0.003ms
Div362 0.004ms
PWN((Unnamed Layer 685) [ElementWise], 0.010ms
Conv_18/BatchNorm/Const:0_43 + (Unnamed 0.006ms
add_9 0.010ms
Conv_25/Conv2D 0.037ms
ReduceMean367 0.472ms
Sub369 0.005ms
(Unnamed Layer 699) [ElementWise] + Red 0.538ms
Cast375 0.003ms
Div376 0.003ms
PWN((Unnamed Layer 713) [ElementWise], 0.008ms
Conv_9/BatchNorm/Const:0_45 + (Unnamed L 0.005ms
Conv_26/Conv2D 0.117ms
ReduceMean381 0.509ms
Sub383 0.007ms
(Unnamed Layer 726) [ElementWise] + Red 0.472ms
Cast389 0.003ms
Div390 0.004ms
PWN((Unnamed Layer 740) [ElementWise], 0.010ms
Conv_18/BatchNorm/Const:0_47 + (Unnamed 0.006ms
add_10 0.010ms
Conv_27/Conv2D 0.037ms
ReduceMean395 0.472ms
Sub397 0.006ms
(Unnamed Layer 754) [ElementWise] + Red 0.472ms
Cast403 0.003ms
Div404 0.004ms
PWN((Unnamed Layer 768) [ElementWise], 0.007ms
Conv_9/BatchNorm/Const:0_49 + (Unnamed L 0.005ms
Conv_28/Conv2D 0.116ms
ReduceMean409 0.510ms
Sub411 0.007ms
(Unnamed Layer 781) [ElementWise] + Red 0.472ms
Cast417 0.003ms
Div418 0.004ms
PWN((Unnamed Layer 795) [ElementWise], 0.010ms
Conv_18/BatchNorm/Const:0_51 + (Unnamed 0.006ms
add_11 0.010ms
Conv_29/Conv2D 0.038ms
ReduceMean423 0.604ms
Sub425 0.006ms
(Unnamed Layer 809) [ElementWise] + Red 0.472ms
Cast431 0.003ms
Div432 0.003ms
PWN((Unnamed Layer 823) [ElementWise], 0.007ms
Conv_9/BatchNorm/Const:0_53 + (Unnamed L 0.005ms
Conv_30/Conv2D 0.116ms
ReduceMean437 0.509ms
Sub439 0.008ms
(Unnamed Layer 836) [ElementWise] + Red 0.472ms
Cast445 0.003ms
Div446 0.004ms
PWN((Unnamed Layer 850) [ElementWise], 0.011ms
Conv_18/BatchNorm/Const:0_55 + (Unnamed 0.006ms
add_12 0.010ms
Conv_31/Conv2D 0.052ms
ReduceMean451 0.472ms
Sub453 0.007ms
(Unnamed Layer 864) [ElementWise] + Red 0.472ms
Cast459 0.003ms
Div460 0.004ms
PWN((Unnamed Layer 878) [ElementWise], 0.010ms
Conv_18/BatchNorm/Const:0_57 + (Unnamed 0.006ms
Conv_32/Conv2D 0.433ms
ReduceMean465 0.525ms
Sub467 0.013ms
(Unnamed Layer 891) [ElementWise] + Red 0.530ms
Cast473 0.004ms
Div474 0.004ms
PWN((Unnamed Layer 905) [ElementWise], 0.021ms
Conv_32/BatchNorm/Const:0 + (Unnamed Lay 0.152ms
Conv_33/Conv2D 0.095ms
ReduceMean479 0.472ms
Sub481 0.007ms
(Unnamed Layer 918) [ElementWise] + Red 0.473ms
Cast487 0.003ms
Div488 0.004ms
PWN((Unnamed Layer 932) [ElementWise], 0.010ms
Conv_18/BatchNorm/Const:0_60 + (Unnamed 0.006ms
Conv_34/Conv2D 0.037ms
ReduceMean493 0.472ms
Sub495 0.005ms
(Unnamed Layer 945) [ElementWise] + Red 0.473ms
Cast501 0.003ms
Div502 0.004ms
PWN((Unnamed Layer 959) [ElementWise], 0.007ms
Conv_9/BatchNorm/Const:0_62 + (Unnamed L 0.005ms
Resize505 0.024ms
Resize505:0 copy 0.010ms
Conv_35/Conv2D 0.083ms
ReduceMean510 1.743ms
Sub512 0.013ms
(Unnamed Layer 974) [ElementWise] + Red 1.979ms
Cast518 0.003ms
Div519 0.004ms
PWN((Unnamed Layer 988) [ElementWise], 0.019ms
Conv_9/BatchNorm/Const:0_64 + (Unnamed L 0.014ms
Conv_36/Conv2D 0.442ms
ReduceMean524 1.898ms
Sub526 0.027ms
(Unnamed Layer 1001) [ElementWise] + Re 1.857ms
Cast532 0.003ms
Div533 0.004ms
PWN((Unnamed Layer 1015) [ElementWise], 0.034ms
Conv_18/BatchNorm/Const:0_66 + (Unnamed 0.031ms
Conv_37/Conv2D 0.082ms
ReduceMean538 1.913ms
Sub540 0.013ms
(Unnamed Layer 1028) [ElementWise] + Re 1.810ms
Cast546 0.003ms
Div547 0.004ms
PWN((Unnamed Layer 1042) [ElementWise], 0.020ms
Conv_9/BatchNorm/Const:0_68 + (Unnamed L 0.013ms
Conv_38/Conv2D 0.224ms
ReduceMean552 1.805ms
Sub554 0.014ms
(Unnamed Layer 1055) [ElementWise] + Re 1.807ms
Cast560 0.004ms
Div561 0.006ms
PWN((Unnamed Layer 1069) [ElementWise], 0.019ms
Conv_9/BatchNorm/Const:0_70 + (Unnamed L 0.013ms
Conv_39/BiasAdd 0.052ms
Conv_39/BiasAdd__563 0.007ms
Time over all layers: 826.180

The most time-consuming layers are always the ReduceMean layers. How can I reach 20 fps at inference? Thanks.
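
To check the claim about ReduceMean against a dump like the one above, the entries can be parsed and summed per layer type. This is only a minimal sketch: it assumes every entry ends in a `<number>ms` token, and the helper names `parse_profile` and `time_share` are made up for illustration. The sample string is the first five entries of the dump.

```python
import re

# One profiler entry looks like "<layer name> <time>ms"; names may contain spaces.
ENTRY = re.compile(r"(.+?)\s+(\d+\.\d+)ms")

def parse_profile(text):
    """Return (layer_name, milliseconds) pairs from a pasted profiler dump."""
    return [(name.strip(), float(ms)) for name, ms in ENTRY.findall(text)]

def time_share(entries, keyword):
    """Return (time in layers whose name contains keyword, total time)."""
    total = sum(ms for _, ms in entries)
    matched = sum(ms for name, ms in entries if keyword in name)
    return matched, total

# First five entries of the dump in the comment above.
sample = ("{Cast} 0.301ms Conv/Conv2D5 0.138ms Conv/Conv2D 0.707ms "
          "ReduceMean9 132.021ms Sub11 0.218ms")
entries = parse_profile(sample)
matched, total = time_share(entries, "ReduceMean")
print(f"ReduceMean: {matched:.3f} ms of {total:.3f} ms")
```

Pasting the full dump into `sample` gives the share over all layers. The truncated "[ElementWise] + Red..." entries are not counted by the "ReduceMean" keyword, so they would need their own keyword.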

zttbx commented 3 years ago

I fixed it by setting is_training to False. The inference time is now around 10 ms.
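
A likely explanation, assuming the network uses standard batch normalization (the thread does not say so explicitly): in training mode each batch-norm layer computes the mean and variance of its input at run time, which would match the ReduceMean, Sub and second reduction entries in the profile. In inference mode the layer uses stored moving statistics, so it reduces to a fixed per-channel scale and shift. The plain-Python sketch below shows the two modes for a single channel. All function names are illustrative and none of this is taken from the livox_detection code.

```python
import math

def batch_norm_training(x, gamma, beta, eps=1e-3):
    """Training mode: mean and variance are reduced from the input on every call."""
    mean = sum(x) / len(x)                          # first reduction
    var = sum((v - mean) ** 2 for v in x) / len(x)  # subtract, square, reduce again
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in x]

def fold_batch_norm(gamma, beta, moving_mean, moving_var, eps=1e-3):
    """Inference mode: stored statistics collapse into one scale and one shift."""
    scale = gamma / math.sqrt(moving_var + eps)
    shift = beta - moving_mean * scale
    return scale, shift

def batch_norm_inference(x, scale, shift):
    """Apply the precomputed affine transform; no reductions at run time."""
    return [scale * v + shift for v in x]

x = [1.0, 2.0, 3.0, 4.0]
gamma, beta = 1.5, 0.1
# For the comparison only: pretend the stored statistics equal this input's statistics.
mean = sum(x) / len(x)
var = sum((v - mean) ** 2 for v in x) / len(x)
scale, shift = fold_batch_norm(gamma, beta, mean, var)
train_out = batch_norm_training(x, gamma, beta)
infer_out = batch_norm_inference(x, scale, shift)
print(max(abs(a - b) for a, b in zip(train_out, infer_out)))
```

Because the inference form is a constant scale and shift, an optimizer can usually merge it into the preceding convolution, which is consistent with the drop to around 10 ms reported here.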

rosexplorer commented 3 years ago

Hello, I am new to TensorFlow and TensorRT. I need to accelerate this model as you did. I have already researched a lot, but the longer I search, the more confused I get. Could you please share the steps you used to convert the model to TensorRT? Thanks in advance.

zttbx commented 3 years ago

@rosexplorer First export the .pb file from the TensorFlow code, then convert the .pb to an ONNX model, and finally import the ONNX model using TensorRT's ONNX parser.
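
The last two steps can be sketched as command lines. The example assumes the exported .pb is a frozen GraphDef and uses trtexec, the command-line tool shipped with TensorRT, in place of the C++ ONNX parser API mentioned above. The file names, the tensor names `input:0` and `output:0`, and the opset value are placeholders, not values from this repository. The script only builds and prints the commands; it does not run them.

```python
import shlex

def tf2onnx_command(frozen_pb, inputs, outputs, onnx_path, opset=11):
    """Build the tf2onnx call for a frozen GraphDef (.pb) file."""
    return [
        "python3", "-m", "tf2onnx.convert",
        "--graphdef", frozen_pb,
        "--inputs", ",".join(inputs),    # graph input tensor names
        "--outputs", ",".join(outputs),  # graph output tensor names
        "--opset", str(opset),
        "--output", onnx_path,
    ]

def trtexec_command(onnx_path, engine_path):
    """Build the trtexec call that parses the ONNX file and saves an engine."""
    return ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]

# Placeholder names: replace them with the real file and tensor names.
convert = tf2onnx_command("frozen.pb", ["input:0"], ["output:0"], "model.onnx")
build = trtexec_command("model.onnx", "model.engine")
print(shlex.join(convert))
print(shlex.join(build))
```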

rosexplorer commented 3 years ago

Aren't there too few files in the pre-trained model to export it to a .pb file? I watched several tutorials, and they had a checkpoint file and a .pbtxt file, which they then exported to the .pb file.

zttbx commented 3 years ago

@rosexplorer Here is the ONNX file. Link: https://pan.baidu.com/s/19yt_mmEhPjIntvfDSVu1NQ Password: pmet

rosexplorer commented 3 years ago

I would rather not click on a link whose origin I don't know. Could you please describe how you got the .pb file?

zttbx commented 3 years ago

[image] @rosexplorer You only need to add some code like this, in the function detect.

rosexplorer commented 3 years ago

@zttbx Thanks for the help. I tried it and a .pb file was created. I then wanted to convert it into an ONNX model, but when I execute this command:

python3 -m tf2onnx.convert --saved-model livox_detection-master/model/saved_model.pb --output model.onnx

I only get this error:

Illegal instruction (core dumped)

Do you know what could be the problem?

rosexplorer commented 3 years ago

I "solved" this problem simply by reinstalling TensorFlow.

Now, when I try to convert the .pb file to ONNX, I get this error:

RuntimeError: MetaGraphDef associated with tags 'serve' could not be found in SavedModel. To inspect available tag-sets in the SavedModel, please use the SavedModel CLI: saved_model_cli
available_tags: [set()]

I don't know what is wrong with this file.
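
The thread does not resolve this error, so the following is only a starting point for checking two things. First, tf2onnx documents --saved-model as taking the SavedModel directory, not the saved_model.pb file inside it. Second, the error message itself suggests listing the available tag-sets with saved_model_cli. The sketch below builds both commands without running them. The helper names are made up, and whether this particular file is a SavedModel at all is an open assumption.

```python
import os
import shlex

def saved_model_dir(path):
    """Return the SavedModel directory for a path that may point at saved_model.pb."""
    if os.path.basename(path) == "saved_model.pb":
        return os.path.dirname(path)
    return path

def inspect_command(path):
    """List the tag-sets stored in the SavedModel."""
    return ["saved_model_cli", "show", "--dir", saved_model_dir(path)]

def convert_command(path, onnx_path):
    """tf2onnx conversion that passes the directory rather than the .pb file."""
    return ["python3", "-m", "tf2onnx.convert",
            "--saved-model", saved_model_dir(path),
            "--output", onnx_path]

pb_path = "livox_detection-master/model/saved_model.pb"  # path from the command above
print(shlex.join(inspect_command(pb_path)))
print(shlex.join(convert_command(pb_path, "model.onnx")))
```

If the file turns out to be a frozen graph rather than a SavedModel, neither command applies. tf2onnx instead documents the --graphdef option together with --inputs and --outputs for that case.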

lintheyoung commented 2 years ago

Hi zttbx, thank you for your awesome work. The Baidu Yun link to the ONNX model no longer works. Could you share it again? Thank you so much!