chenzhi1992 / TensorRT-SSD

Use TensorRT API to implement Caffe-SSD, SSD(channel pruning), Mobilenet-SSD

the mobilenet-ssd fps(how many)? #30

Closed OMG59E closed 10 months ago

OMG59E commented 6 years ago

I get 200-220 FPS on a GTX 1080. Is that normal?

conv0 + conv0/relu 0.031ms
conv1/dw + conv1/dw/relu 0.133ms
conv1 + conv1/relu 0.055ms
conv2/dw + conv2/dw/relu 0.090ms
conv2 + conv2/relu 0.040ms
conv3/dw + conv3/dw/relu 0.100ms
conv3 + conv3/relu 0.051ms
conv4/dw + conv4/dw/relu 0.044ms
conv4 + conv4/relu 0.033ms
conv5/dw + conv5/dw/relu 0.041ms
conv5 + conv5/relu 0.049ms
conv6/dw + conv6/dw/relu 0.023ms
conv6 + conv6/relu 0.039ms
conv7/dw + conv7/dw/relu 0.028ms
conv7 + conv7/relu 0.058ms
conv8/dw + conv8/dw/relu 0.028ms
conv8 + conv8/relu 0.064ms
conv9/dw + conv9/dw/relu 0.027ms
conv9 + conv9/relu 0.063ms
conv10/dw + conv10/dw/relu 0.027ms
conv10 + conv10/relu 0.058ms
conv11/dw + conv11/dw/relu 0.028ms
conv11 + conv11/relu 0.062ms
conv12/dw + conv12/dw/relu 0.019ms
conv12 + conv12/relu 0.046ms
conv13/dw + conv13/dw/relu 0.023ms
conv13 + conv13/relu 0.078ms
conv14_1 + conv14_1/relu 0.033ms
conv14_2 + conv14_2/relu 0.195ms
conv15_1 + conv15_1/relu 0.016ms
conv15_2 + conv15_2/relu 0.093ms
conv16_1 + conv16_1/relu 0.012ms
conv16_2 + conv16_2/relu 0.106ms
conv17_1 + conv17_1/relu 0.011ms
conv17_2 + conv17_2/relu 0.067ms
conv5_mbox_loc 0.021ms
conv5_mbox_loc_perm 0.007ms
conv5_mbox_loc_flat 0.004ms
conv5_mbox_conf_new 0.020ms
conv5_mbox_conf_perm 0.008ms
conv5_mbox_conf_flat 0.004ms
conv5_mbox_priorbox 0.009ms
conv11_mbox_loc_new 0.018ms
conv11_mbox_loc_perm 0.005ms
conv11_mbox_loc_flat 0.003ms
conv11_mbox_conf_new 0.013ms
conv11_mbox_conf_perm 0.005ms
conv11_mbox_conf_flat 0.004ms
conv11_mbox_priorbox 0.007ms
conv13_mbox_loc 0.024ms
conv13_mbox_loc_perm 0.005ms
conv13_mbox_loc_flat 0.003ms
conv13_mbox_conf_new 0.021ms
conv13_mbox_conf_perm 0.004ms
conv13_mbox_conf_flat 0.007ms
conv13_mbox_priorbox 0.006ms
conv14_2_mbox_loc 0.015ms
conv14_2_mbox_loc_perm 0.005ms
conv14_2_mbox_loc_flat 0.003ms
conv14_2_mbox_conf_new 0.014ms
conv14_2_mbox_conf_perm 0.004ms
conv14_2_mbox_conf_flat 0.003ms
conv14_2_mbox_priorbox 0.005ms
conv15_2_mbox_loc 0.010ms
conv15_2_mbox_loc_perm 0.004ms
conv15_2_mbox_loc_flat 0.003ms
conv15_2_mbox_conf_new 0.010ms
conv15_2_mbox_conf_perm 0.004ms
conv15_2_mbox_conf_flat 0.003ms
conv15_2_mbox_priorbox 0.005ms
conv16_2_mbox_loc 0.010ms
conv16_2_mbox_loc_perm 0.005ms
conv16_2_mbox_loc_flat 0.003ms
conv16_2_mbox_conf_new 0.010ms
conv16_2_mbox_conf_perm 0.004ms
conv16_2_mbox_conf_flat 0.003ms
conv16_2_mbox_priorbox 0.005ms
conv17_2_mbox_loc 0.010ms
conv17_2_mbox_loc_perm 0.004ms
conv17_2_mbox_loc_flat 0.003ms
conv17_2_mbox_conf_new 0.010ms
conv17_2_mbox_conf_perm 0.004ms
conv17_2_mbox_conf_flat 0.003ms
conv17_2_mbox_priorbox 0.005ms
mbox_loc 0.016ms
mbox_conf 0.016ms
mbox_priorbox 0.027ms
mbox_conf_reshape 0.004ms
mbox_conf_softmax 0.161ms
mbox_conf_flatten 0.008ms
detection_out 0.450ms
Time over all layers: 2.921
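A dump like the one above is easier to read once it is parsed and ranked. The following is a minimal sketch, not part of the repo: `parse_profile` and `slowest` are hypothetical helper names, and the regular expression assumes every entry has the form `<layer name> <time>ms`, with fused layers written as `a + a/relu`. The sample string reuses five entries from the profile above.

```python
import re

# One entry is "<layer name> <time>ms"; a fused layer looks like "a + a/relu".
ENTRY = re.compile(r"(\S+(?:\s\+\s\S+)?)\s+(\d+(?:\.\d+)?)ms")

def parse_profile(text):
    """Return a list of (layer_name, milliseconds) pairs from a profiler dump."""
    return [(name, float(ms)) for name, ms in ENTRY.findall(text)]

def slowest(entries, n=3):
    """Return the n entries with the largest per-layer time."""
    return sorted(entries, key=lambda e: e[1], reverse=True)[:n]

# Five entries copied from the profile above (whitespace-separated is enough).
sample = (
    "conv14_2 + conv14_2/relu 0.195ms conv15_1 + conv15_1/relu 0.016ms "
    "mbox_conf_softmax 0.161ms mbox_conf_flatten 0.008ms detection_out 0.450ms"
)

entries = parse_profile(sample)
total_ms = sum(ms for _, ms in entries)

for name, ms in slowest(entries):
    share = 100 * ms / total_ms
    print(f"{name:28s} {ms:.3f} ms ({share:.1f}% of the sampled layers)")
```

In the full profile above, `detection_out` (0.450 ms), `conv14_2` (0.195 ms) and `mbox_conf_softmax` (0.161 ms) are the three largest entries, so those are the first places to look when comparing against another GPU.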

chenzhi1992 commented 6 years ago

I haven't tested it on a 1080, but I think this FPS is normal.

myih commented 6 years ago

@OMG59E @chenzhi1992 I was able to get it running with TensorRT 4.0, but the results are incorrect. It takes around 5.6 ms on a 1080 Ti without implementing a depthwise conv plugin (using group conv instead). If I use a self-defined Concat layer it is a bit slower, around 6.5 ms.

I can run VGG-SSD correctly (~12 ms), so I'm not sure what the problem with Mobilenet-SSD is. I used chuanqi305's weights and @chenzhi1992's MobileNet-SSD_iplugin.prototxt, and modified pluginimplement accordingly.

Any advice on how to debug this? Thank you!

xmglin commented 6 years ago

@OMG59E I am also trying to implement tensorrt_mobilenet_ssd, using chuanqi305's weights and MobileNet-SSD_iplugin.prototxt. Are you using Ubuntu / CUDA 8 / TensorRT 3.0.4? My problem is the error "could not parse layer type IPlugin". Could you give me some advice on how to update pluginimplement?

myih commented 6 years ago

@xmglin I think I hit this error before. You need to match the names of the IPlugin layers in the .prototxt with the layer names handled in pluginimplement.cpp/.h. This repo only provides the plugin implementation for VGG-SSD, so you have to change most of the plugin layer names.
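One way to find the mismatched names is to list every `IPlugin` layer in the .prototxt and compare that list against the names the plugin code handles. The sketch below is only an illustration under stated assumptions: `iplugin_layer_names` and `unhandled_plugins` are hypothetical helpers, the prototxt fragment is made up for the example, the `handled` set must be filled in by hand from your own plugin source, and the parser assumes each layer's top-level `name:` line comes before its `type:` line.

```python
import re

NAME_RE = re.compile(r'\s*name:\s*"([^"]+)"')
IPLUGIN_RE = re.compile(r'\s*type:\s*"IPlugin"')

def iplugin_layer_names(prototxt_text):
    """Collect the names of layers whose type is "IPlugin".

    Assumption: within each layer block, `name:` appears before `type:`.
    """
    names, current = [], None
    for line in prototxt_text.splitlines():
        m = NAME_RE.match(line)
        if m:
            current = m.group(1)  # remember the most recent layer name
        elif IPLUGIN_RE.match(line) and current is not None:
            names.append(current)
    return names

def unhandled_plugins(prototxt_text, handled_names):
    """Return IPlugin layer names that the plugin code does not know about."""
    return [n for n in iplugin_layer_names(prototxt_text) if n not in handled_names]

# Hypothetical prototxt fragment, for illustration only.
sample = '''
layer {
  name: "conv11_mbox_loc_perm"
  type: "IPlugin"
  bottom: "conv11_mbox_loc"
  top: "conv11_mbox_loc_perm"
}
layer {
  name: "conv11_mbox_priorbox"
  type: "IPlugin"
  bottom: "conv11"
  top: "conv11_mbox_priorbox"
}
layer {
  name: "conv11"
  type: "Convolution"
}
'''

# Hypothetical set of names the plugin code currently handles.
handled = {"conv4_3_norm_mbox_loc_perm", "conv11_mbox_priorbox"}

missing = unhandled_plugins(sample, handled)
print("IPlugin layers with no matching plugin:", missing)
```

Every name this reports would need a matching case in the plugin code, or the corresponding layer in the .prototxt renamed.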

myih commented 6 years ago

@OMG59E I got the wrong inference time as I included the overheads... Using the timeInference included in tensorNet:

Time over total layers: 2.561
conv0 + conv0/relu 0.090ms
conv1/dw + conv1/dw/relu 0.121ms
conv1 + conv1/relu 0.042ms
conv2/dw + conv2/dw/relu 0.092ms
conv2 + conv2/relu 0.031ms
conv3/dw + conv3/dw/relu 0.075ms
conv3 + conv3/relu 0.043ms
conv4/dw + conv4/dw/relu 0.045ms
conv4 + conv4/relu 0.031ms
conv5/dw + conv5/dw/relu 0.034ms
conv5 + conv5/relu 0.045ms
conv6/dw + conv6/dw/relu 0.025ms
conv6 + conv6/relu 0.041ms
conv7/dw + conv7/dw/relu 0.026ms
conv7 + conv7/relu 0.063ms
conv8/dw + conv8/dw/relu 0.025ms
conv8 + conv8/relu 0.057ms
conv9/dw + conv9/dw/relu 0.025ms
conv9 + conv9/relu 0.056ms
conv10/dw + conv10/dw/relu 0.025ms
conv10 + conv10/relu 0.056ms
conv11/dw + conv11/dw/relu 0.026ms
conv11 + conv11/relu 0.058ms
conv12/dw + conv12/dw/relu 0.018ms
conv12 + conv12/relu 0.042ms
conv13/dw + conv13/dw/relu 0.023ms
conv13 + conv13/relu 0.072ms
conv14_1 + conv14_1/relu 0.038ms
conv14_2 + conv14_2/relu 0.219ms
conv15_1 + conv15_1/relu 0.018ms
conv15_2 + conv15_2/relu 0.113ms
conv16_1 + conv16_1/relu 0.014ms
conv16_2 + conv16_2/relu 0.109ms
conv17_1 + conv17_1/relu 0.011ms
conv17_2 + conv17_2/relu 0.084ms
conv11_mbox_loc 0.021ms
conv11_mbox_loc_perm 0.006ms
conv11_mbox_loc_flat 0.003ms
conv11_mbox_conf 0.026ms
conv11_mbox_conf_perm 0.006ms
conv11_mbox_conf_flat 0.004ms
conv11_mbox_priorbox 0.011ms
conv13_mbox_loc 0.026ms
conv13_mbox_loc_perm 0.010ms
conv13_mbox_loc_flat 0.004ms
conv13_mbox_conf 0.026ms
conv13_mbox_conf_perm 0.005ms
conv13_mbox_conf_flat 0.004ms
conv13_mbox_priorbox 0.006ms
conv14_2_mbox_loc 0.017ms
conv14_2_mbox_loc_perm 0.008ms
conv14_2_mbox_loc_flat 0.004ms
conv14_2_mbox_conf 0.019ms
conv14_2_mbox_conf_perm 0.006ms
conv14_2_mbox_conf_flat 0.003ms
conv14_2_mbox_priorbox 0.006ms
conv15_2_mbox_loc 0.012ms
conv15_2_mbox_loc_perm 0.007ms
conv15_2_mbox_loc_flat 0.003ms
conv15_2_mbox_conf 0.013ms
conv15_2_mbox_conf_perm 0.005ms
conv15_2_mbox_conf_flat 0.003ms
conv15_2_mbox_priorbox 0.006ms
conv16_2_mbox_loc 0.011ms
conv16_2_mbox_loc_perm 0.004ms
conv16_2_mbox_loc_flat 0.005ms
conv16_2_mbox_conf 0.010ms
conv16_2_mbox_conf_perm 0.005ms
conv16_2_mbox_conf_flat 0.004ms
conv16_2_mbox_priorbox 0.006ms
conv17_2_mbox_loc 0.009ms
conv17_2_mbox_loc_perm 0.004ms
conv17_2_mbox_loc_flat 0.004ms
conv17_2_mbox_conf 0.008ms
conv17_2_mbox_conf_perm 0.004ms
conv17_2_mbox_conf_flat 0.004ms
conv17_2_mbox_priorbox 0.005ms
mbox_loc 0.012ms
mbox_conf 0.012ms
mbox_priorbox 0.024ms
mbox_conf_reshape 0.004ms
mbox_conf_softmax 0.026ms
mbox_conf_flatten 0.006ms
detection_out 0.234ms
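The gap between layer-only time and end-to-end time is easier to see once both are expressed in the same unit. This is a small sketch of the arithmetic, assuming batch size 1 and no overlap between frames; the helper names are made up for the example, and the figures are the ones quoted in this thread.

```python
def fps_from_latency_ms(latency_ms):
    """Frames per second if each frame takes latency_ms, one frame at a time."""
    return 1000.0 / latency_ms

def latency_ms_from_fps(fps):
    """Per-frame latency in milliseconds implied by a frame rate."""
    return 1000.0 / fps

# Layer-only totals reported in this thread (sum of per-layer profiler times).
layer_only_ms = {"OMG59E": 2.921, "myih": 2.561}
for author, ms in layer_only_ms.items():
    print(f"{author}: {ms} ms of layer time -> upper bound of about "
          f"{fps_from_latency_ms(ms):.0f} FPS")

# End-to-end figures reported in this thread.
print(f"5.6 ms per frame -> about {fps_from_latency_ms(5.6):.0f} FPS")
print(f"200-220 FPS -> {latency_ms_from_fps(220):.2f} to "
      f"{latency_ms_from_fps(200):.2f} ms per frame")
```

Read this way, 200-220 FPS corresponds to roughly 4.5 to 5.0 ms per frame, which is more than the 2.921 ms spent inside the layers. The difference is the kind of overhead mentioned above (for example copies and pre/post-processing, depending on what the measurement includes).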

Ghustwb commented 5 years ago

@OMG59E Could you share your pluginimplement code? I ran into some problems that I cannot solve. Thanks.