chenzhi1992 / TensorRT-SSD

Use TensorRT API to implement Caffe-SSD, SSD(channel pruning), Mobilenet-SSD

Mobilenetssd #5

Open linux-devil opened 6 years ago

linux-devil commented 6 years ago

I would be very interested in implementing TensorRT Mobilenet-SSD. Any directions would be helpful.

chenzhi1992 commented 6 years ago

You should use the TensorRT API to implement the depthwise layer. @linux-devil

linux-devil commented 6 years ago

Have you implemented it?

chenzhi1992 commented 6 years ago

Yes

linux-devil commented 6 years ago

I am using TensorRT 3.0; do we have to implement the depthwise layer for the 3.0 version as well?

linux-devil commented 6 years ago

I have a Jetson TX2 and I am just trying this as a hobby, but it seems I am stuck now. Any hints will be appreciated. Thanks

chenzhi1992 commented 6 years ago

You have to understand the principle of the depthwise layer first, and then implement it in CUDA code.
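
The core idea, as a naive sketch (illustrative only, not the optimized kernel used in this repo; the signature and thread mapping are assumptions):

```cuda
// Naive depthwise convolution: each channel is convolved with its own
// single 2-D filter, so there is no summation over input channels.
// One thread computes one output element (batch size 1 for brevity).
__global__ void depthwiseConv2d(const float* input,   // C x H x W
                                const float* weights, // C x K x K
                                float* output,        // C x Hout x Wout
                                int C, int H, int W, int K,
                                int stride, int pad, int Hout, int Wout)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= C * Hout * Wout) return;

    int c  = idx / (Hout * Wout);   // channel of this output element
    int oy = (idx / Wout) % Hout;   // output row
    int ox = idx % Wout;            // output column

    float sum = 0.f;
    for (int ky = 0; ky < K; ++ky)
        for (int kx = 0; kx < K; ++kx)
        {
            int iy = oy * stride - pad + ky;
            int ix = ox * stride - pad + kx;
            if (iy >= 0 && iy < H && ix >= 0 && ix < W)
                sum += input[(c * H + iy) * W + ix]
                     * weights[(c * K + ky) * K + kx];
        }
    output[idx] = sum;
}
```

A production kernel would add batching, fuse bias/ReLU, and tile the input through shared memory, but the channel-independent structure above is the whole principle.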

linux-devil commented 6 years ago

Found this: https://github.com/yonghenglh6/DepthwiseConvolution

chenzhi1992 commented 6 years ago

You can use it.

linux-devil commented 6 years ago

Cool, let me try this one, take some hints, and implement something. I will ping you if I get stuck somewhere.

daniel-pp commented 6 years ago

Hi @chenzhi1992, regarding the 40-43 fps using Mobilenet-SSD: is this with pruning? Are you using FP16? I also second the question about depthwise convolution; it seems to me that TensorRT 3.0 has a standard implementation for it. Or is it not efficient enough?

chenzhi1992 commented 6 years ago

@daniel-pp

  1. This speed is good enough for me, so I don't use pruning.
  2. I don't use FP16, because the IPlugin layer doesn't support FP16. If you can use it, please guide me. Thanks.
  3. I used a Caffe model, and there is no standard implementation of the Caffe depthwise layer.

linux-devil commented 6 years ago

What is your email, @chenzhi1992?

chenzhi1992 commented 6 years ago

@linux-devil chenzhi745320@163.com

daniel-pp commented 6 years ago

@chenzhi1992 Thanks for the answers! Some more questions:

  1. For MobileNet-SSD, are you using this implementation: https://github.com/chuanqi305/MobileNet-SSD ?
  2. What is the input dimension of your network? Is it 300x300?
  3. TensorRT 3.0 seems to support depthwise convolution: the Caffe parser parses it without errors, and according to the documentation TensorRT 3.0 supports the group parameter, which is what a depthwise convolution uses (see the snippet below). But it seems to be really slow. Will you share your implementation of the depthwise layer?
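
For reference, a depthwise layer in that prototxt is just an ordinary Convolution whose group equals its num_output. The snippet below is a sketch following the pattern of the first depthwise layer in chuanqi305's model; exact names and values differ per layer:

```
layer {
  name: "conv1/dw"
  type: "Convolution"
  bottom: "conv1"
  top: "conv1/dw"
  convolution_param {
    num_output: 32   # one filter per input channel
    group: 32        # group == num_output is what makes it depthwise
    kernel_size: 3
    pad: 1
    stride: 1
    bias_term: false
  }
}
```
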
linux-devil commented 6 years ago

@daniel-pp What output size and Output_layers parameter are you using? I am following https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/topics/topics/workflows/caffe_to_tensorrt.html

daniel-pp commented 6 years ago

@linux-devil There are two outputs: "detection_out" and "count". "detection_out" is a float array of dimension batch_size x keep_top_k x 7; "count" is an int array of dimension batch_size. For the DetectionOutputParameters I'm using, for example, {true, false, 0, 21, 100, 100, 0.25, 0.45, CENTER_SIZE}; in this case keep_top_k = 100.
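
In code, that parameter set maps onto the plugin creation function like this (a sketch assuming the TensorRT 3.0 NvInferPlugin.h definitions; the wrapper function name is illustrative):

```cpp
#include "NvInferPlugin.h"

// Sketch: build the SSD DetectionOutput plugin with the parameters quoted
// above. Field order in DetectionOutputParameters (TensorRT 3.0):
// {shareLocation, varianceEncodedInTarget, backgroundLabelId, numClasses,
//  topK, keepTopK, confidenceThreshold, nmsThreshold, codeType}.
nvinfer1::plugin::INvPlugin* createDetectionOut()
{
    nvinfer1::plugin::DetectionOutputParameters params{
        true,    // shareLocation
        false,   // varianceEncodedInTarget
        0,       // backgroundLabelId
        21,      // numClasses (VOC: 20 classes + background)
        100,     // topK
        100,     // keepTopK -> "detection_out" is batch_size x 100 x 7
        0.25f,   // confidenceThreshold
        0.45f,   // nmsThreshold
        nvinfer1::plugin::CodeType_t::CENTER_SIZE};
    return nvinfer1::plugin::createSSDDetectionOutputPlugin(params);
}
```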

linux-devil commented 6 years ago

Thanks @daniel-pp, I am using the https://github.com/chuanqi305/MobileNet-SSD implementation

linux-devil commented 6 years ago

In https://github.com/chuanqi305/MobileNet-SSD I see only detection_out as the output layer. Are you using the same implementation of SSD?

linux-devil commented 6 years ago

```
layer {
  name: "detection_out"
  type: "DetectionOutput"
  bottom: "mbox_loc"
  bottom: "mbox_conf_flatten"
  bottom: "mbox_priorbox"
  top: "detection_out"
  include { phase: TEST }
  detection_output_param {
    num_classes: 21
    share_location: true
    background_label_id: 0
    nms_param { nms_threshold: 0.45 top_k: 100 }
    code_type: CENTER_SIZE
    keep_top_k: 100
    confidence_threshold: 0.25
  }
}
```

linux-devil commented 6 years ago

@daniel-pp I am using TensorRT 3.0.2, and I am getting:

Could not parse deploy file
[libprotobuf ERROR google/protobuf/text_format.cc:298] Error parsing text-format ditcaffe.NetParameter: 1177:17: Message type "ditcaffe.LayerParameter" has no field named "permute_param".

daniel-pp commented 6 years ago

@linux-devil Yeah, you should first remove all "_param" blocks from your .prototxt file to get rid of the error. You need to implement all the unsupported layers (Permute, Flatten, Reshape) using the Plugin API, and pass the corresponding parameters to the Plugin Layers.

As for the detection_out layer, you need to add an additional output, e.g. 'top: "count" '. That's what the TensorRT implementation of the DetectionOutput layer assumes (took me a while to figure that one out).
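
After both changes, the layer head would look roughly like this (a sketch derived from the prototxt posted above; only the extra top line is new):

```
layer {
  name: "detection_out"
  type: "DetectionOutput"
  bottom: "mbox_loc"
  bottom: "mbox_conf_flatten"
  bottom: "mbox_priorbox"
  top: "detection_out"
  top: "count"   # extra output expected by TensorRT's DetectionOutput plugin
  include { phase: TEST }
  # detection_output_param removed; the values are passed to the plugin instead
}
```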

linux-devil commented 6 years ago

@daniel-pp Thanks for the response. Another quick question: if I remove the _param blocks, do I need to retrain my model with the new prototxt? Or should I just remove them from the deploy.prototxt file and not from train.prototxt and test.prototxt?

linux-devil commented 6 years ago

@daniel-pp Do you have implementations of the Permute, Flatten, and Reshape layers that I can use?

daniel-pp commented 6 years ago

@linux-devil You don't need to retrain your model, just do the modifications to the deploy file (they don't change the weights).

You can find the implementations of Flatten and Reshape in the code in this repository, or alternatively in the Plugin API sample shipped with TensorRT. Permute, PriorBox, etc. are created using createSSDPermutePlugin and its siblings; see NvInferPlugin.h.
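
A minimal sketch of how that hooks into the Caffe parser, assuming TensorRT 3.x's nvcaffeparser1::IPluginFactory interface (the layer-name test and the permutation order are illustrative; a real factory also has to cover Flatten, Reshape, PriorBox, DetectionOutput and plugin destruction):

```cpp
#include "NvCaffeParser.h"
#include "NvInferPlugin.h"
#include <cstring>

class PluginFactory : public nvcaffeparser1::IPluginFactory
{
public:
    // Tell the parser which layers it should hand over to us.
    bool isPlugin(const char* name) override
    {
        return std::strstr(name, "_perm") != nullptr; // Permute layers only, for brevity
    }

    // Called by the parser for every layer where isPlugin() returned true.
    nvinfer1::IPlugin* createPlugin(const char* /*layerName*/,
                                    const nvinfer1::Weights* /*weights*/,
                                    int /*nbWeights*/) override
    {
        // SSD permutes feature maps from NCHW to NHWC before flattening.
        nvinfer1::plugin::Quadruple order{{0, 2, 3, 1}};
        return nvinfer1::plugin::createSSDPermutePlugin(order);
    }
};
```

The factory instance is then passed to the parser via ICaffeParser::setPluginFactory() before calling parse().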

linux-devil commented 6 years ago

@daniel-pp Thanks

gaddekumar commented 6 years ago

Hi, how are your detection results on MobileNet-SSD? I have tried with a confidence threshold of 0.45 and the results are bad.

Optimus1072 commented 6 years ago

@linux-devil @daniel-pp After removing the _param blocks and updating the top param, I am getting the error "could not parse layer type PriorBox". As far as I have read in the TensorRT docs, it is already implemented by them. Can you give me some pointers to figure this out, or if possible share your .prototxt and .caffemodel? Thanks :-)

PS: I am also using https://github.com/chuanqi305/MobileNet-SSD implementation

chandrakantkhandelwal commented 6 years ago

@daniel-pp I am able to get proper output for a model trained on the VOC dataset (21 classes); however, when I use a model trained on a custom dataset (with fewer than 21 classes), I get zeros as the output box coordinates. I have changed the class count in pluginimplement.h and pluginimplement.cpp. Do I need to change anything else to make it work for a model whose number of output classes differs from 21? Thanks