angiend opened this issue 6 years ago
Hi, You can use TensorRT3.0 API (createSSDDetectionOutputPlugin) directly, and you can find it at NvInferPlugin.h.
@chenzhi1992 I tried the code and was able to run it on an x86_64 machine, both for a model trained on the VOC dataset and for a model trained on a custom dataset (fewer than 21 classes). However, when I try it on a TX1, I can run the pretrained SSD-VOC model but not the model trained on the custom dataset (classes different from the VOC classes): I get all zeros as output. It would be great if you could provide some insight. Thanks!
Hi @chenzhi1992 have you tried using it with a number of classes different to 21?
I'm working with one class and one background (2 in total) but all I get is:
virtual void nvinfer1::plugin::DetectionOutput::configure(const nvinfer1::Dims*, int, const nvinfer1::Dims*, int, int): Assertion 'numPriors*numLocClasses*4 == inputDims[param.inputOrder[0]].d[0]' failed.
when calling:
ICudaEngine* engine = builder->buildCudaEngine(*network);
Do you have any idea on why this might be happening?
The dimension of mbox_priorbox is wrong, and you can check it again. @paghdv
I have exactly the same dimension that I have in Caffe: (1,2,122656,1) (NCHW). Is it supposed to be something else? @chenzhi1992
@paghdv @chenzhi1992
Did you solve this problem? I have the same error Assertion 'numPriors*numLocClasses*4 == inputDims[param.inputOrder[0]].d[0]' failed.
I've already run TensorRT-SSD successfully and am now trying to run MobileNet-SSD with TensorRT 3.
Thank you.
@myih I ended up just implementing detection_out myself. MobileNet-SSD runs, but you have to implement the depthwise conv layer yourself to see any speedup; otherwise TensorRT uses its default (loop-through-groups) method, which is very slow.
@paghdv Thanks for the fast reply. I'm able to use createSSDDetectionOutputPlugin to run VGG-SSD, so maybe it's not the detection layer's problem? What versions of TensorRT and cuDNN are you using? I wrote some depthwise convolution code because, like you said, the group convolution was slow before, but I saw someone say that the latest cuDNN 7 patch will support depthwise convolution. Thank you.
@paghdv I solved the error by updating to TensorRT 4 and specifying inputOrder = {0, 1, 2}, but the detection output is incorrect. Were you able to get correct output when using group convolution rather than your own depthwise convolution? Thank you!
@myih
Hi! I am using TensorRT 4.0 and I hit the same error: virtual void nvinfer1::plugin::DetectionOutput::configure(const nvinfer1::Dims*, int, const nvinfer1::Dims*, int, int): Assertion 'numPriors*numLocClasses*4 == inputDims[param.inputOrder[0]].d[0]' failed.
How did you solve it? Could you show me more details?
Yes, I do get the correct result with grouped convolutions.
@paghdv Thanks for the reply.
@zbw4034 What model and weights are you trying to run? I solved this problem after setting inputOrder = {0, 1, 2} in createSSDDetectionOutputPlugin; you might also have some incorrect or duplicate layer names.
@myih did you implement your own softmax?
@paghdv I use this implementation: Teoge/tensorrt-ssd-easy. It works for VGG-SSD, and I checked the CUDA code; it seems legit.
Damn, I totally forgot that Weiliu and chuanqi305 use different image loaders... Now it runs on a 1080 Ti at ~150 fps (without depthwise convolution implemented; is that normal?)
What is the meaning of "setting inputOrder = {0, 1, 2} in createSSDDetectionOutputPlugin"? Could you describe it in detail, please?
@Ghustwb In TensorRT4 the API allows you to specify the order/layout of the input, not in TensorRT3 though You can find description of the API here
Thanks for your reply. I tried TensorRT 4.0 with inputOrder = {0, 1, 2} in createSSDDetectionOutputPlugin, but the assertion error still exists:
virtual void nvinfer1::plugin::DetectionOutput::configure(const nvinfer1::Dims*, int, const nvinfer1::Dims*, int, int): Assertion 'numPriors*numLocClasses*4 == inputDims[param.inputOrder[0]].d[0]' failed
@Ghustwb Maybe try working with another repo, like this one. And check the plugin layer names carefully; when I worked on MobileNet-SSD I was sure I had them correct for days before I found the typo...
Thank you for your help; MobileNet-SSD now runs successfully, but the detection result is incorrect. I printed a detection; each one contains 7 values, such as 0 3 0.75 0.24 0.35 0.25 0.35, where 3 -> class index, 0.75 -> confidence, 0.24 -> xmin, 0.35 -> ymin, 0.25 -> xmax, 0.35 -> ymax. The confidence and class index seem correct, but the box values are obviously wrong.
Can you help me? Thanks
Hi, first, thanks for your code samples, but I do not understand how to use "Forward_DetectionOutputLayer". I saw your comment "I removed the 'detection_out' layer, and calculated the final output through the mbox_loc, mbox_prior and mbox_conf layer output", but the "mbox_loc" plugin layer uses "createConcatPlugin"; how do you get that layer's outputs? I think one should rewrite the interface for DetectionOut, as in "class DetectionOut : public IPlugin", and read the mbox_loc input inside enqueue(int batchSize, const void* const* inputs, void** outputs). What do you think? @chenzhi1992