dusty-nv / jetson-inference

Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
https://developer.nvidia.com/embedded/twodaystoademo
MIT License

Custom Object Detection - ssd-mobilenet-v2 with TRT optimization #585

Closed leandrovrabelo closed 1 year ago

leandrovrabelo commented 4 years ago

Hi,

I successfully converted my frozen_inference_graph.pb to a UFF file and created an engine following the https://github.com/AastaNV/TRT_object_detection instructions.

It's running very well with an inference time of 25 ms, but I would like to try it with detectnet-camera.py (it's well organized and easy to use). What commands should I use to run my model with this script?

I have the following files:

- ssd_mobilenet_v2_coco_2018_03_29 - network
- tmp.uff - temporary UFF file
- TRT_ssd_mobilenet_v2_coco_2018_03_29.bin - optimized file used for inference
- coco.py - list with classes

Thanks in advance.

denizcelik commented 4 years ago

Hi, how do I start collecting and training my own custom dataset? Can you help me with what I need to do? Thanks.

leandrovrabelo commented 4 years ago

Hi Deniz,

I followed this page https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html, using the ssd_mobilenet_v2_coco_2018_03_29 model, which has very good accuracy and is very fast.

Then, I optimized the model following these instructions: https://github.com/AastaNV/TRT_object_detection

leandrovrabelo commented 4 years ago

Hi Dusty,

I tried editing the detectNet.h/cpp files, adding my custom model next to the SSD_MOBILENET_V2 entries as you mentioned in this issue. In ./jetson-inference/c/detectNet.cpp I added the following on lines 260-261 (just below the SSD_MOBILENET_V2 entry):

else if( networkType == SSD_MOBILENET_V2 ) return Create("networks/SSD-Mobilenet-v2/ssd_mobilenet_v2_coco.uff", "networks/SSD-Mobilenet-v2/ssd_coco_labels.txt", threshold, "Input", Dims3(3,300,300), "NMS", "NMS_1", maxBatchSize, precision, device, allowGPUFallback);

else if( networkType == SSD_MOBILENET_V2_WEEDS ) return Create("networks/SSD-Mobilenet-v2-Weeds/tmp_weeds.uff", "networks/SSD-Mobilenet-v2-Weeds/voc-weed-labels.txt", threshold, "Input", Dims3(3,300,300), "NMS", "NMS_1", maxBatchSize, precision, device, allowGPUFallback);

In the same file, I added the following on lines 315-316:

else if( strcasecmp(modelName, "ssd-mobilenet-v2") == 0 || strcasecmp(modelName, "coco-ssd-mobilenet-v2") == 0 || strcasecmp(modelName, "ssd-mobilenet") == 0 ) type = detectNet::SSD_MOBILENET_V2;

else if( strcasecmp(modelName, "ssd-mobilenet-v2-weeds") == 0 || strcasecmp(modelName, "coco-ssd-mobilenet-v2-weeds") == 0 || strcasecmp(modelName, "weeds") == 0 ) type = detectNet::SSD_MOBILENET_V2_WEEDS;

Inside the file ./jetson-inference/c/detectNet.h I inserted the following on line 197:

SSD_MOBILENET_V2_WEEDS /**< SSD Mobilenet-v2 Weeds UFF model (custom-trained) */

Then I noticed that I had to rename my engine to tmp_weeds.uff.1.1.GPU.FP16.engine, i.e. my .uff filename tmp_weeds.uff with the cache suffix appended.
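The cache filename above looks like the model path plus a suffix encoding batch size, device, and precision. A minimal sketch of that naming convention (the meaning of each field, especially the second numeric one, is my assumption inferred from the filename in this thread, not verified against jetson-inference's tensorNet sources):

```python
def engine_cache_name(model_path: str, max_batch: int = 1,
                      device: str = "GPU", precision: str = "FP16") -> str:
    """Build the engine cache filename observed in this thread.

    Pattern (assumed): <model>.<max_batch>.<?>.<device>.<precision>.engine
    The second numeric field is hardcoded to 1 here, matching the log.
    """
    return f"{model_path}.{max_batch}.1.{device}.{precision}.engine"

print(engine_cache_name("tmp_weeds.uff"))
```

This reproduces the exact name the loader probed for in the logs below, which is why renaming the engine by hand made it get picked up.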

Finally I used the following commands:

python3 detectnet-camera.py --network=/usr/local/bin/networks/SSD-Mobilenet-v2-Weeds/tmp_weeds.uff --input_blob=Input --output_blob=NMS --output_count=NMS --class_labels=/usr/local/bin/networks/SSD-Mobilenet-v2-Weeds/voc-weed-labels.txt --camera=/dev/video1

and I got the following error:

device GPU, /usr/local/bin/networks/SSD-Mobilenet-v2-Weeds/tmp_weeds.uff initialized.
W = 7  H = 100  C = 1
detectNet -- maximum bounding boxes: 100
detectNet -- loaded 4 class info entries
detectNet -- number of object classes: 4
jetson.utils -- PyCamera_New()
jetson.utils -- PyCamera_Init()
[gstreamer] initialized gstreamer, version 1.14.5.0
[gstreamer] gstCamera attempting to initialize with GST_SOURCE_NVARGUS, camera /dev/video1
[gstreamer] gstCamera pipeline string:
v4l2src device=/dev/video1 ! video/x-raw, width=(int)1280, height=(int)720, format=YUY2 ! videoconvert ! video/x-raw, format=RGB ! videoconvert ! appsink name=mysink
[gstreamer] gstCamera successfully initialized with GST_SOURCE_V4L2, camera /dev/video1
jetson.utils -- PyDisplay_New()
jetson.utils -- PyDisplay_Init()
[OpenGL] glDisplay -- X screen 0 resolution: 1280x800
[OpenGL] glDisplay -- display device initialized
[gstreamer] opening gstCamera for streaming, transitioning pipeline to GST_STATE_PLAYING
[gstreamer] gstreamer changed state from NULL to READY ==> mysink
[gstreamer] gstreamer changed state from NULL to READY ==> videoconvert1
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter1
[gstreamer] gstreamer changed state from NULL to READY ==> videoconvert0
[gstreamer] gstreamer changed state from NULL to READY ==> capsfilter0
[gstreamer] gstreamer changed state from NULL to READY ==> v4l2src0
[gstreamer] gstreamer changed state from NULL to READY ==> pipeline0
[gstreamer] gstreamer changed state from READY to PAUSED ==> videoconvert1
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter1
[gstreamer] gstreamer changed state from READY to PAUSED ==> videoconvert0
[gstreamer] gstreamer changed state from READY to PAUSED ==> capsfilter0
[gstreamer] gstreamer stream status CREATE ==> src
[gstreamer] gstreamer changed state from READY to PAUSED ==> v4l2src0
[gstreamer] gstreamer changed state from READY to PAUSED ==> pipeline0
[gstreamer] gstreamer msg new-clock ==> pipeline0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> videoconvert1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter1
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> videoconvert0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> capsfilter0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> v4l2src0
[gstreamer] gstreamer stream status ENTER ==> src
[gstreamer] gstreamer msg stream-start ==> pipeline0
[gstreamer] gstCamera onPreroll
[gstreamer] gstCamera -- allocated 16 ringbuffers, 2764800 bytes each
[gstreamer] gstreamer changed state from READY to PAUSED ==> mysink
[gstreamer] gstreamer msg async-done ==> pipeline0
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> mysink
[gstreamer] gstreamer changed state from PAUSED to PLAYING ==> pipeline0
[gstreamer] gstCamera -- allocated 16 RGBA ringbuffers
python3: nmsPlugin.cpp:106: virtual int nvinfer1::plugin::DetectionOutput::enqueue(int, const void* const*, void**, void*, cudaStream_t): Assertion `status == STATUS_SUCCESS' failed.
Aborted (core dumped)

When I followed this instruction (link) I managed to make my custom detection work with the detectNet model, but I don't think it's the proper way to do it.

Am I missing some command or something else needed to make my custom model work on its own, without substituting the default ssd-mobilenet-v2?

Thanks.

dusty-nv commented 4 years ago

Then, I noticed that I had to change my engine name to tmp_weeds.uff.1.1.GPU.FP16.engine, just a continuation of my .uff file tmp_weeds.uff.

Hi @leandrovrabelo, you should delete tmp_weeds.uff.1.1.GPU.FP16.engine and allow the application to generate it the first time it runs. Since you renamed it, it is probably using the old TensorRT engine from the previous model.
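Clearing the stale cache can be scripted; here is a minimal sketch. The glob pattern is inferred from the engine filename seen in this thread (model path plus `.*.engine`), not taken from jetson-inference itself:

```python
from pathlib import Path

def clear_engine_cache(uff_path: str) -> list:
    """Remove any cached TensorRT engine files generated next to a .uff model,
    so the next run re-profiles the network and builds a fresh engine.

    Pattern assumed from this thread: <model>.uff.<...>.engine
    e.g. tmp_weeds.uff.1.1.GPU.FP16.engine
    """
    model = Path(uff_path)
    removed = []
    for engine in sorted(model.parent.glob(model.name + ".*.engine")):
        engine.unlink()              # delete the stale cache file
        removed.append(engine.name)  # report what was removed
    return removed
```

After running this, launching detectnet-console again should trigger a fresh "profiling network model on device GPU" pass instead of deserializing the old engine.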

If you delete it, and run the application again, is it able to generate the TensorRT engine file?

Also, I recommend that you test on a static image (detectnet-console) first before trying the camera with a new model.

leandrovrabelo commented 4 years ago

Hi Dusty,

I tested it with detectnet-console; I deleted the engine that I had created, and here is what happened:

[TRT] TensorRT version 6.0.1
[TRT] loading NVIDIA plugins...
[TRT] Plugin Creator registration succeeded - GridAnchor_TRT
[TRT] Plugin Creator registration succeeded - GridAnchorRect_TRT
[TRT] Plugin Creator registration succeeded - NMS_TRT
[TRT] Plugin Creator registration succeeded - Reorg_TRT
[TRT] Plugin Creator registration succeeded - Region_TRT
[TRT] Plugin Creator registration succeeded - Clip_TRT
[TRT] Plugin Creator registration succeeded - LReLU_TRT
[TRT] Plugin Creator registration succeeded - PriorBox_TRT
[TRT] Plugin Creator registration succeeded - Normalize_TRT
[TRT] Plugin Creator registration succeeded - RPROI_TRT
[TRT] Plugin Creator registration succeeded - BatchedNMS_TRT
[TRT] Could not register plugin creator: FlattenConcat_TRT in namespace:
[TRT] completed loading NVIDIA plugins.
[TRT] detected model format - UFF (extension '.uff')
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file /usr/local/bin/networks/SSD-Mobilenet-v2-Weeds/tmp_weeds.uff.1.1.GPU.FP16.engine
[TRT] cache file not found, profiling network model on device GPU
[TRT] device GPU, loading /usr/bin/ /usr/local/bin/networks/SSD-Mobilenet-v2-Weeds/tmp_weeds.uff
[TRT] UFFParser: Parsing Input[Op: Input].
[TRT] UFFParser: Input -> [1,1,1]
[TRT] UFFParser: Applying order forwarding to: Input
[TRT] UFFParser: Parsing FeatureExtractor/MobilenetV2/Conv/weights[Op: Const].
[TRT] UFFParser: FeatureExtractor/MobilenetV2/Conv/weights -> [3,3,3,32]
[TRT] UFFParser: Applying order forwarding to: FeatureExtractor/MobilenetV2/Conv/weights
[TRT] UFFParser: Parsing FeatureExtractor/MobilenetV2/Conv/Conv2D[Op: Conv]. Inputs: Input, FeatureExtractor/MobilenetV2/Conv/weights
[TRT] FeatureExtractor/MobilenetV2/Conv/Conv2D: kernel weights has count 864 but 288 was expected
[TRT] FeatureExtractor/MobilenetV2/Conv/Conv2D: count of 864 weights in kernel, but kernel dimensions (3,3) with 1 input channels, 32 output channels and 1 groups were specified. Expected Weights count is 1 * 3*3 * 32 / 1 = 288
[TRT] UFFParser: FeatureExtractor/MobilenetV2/Conv/Conv2D -> []
[TRT] UFFParser: Applying order forwarding to: FeatureExtractor/MobilenetV2/Conv/Conv2D
[TRT] UFFParser: Parsing FeatureExtractor/MobilenetV2/Conv/BatchNorm/gamma[Op: Const].
[TRT] UFFParser: FeatureExtractor/MobilenetV2/Conv/BatchNorm/gamma -> [32]
[TRT] UFFParser: Applying order forwarding to: FeatureExtractor/MobilenetV2/Conv/BatchNorm/gamma
[TRT] UFFParser: Parsing FeatureExtractor/MobilenetV2/Conv/BatchNorm/beta[Op: Const].
[TRT] UFFParser: FeatureExtractor/MobilenetV2/Conv/BatchNorm/beta -> [32]
[TRT] UFFParser: Applying order forwarding to: FeatureExtractor/MobilenetV2/Conv/BatchNorm/beta
[TRT] UFFParser: Parsing FeatureExtractor/MobilenetV2/Conv/BatchNorm/moving_mean[Op: Const].
[TRT] UFFParser: FeatureExtractor/MobilenetV2/Conv/BatchNorm/moving_mean -> [32]
[TRT] UFFParser: Applying order forwarding to: FeatureExtractor/MobilenetV2/Conv/BatchNorm/moving_mean
[TRT] UFFParser: Parsing FeatureExtractor/MobilenetV2/Conv/BatchNorm/moving_variance[Op: Const].
[TRT] UFFParser: FeatureExtractor/MobilenetV2/Conv/BatchNorm/moving_variance -> [32]
[TRT] UFFParser: Applying order forwarding to: FeatureExtractor/MobilenetV2/Conv/BatchNorm/moving_variance
[TRT] UFFParser: Parsing FeatureExtractor/MobilenetV2/Conv/BatchNorm/FusedBatchNorm[Op: BatchNorm]. Inputs: FeatureExtractor/MobilenetV2/Conv/Conv2D, FeatureExtractor/MobilenetV2/Conv/BatchNorm/gamma, FeatureExtractor/MobilenetV2/Conv/BatchNorm/beta, FeatureExtractor/MobilenetV2/Conv/BatchNorm/moving_mean, FeatureExtractor/MobilenetV2/Conv/BatchNorm/moving_variance
[TRT] UffParser: Parser error: FeatureExtractor/MobilenetV2/Conv/BatchNorm/FusedBatchNorm: The input to the Scale Layer is required to have a minimum of 3 dimensions.
[TRT] failed to parse UFF model '/usr/local/bin/networks/SSD-Mobilenet-v2-Weeds/tmp_weeds.uff'
[TRT] device GPU, failed to load /usr/local/bin/networks/SSD-Mobilenet-v2-Weeds/tmp_weeds.uff
detectNet -- failed to initialize.
jetson.inference -- detectNet failed to load built-in network '/usr/local/bin/networks/SSD-Mobilenet-v2-Weeds/tmp_weeds.uff'
PyTensorNet_Dealloc()
Traceback (most recent call last):
  File "detectnet-console.py", line 51, in <module>
    net = jetson.inference.detectNet(opt.network, sys.argv, opt.threshold)
Exception: jetson.inference -- detectNet failed to load network
jetson.utils -- freeing CUDA mapped memory
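The weight-count mismatch in the log above is consistent with the parser treating the input as single-channel: the UFF carries `[3,3,3,32]` conv weights (3x3 kernel, 3 input channels, 32 output channels), but the parser registered `Input -> [1,1,1]` and therefore expected a 1-channel kernel. A quick arithmetic check reproducing both numbers TensorRT printed:

```python
def conv_weight_count(kh, kw, c_in, c_out, groups=1):
    # Standard conv kernel parameter count: kh * kw * (c_in / groups) * c_out
    return kh * kw * (c_in // groups) * c_out

# What the .uff actually contains (3-channel RGB input):
print(conv_weight_count(3, 3, 3, 32))   # the 864 in the log
# What the parser expected after seeing Input as [1,1,1] (1 channel):
print(conv_weight_count(3, 3, 1, 32))   # the 288 in the log
```

This suggests the input blob dimensions were never registered as (3,300,300) during this parse, which would also explain the later "minimum of 3 dimensions" Scale Layer error.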

When I use the engine with the name tmp_weeds.uff.1.1.GPU.FP16.engine, here is what happens:

[TRT] TensorRT version 6.0.1
[TRT] loading NVIDIA plugins...
[TRT] Plugin Creator registration succeeded - GridAnchor_TRT
[TRT] Plugin Creator registration succeeded - GridAnchorRect_TRT
[TRT] Plugin Creator registration succeeded - NMS_TRT
[TRT] Plugin Creator registration succeeded - Reorg_TRT
[TRT] Plugin Creator registration succeeded - Region_TRT
[TRT] Plugin Creator registration succeeded - Clip_TRT
[TRT] Plugin Creator registration succeeded - LReLU_TRT
[TRT] Plugin Creator registration succeeded - PriorBox_TRT
[TRT] Plugin Creator registration succeeded - Normalize_TRT
[TRT] Plugin Creator registration succeeded - RPROI_TRT
[TRT] Plugin Creator registration succeeded - BatchedNMS_TRT
[TRT] Could not register plugin creator: FlattenConcat_TRT in namespace:
[TRT] completed loading NVIDIA plugins.
[TRT] detected model format - UFF (extension '.uff')
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file /usr/local/bin/networks/SSD-Mobilenet-v2-Weeds/tmp_weeds.uff.1.1.GPU.FP16.engine
[TRT] loading network profile from engine cache... /usr/local/bin/networks/SSD-Mobilenet-v2-Weeds/tmp_weeds.uff.1.1.GPU.FP16.engine
[TRT] device GPU, /usr/local/bin/networks/SSD-Mobilenet-v2-Weeds/tmp_weeds.uff loaded
[TRT] Deserialize required 3516711 microseconds.
[TRT] device GPU, CUDA engine context initialized with 3 bindings
[TRT] binding -- index 0 -- name 'Input' -- type FP32 -- in/out INPUT -- # dims 3 -- dim #0 3 (SPATIAL) -- dim #1 300 (SPATIAL) -- dim #2 300 (SPATIAL)
[TRT] binding -- index 1 -- name 'NMS' -- type FP32 -- in/out OUTPUT -- # dims 3 -- dim #0 1 (SPATIAL) -- dim #1 100 (SPATIAL) -- dim #2 7 (SPATIAL)
[TRT] binding -- index 2 -- name 'NMS_1' -- type FP32 -- in/out OUTPUT -- # dims 3 -- dim #0 1 (SPATIAL) -- dim #1 1 (SPATIAL) -- dim #2 1 (SPATIAL)
[TRT] binding to input 0 Input binding index: 0
[TRT] binding to input 0 Input dims (b=1 c=3 h=300 w=300) size=1080000
[TRT] binding to output 0 NMS binding index: 1
[TRT] binding to output 0 NMS dims (b=1 c=1 h=100 w=7) size=2800
device GPU, /usr/local/bin/networks/SSD-Mobilenet-v2-Weeds/tmp_weeds.uff initialized.
W = 7  H = 100  C = 1
detectNet -- maximum bounding boxes: 100
detectNet -- loaded 4 class info entries
detectNet -- number of object classes: 4
Segmentation fault (core dumped)
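The binding sizes in this log are consistent with float32 buffers, which at least confirms the deserialized engine has the expected SSD I/O shapes. A sanity check of the two sizes reported:

```python
FLOAT32_BYTES = 4  # each FP32 binding element occupies 4 bytes

def binding_size_bytes(*dims):
    """Buffer size in bytes for an FP32 TensorRT binding with the given dims."""
    n = 1
    for d in dims:
        n *= d
    return n * FLOAT32_BYTES

print(binding_size_bytes(1, 3, 300, 300))  # 'Input' size in the log
print(binding_size_bytes(1, 1, 100, 7))    # 'NMS' size in the log
```

Note that the log binds only the 'NMS' output even though the engine exposes a second output 'NMS_1' (the detection-count blob); the earlier command passed --output_count=NMS rather than NMS_1, which may be worth double-checking as a possible cause of the crash (this is a guess from the log, not a confirmed diagnosis).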

denizcelik commented 4 years ago

Thank you so much for your answer.

leandrovrabelo commented 4 years ago

Hi Dusty, I'm still trying to figure out how to make my custom model work properly without substituting the default ssd-mobilenet-v2 model.

Do you think this error is related to some dimensions being set incorrectly?

[TRT] UffParser: Parser error: FeatureExtractor/MobilenetV2/Conv/BatchNorm/FusedBatchNorm: The input to the Scale Layer is required to have a minimum of 3 dimensions.
[TRT] failed to parse UFF model '/usr/local/bin/networks/SSD-Mobilenet-v2-Weeds/tmp_weeds.uff'
[TRT] device GPU, failed to load /usr/local/bin/networks/SSD-Mobilenet-v2-Weeds/tmp_weeds.uff
detectNet -- failed to initialize.
jetson.inference -- detectNet failed to load built-in network '/usr/local/bin/networks/SSD-Mobilenet-v2-Weeds/tmp_weeds.uff'
PyTensorNet_Dealloc()
Traceback (most recent call last):
  File "detectnet-console.py", line 51, in <module>
    net = jetson.inference.detectNet(opt.network, sys.argv, opt.threshold)
Exception: jetson.inference -- detectNet failed to load network
jetson.utils -- freeing CUDA mapped memory

If it really doesn't work, should I retrain my model with https://github.com/dusty-nv/pytorch-ssd, or should I try the depth branch of jetson-inference first? https://github.com/dusty-nv/jetson-inference/tree/depth

ThiloFink commented 3 years ago

Hello, great work by leandrovrabelo. Does this conversion from TensorFlow 1.x to a .uff and engine file work for every TensorFlow 1 model from the zoo, or is it specific to Mobilenet-SSD and Inception-SSD?

leandrovrabelo commented 3 years ago

Hi @ThiloFink, I don't know; I only tested these models.