Azure-Samples / NVIDIA-Deepstream-Azure-IoT-Edge-on-a-NVIDIA-Jetson-Nano

This is a sample showing how to do real-time video analytics with NVIDIA DeepStream connected to Azure via Azure IoT Edge. It uses a NVIDIA Jetson Nano device that can process up to 8 real-time video streams concurrently.
MIT License

Source for libnvdsinfer_custom_impl_Yolo_Custom_Vision.so #18

Closed brianegge closed 3 years ago

brianegge commented 4 years ago

I can run the plugin, but my bounding boxes just flicker at position 0,0. I think this is because I don't have the same number of classes (3) as in the example. I've tried another yolov2 plugin and it does not work either. I think this one would work if I could modify the number of classes. FWIW, my Custom Vision model is returning a tensor shaped 45x13x13. I have three classes, and yolov2 has 5 anchors, so I don't know how the 45 channels are calculated. Other yolov2 models return 13x13x125 for 20 classes.
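
For reference, the usual YOLOv2 layout gives each anchor 4 box values, 1 objectness score, and one score per class. Under that assumption 125 = 5 x (5 + 20), and 45 = 5 x (5 + 4), which would point to four class outputs in the exported model rather than three. A minimal sketch of that arithmetic (the helper name `classesFromChannels` is hypothetical, not part of any library):

```cpp
#include <cassert>

// Number of classes implied by a YOLOv2 output with `channels` channels,
// ASSUMING each anchor contributes 4 box values + 1 objectness + C class scores.
// Returns -1 when the channel count does not fit that layout.
int classesFromChannels(int channels, int numAnchors)
{
    assert(numAnchors > 0);
    if (channels % numAnchors != 0) {
        return -1; // channels cannot be split evenly across the anchors
    }
    const int perAnchor = channels / numAnchors;
    return perAnchor > 5 ? perAnchor - 5 : -1;
}
```

With 5 anchors, `classesFromChannels(45, 5)` gives 4 and `classesFromChannels(125, 5)` gives 20.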

zkaiser-ff commented 4 years ago

I'm running into the same issue as above, can you please provide the source code for the library listed?

brianegge commented 4 years ago

I think I could have saved myself a lot of effort and just used the Small-S1 target. Anyway, I went through their Python code and adapted it to DeepStream.

diff --git b/nvdsinfer_custom_impl_Yolo/Makefile a/nvdsinfer_custom_impl_Yolo/Makefile
index 8b85b86..3511cd5 100644
--- b/nvdsinfer_custom_impl_Yolo/Makefile
+++ a/nvdsinfer_custom_impl_Yolo/Makefile
@@ -20,7 +20,7 @@
 # DEALINGS IN THE SOFTWARE.
 ################################################################################

-CUDA_VER?=
+CUDA_VER?=10.2
 ifeq ($(CUDA_VER),)
   $(error "CUDA_VER is not set")
 endif
diff --git b/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp a/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp
index 17f47ee..357a11e 100644
--- b/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp
+++ a/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp
@@ -30,7 +30,7 @@
 #include "nvdsinfer_custom_impl.h"
 #include "trt_utils.h"

-static const int NUM_CLASSES_YOLO = 80;
+static const int NUM_CLASSES_YOLO = 4;

 extern "C" bool NvDsInferParseCustomYoloV3(
     std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
@@ -89,6 +89,20 @@ static NvDsInferParseObjectInfo convertBBox(const float& bx, const float& by, co
     return b;
 }

+static float logistic(float x)
+{
+    if (x > 0.0f)
+    {
+        return (float)(1.0f / (1.0f + exp(-x)));
+    }
+    else
+    {
+        auto e = exp(x);
+        return (float)(e / (1.0f + e));
+    }
+}
+
+
 static void addBBoxProposal(const float bx, const float by, const float bw, const float bh,
                      const uint stride, const uint& netW, const uint& netH, const int maxIndex,
                      const float maxProb, std::vector<NvDsInferParseObjectInfo>& binfo)
@@ -98,16 +112,21 @@ static void addBBoxProposal(const float bx, const float by, const float bw, cons

     bbi.detectionConfidence = maxProb;
     bbi.classId = maxIndex;
+    if (bbi.detectionConfidence > 0)
+      std::cout << "classId=" << maxIndex << ",conf=" << bbi.detectionConfidence << ",left=" << bbi.left << ",top=" << bbi.top << ",w=" << bbi.width << ",h=" << bbi.height << std::endl;
     binfo.push_back(bbi);
 }
+template <class T>
 static std::vector<NvDsInferParseObjectInfo>
 decodeYoloV2Tensor(
-    const float* detections, const std::vector<float> &anchors,
+    const T* detections, const std::vector<float> &anchors,
     const uint gridSizeW, const uint gridSizeH, const uint stride, const uint numBBoxes,
     const uint numOutputClasses, const uint& netW,
     const uint& netH)
 {
+    std::cout << "Detecting with type size " << sizeof(T) << ", classes=" << numOutputClasses;
+    std::cout << ", stride=" << stride << ", numBBoxes=" << numBBoxes << ", netW=" << netW << ", netH=" << netH << std::endl;
     std::vector<NvDsInferParseObjectInfo> binfo;
     for (uint y = 0; y < gridSizeH; ++y) {
         for (uint x = 0; x < gridSizeW; ++x) {
@@ -127,8 +146,11 @@ decodeYoloV2Tensor(
                 const float bh
                     = ph * exp (detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 3)]);

-                const float objectness
+                float objectness
                     = detections[bbindex + numGridCells * (b * (5 + numOutputClasses) + 4)];
+                std::cout << "objectness=" << objectness;
+                //objectness=logistic(objectness);
+                std::cout << "," << objectness << std::endl;

                 float maxProb = 0.0f;
                 int maxIndex = -1;
@@ -308,6 +330,7 @@ extern "C" bool NvDsInferParseCustomYoloV3Tiny(
         kANCHORS, kMASKS);
 }
+template <int num_classes_yolo>
 static bool NvDsInferParseYoloV2(
     std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
     NvDsInferNetworkInfo const& networkInfo,
@@ -325,11 +348,11 @@ static bool NvDsInferParseYoloV2(
     }
     const NvDsInferLayerInfo &layer = outputLayersInfo[0];

-    if (NUM_CLASSES_YOLO != detectionParams.numClassesConfigured)
+    if (num_classes_yolo != detectionParams.numClassesConfigured)
     {
         std::cerr << "WARNING: Num classes mismatch. Configured:"
                   << detectionParams.numClassesConfigured
-                  << ", detected by network: " << NUM_CLASSES_YOLO << std::endl;
+                  << ", detected by network: " << num_classes_yolo << std::endl;
     }

     assert(layer.inferDims.numDims == 3);
@@ -337,12 +360,26 @@ static bool NvDsInferParseYoloV2(
     const uint gridSizeW = layer.inferDims.d[2];
     const uint stride = DIVUP(networkInfo.width, gridSizeW);
     assert(stride == DIVUP(networkInfo.height, gridSizeH));
+    assert(layer.inferDims.d[0] == kNUM_BBOXES * (5 + num_classes_yolo));
     for (auto& anchor : anchors) {
         anchor *= stride;
     }
-    std::vector<NvDsInferParseObjectInfo> objects =
-        decodeYoloV2Tensor((const float*)(layer.buffer), anchors, gridSizeW, gridSizeH, stride, kNUM_BBOXES,
-                   NUM_CLASSES_YOLO, networkInfo.width, networkInfo.height);
+    std::vector<NvDsInferParseObjectInfo> objects;
+    if (layer.dataType == NvDsInferDataType::HALF)
+    {
+        objects = decodeYoloV2Tensor<__fp16>((const __fp16*)(layer.buffer), anchors, gridSizeW, gridSizeH, stride, kNUM_BBOXES,
+                   num_classes_yolo, networkInfo.width, networkInfo.height);
+    }
+    else if (layer.dataType == NvDsInferDataType::FLOAT)
+    {
+        objects = decodeYoloV2Tensor<float>((const float*)(layer.buffer), anchors, gridSizeW, gridSizeH, stride, kNUM_BBOXES,
+                   num_classes_yolo, networkInfo.width, networkInfo.height);
+    }
+    else
+    {
+      throw std::runtime_error("unsupported output layer type");
+    }
+
     objectList = objects;

@@ -355,17 +392,27 @@ extern "C" bool NvDsInferParseCustomYoloV2(
     NvDsInferParseDetectionParams const& detectionParams,
     std::vector<NvDsInferParseObjectInfo>& objectList)
 {
-    return NvDsInferParseYoloV2 (
+    return NvDsInferParseYoloV2<80> (
         outputLayersInfo, networkInfo, detectionParams, objectList);
 }

-extern "C" bool NvDsInferParseCustomYoloV2Tiny(
+extern "C" bool NvDsInferParseCustomYoloV2_4(
+    std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
+    NvDsInferNetworkInfo const& networkInfo,
+    NvDsInferParseDetectionParams const& detectionParams,
+    std::vector<NvDsInferParseObjectInfo>& objectList)
+{
+    return NvDsInferParseYoloV2<4> (
+        outputLayersInfo, networkInfo, detectionParams, objectList);
+}
+
+extern "C" bool NvDsInferParseCustomYoloV2_80(
     std::vector<NvDsInferLayerInfo> const& outputLayersInfo,
     NvDsInferNetworkInfo const& networkInfo,
     NvDsInferParseDetectionParams const& detectionParams,
     std::vector<NvDsInferParseObjectInfo>& objectList)
 {
-    return NvDsInferParseYoloV2 (
+    return NvDsInferParseYoloV2<80> (
         outputLayersInfo, networkInfo, detectionParams, objectList);
 }

@@ -425,5 +472,6 @@ extern "C" bool NvDsInferParseCustomYoloTLT(
 CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomYoloV3);
 CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomYoloV3Tiny);
 CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomYoloV2);
-CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomYoloV2Tiny);
+CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomYoloV2_4);
+CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomYoloV2_80);
 CHECK_CUSTOM_PARSE_FUNC_PROTOTYPE(NvDsInferParseCustomYoloTLT);

I got this working for a 416x416 image, but put it on hold. I'm now converting the model to ONNX, having Python convert it to TensorRT, and running it on static frames from Python. In Python it's been much easier to run at higher resolution.
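
The two pieces of the diff that are easiest to get wrong are the tensor indexing and the sigmoid. Here is a standalone sketch of both. It assumes a channel-major (CHW) output and that the cell offset is `y * gridW + x`, which is not visible in the diff itself; `yoloV2Index` is a hypothetical helper name, while `logistic` mirrors the function added in the diff.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Numerically stable sigmoid, using the same branching as logistic() in the diff:
// exp() is only ever called on a non-positive argument, so it cannot overflow.
float logistic(float x)
{
    if (x > 0.0f) {
        return 1.0f / (1.0f + std::exp(-x));
    }
    const float e = std::exp(x);
    return e / (1.0f + e);
}

// Flat offset of value `k` (0..3 = x, y, w, h; 4 = objectness; 5+ = class scores)
// for anchor `b` at grid cell (x, y). ASSUMES a CHW tensor with cell offset y * gridW + x.
std::size_t yoloV2Index(unsigned x, unsigned y, unsigned gridW, unsigned gridH,
                        unsigned b, unsigned numClasses, unsigned k)
{
    assert(x < gridW && y < gridH && k < 5 + numClasses);
    const std::size_t numGridCells = static_cast<std::size_t>(gridW) * gridH;
    const std::size_t bbindex = static_cast<std::size_t>(y) * gridW + x;
    return bbindex + numGridCells * (b * (5 + numClasses) + k);
}
```

For a 13x13 grid with 5 anchors and 4 classes, the last element (cell 12,12, anchor 4, value 8) lands at offset 45 * 169 - 1, which is a quick way to check that the layout and the buffer size agree.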

zkaiser-ff commented 4 years ago

Thanks! Do you have the full cpp file? This looks like just the diff

brianegge commented 4 years ago

Uploaded here, just now: https://github.com/brianegge/deepstream_objectDetector_Yolo/tree/master/nvdsinfer_custom_impl_Yolo

I made the number of classes a template parameter - you'll need to customize to your number of classes.
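
The pattern is roughly the following. This is a stripped-down sketch, not the real parser: it drops the DeepStream types so it compiles on its own, and every name in it (`parseYoloV2Sketch`, `ParseSketchYoloV2_1`, `ParseSketchYoloV2_4`, `kNumBBoxes`) is hypothetical. In the real file you would add an `extern "C"` wrapper with the full NvDsInfer signature for your class count and point `parse-bbox-func-name` at it.

```cpp
#include <cassert>

constexpr int kNumBBoxes = 5; // assumption: five YOLOv2 anchors

// Stand-in for the templated parser: only checks that the channel count matches
// kNumBBoxes * (5 + NumClasses), like the assert added in the diff.
template <int NumClasses>
bool parseYoloV2Sketch(int outputChannels)
{
    static_assert(NumClasses > 0, "need at least one class");
    return outputChannels == kNumBBoxes * (5 + NumClasses);
}

// One extern "C" entry point per class count, so a config file can select one by name.
extern "C" bool ParseSketchYoloV2_1(int outputChannels)
{
    return parseYoloV2Sketch<1>(outputChannels);
}

extern "C" bool ParseSketchYoloV2_4(int outputChannels)
{
    return parseYoloV2Sketch<4>(outputChannels);
}
```

The class count has to be a compile-time constant here, so each model with a different number of classes needs its own wrapper and a rebuild of the library.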

zkaiser-ff commented 4 years ago

Did you modify anything other than the nvdsinfer_custom_impl_Yolo.cpp file? I modified it for the 1 class that my model detects, but I'm not getting any bounding boxes on my sink.

brianegge commented 4 years ago

Sorry - I didn't keep a good record of the steps I took. Basically I had a simple image which would detect fine through the Python process, then I piped that image into DeepStream, and added logging until it finally drew the same bounding box.

The problem is, if the shape or anything is off, then you get nothing or noise. I tried adding a few asserts around the shape checking. I now run my images at 688x384, which is a 21 x 12 grid, in Python. It took forever to get this to work, but the shape is much better for my images. I've thought about porting this to DeepStream, but thought it was too much effort. Plus in C++ I was never able to get multiple batches working with my model, and I want 4 or 8 images in a batch. I looked into running the Triton server on the Nano, and that doesn't work for most models.

My DeepStream config was this:
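
As an example of how a shape can be "off": the parser derives one stride from the grid width and asserts that the grid height gives the same value. A rough sketch of that check, assuming DIVUP is plain ceiling division (the helper names `divUp` and `stridesAgree` are hypothetical):

```cpp
#include <cassert>

// Ceiling division; ASSUMED to match what the DIVUP macro in the parser does.
constexpr unsigned divUp(unsigned n, unsigned d)
{
    return (n + d - 1) / d;
}

// True when width and height imply the same stride, which is what the parser's
// assert(stride == DIVUP(networkInfo.height, gridSizeH)) requires.
constexpr bool stridesAgree(unsigned netW, unsigned netH, unsigned gridW, unsigned gridH)
{
    return divUp(netW, gridW) == divUp(netH, gridH);
}
```

Under that assumption a 416x416 input with a 13x13 grid gives a stride of 32 both ways, but 688x384 with a 21x12 grid gives 33 from the width and 32 from the height, so the assert would have to be relaxed before the wide shape could run through this parser.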

[primary-gie]
enable=1
gpu-id=0
config-file=/home/egge/detector/configs/config_infer_primary_egge.txt
batch-size=1
bbox-border-color0=1;0;0;0.7
bbox-border-color1=0;1;1;0.7
bbox-border-color2=0;1;1;0.7
bbox-border-color3=0;1;0;0.7
interval=15
gie-unique-id=1
nvbuf-memory-type=0

Which loads this detector:

[property]
gpu-id=0
net-scale-factor=1.0
#net-scale-factor=0.0039215697906911373
#0=RGB, 1=BGR
model-color-format=1
onnx-file=/home/egge/detector/models/model32.onnx
# /bin/trtexec --onnx=models/model32.onnx --explicitBatch --saveEngine=models/model.onnx_b3_fp32.engine
model-engine-file=/home/egge/detector/models/model.onnx_b3_fp32_wide.engine
labelfile-path=/home/egge/detector/models/labels.txt
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=0
num-detected-classes=4
gie-unique-id=1
network-type=0
is-classifier=0
## 0=Group Rectangles, 1=DBSCAN, 2=NMS, 3= DBSCAN+NMS Hybrid, 4 = None(No clustering)
cluster-mode=2
maintain-aspect-ratio=1
parse-bbox-func-name=NvDsInferParseCustomYoloV2_4
custom-lib-path=/opt/nvidia/deepstream/deepstream-5.0/sources/objectDetector_Yolo/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
#engine-create-func-name=NvDsInferYoloCudaEngineGet
#scaling-filter=0
#scaling-compute-hw=0

[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.5
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0

zkaiser-ff commented 4 years ago

I figured out the issue, it was a scaling thing like you mentioned. Thanks for the help, I really appreciate it!
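
The thread does not say which scaling setting was at fault. One candidate worth checking is `net-scale-factor`: the config above sets it to 1.0 and leaves 0.0039215697906911373 (about 1/255) commented out. A minimal sketch of the per-pixel normalization as I understand it from the Gst-nvinfer documentation, y = net-scale-factor * (x - offset); the function name `scalePixel` is hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Per-pixel normalization, ASSUMING the documented form y = scale * (x - offset).
// With scale = 1.0 the network sees raw 0..255 values; with scale ~ 1/255 it sees 0..1.
float scalePixel(float x, float netScaleFactor, float offset)
{
    return netScaleFactor * (x - offset);
}
```

A model trained on 0..255 input fed 0..1 values (or the reverse) typically produces no detections or noise, so it is worth matching this to whatever the training pipeline used.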

chilljl commented 3 years ago

"For DeepStream to understand how to parse the bounding boxes provided by a model from Custom Vision, we need to download an extra library:

wget -O libnvdsinfer_custom_impl_Yolo_Custom_Vision.so --no-check-certificate "https://onedrive.live.com/d"

Is this library available for x86? I'm trying to run this on a T4.

crazyoutlook commented 3 years ago

I am using a Custom Vision model with DeepStream. I have deployed it and it is working. However, in the inference results I get an extra object detection class, "Vehicle", which is not in my model. Please advise.

emmanuel-bv commented 3 years ago

Sorry for the delay. This sample has been updated to support JetPack 4.5 and DeepStream 5.1. Along with this update, the source code of the custom YOLO parsing library has been released, and an updated compiled library has also been provided.