dlstreamer / dlstreamer

This repository is a home to Intel® Deep Learning Streamer (Intel® DL Streamer) Pipeline Framework. Pipeline Framework is a streaming media analytics framework, based on GStreamer* multimedia framework, for creating complex media analytics pipelines.
https://dlstreamer.github.io
MIT License

Proper way to use object-class under gvadetect #434

Closed antoniomtz closed 1 month ago

antoniomtz commented 2 months ago

Hello,

I'm trying to apply object-class to the yolov8 pipeline as follows:

gst-launch-1.0 v4l2src device=/dev/video2 ! decodebin ! gvadetect \
model=/home/dlstreamer/intel/dl_streamer/models/public/yolov8s/FP32/yolov8s.xml \
model-proc=/opt/intel/dlstreamer/samples/gstreamer/model_proc/public/yolo-v8.json \
device=CPU pre-process-backend=ie inference-region=roi-list object_class=person \
! queue ! gvawatermark ! videoconvertscale ! gvafpscounter ! autovideosink sync=false

I learned that I need to apply inference-region=roi-list in order to make object_class=person work. However, I'm not getting any detections of people or of any other class.

What is the proper way to use inference-region=roi-list object_class=person inside the gvadetect?

brmarkus commented 2 months ago

(must be dash object-class instead of underscore object_class, see e.g. "https://dlstreamer.github.io/elements/gvadetect.html")

antoniomtz commented 1 month ago

@brmarkus Thank you for pointing that out. However, the issue still persists with no detection. Adding inference-region=roi-list only gives no detections. Can you guide me on how to filter a class using inference-region and object-class attributes on gvadetect?

brmarkus commented 1 month ago

It should work as you tried...

If you just remove both inference-region=roi-list object_class=person from the pipeline, do you see it working, i.e. detecting all objects YOLOv8s is supposed to detect, and do you see the bounding boxes? And when adding gvametapublish (e.g. to write into a JSON file), do you see the objects with the expected labels and the expected object classes?

What does your used file "/opt/intel/dlstreamer/samples/gstreamer/model_proc/public/yolo-v8.json" look like, have you changed something in it?

BTW: Which version of DLStreamer do you use, in which environment (native, Docker?), which OperatingSystem are you working on? (earlier versions of DLStreamer needed to be patched manually in order to get YoloV8 supported)

antoniomtz commented 1 month ago

I'm using the intel/dlstreamer:latest tag in a Docker environment on Ubuntu 22.04. I haven't changed anything in /opt/intel/dlstreamer/samples/gstreamer/model_proc/public/yolo-v8.json; it is as it comes with the image.

If I remove both inference-region=roi-list object-class=person I do see detections of all objects including a person (bounding-boxes). When adding gvametapublish I can see the json data with all the inference data in it.

If I add inference-region=roi-list object-class=person there are no detections of people (I'm in front of the camera) and no data from gvametapublish.

This query doesn't detect anything:

gst-launch-1.0 v4l2src device=/dev/video0 ! decodebin ! gvadetect model=/home/dlstreamer/intel/dl_streamer/models/public/yolov8s/FP32/yolov8s.xml model-proc=/opt/intel/dlstreamer/samples/gstreamer/model_proc/public/yolo-v8.json device=CPU pre-process-backend=ie inference-region=roi-list object-class=person ! queue ! gvawatermark ! gvametaconvert add-tensor-data=true ! gvametapublish file-format=json-lines file-path=output.json ! videoconvertscale ! gvafpscounter ! autovideosink sync=false

This query does detect objects:

gst-launch-1.0 v4l2src device=/dev/video0 ! decodebin ! gvadetect model=/home/dlstreamer/intel/dl_streamer/models/public/yolov8s/FP32/yolov8s.xml model-proc=/opt/intel/dlstreamer/samples/gstreamer/model_proc/public/yolo-v8.json device=CPU pre-process-backend=ie ! queue ! gvawatermark ! gvametaconvert add-tensor-data=true ! gvametapublish file-format=json-lines file-path=output.json ! videoconvertscale ! gvafpscounter ! autovideosink sync=false

brmarkus commented 1 month ago

I wasn't able to get my environment up and running very well using intel/dlstreamer:latest (and Python 3.12 in it), getting a lot of warnings and errors the way I usually test it. So I have used intel/dlstreamer:2024.1.2-ubuntu22 instead (with Python 3.10 in it).

It will take a bit more time before I can work on trying to reproduce it, but I will be back in the course of today.

brmarkus commented 1 month ago

Ok. I couldn't get any bounding box when using gvadetect with inference-region=roi-list object-class=person, and when using gvametaconvert and gvametapublish, the created JSON file is just empty.

When removing inference-region=roi-list object-class=person from gvadetect, adding gvaclassify with inference-region=roi-list object-class=person in addition to gvadetect, then I do get bounding boxes visualized again, and the JSON file contains metadata - however, all types of objects are displayed with a bounding box and all metadata is written into the JSON file... Filtering for the given metadata doesn't seem to work.

For gvadetect and inference-region=roi-list object-class=person I was wondering to which ROI-list it should be applied to... because prior to gvadetect nobody provides ROI lists... And only using object-class=person fails with an error (need to increase gst debug verbosity level to see the log message) saying that applying it to a full frame is not supported.

But when a gvaclassify follows gvadetect (which outputs ROI lists), then the classification is applied to these ROI lists... but without filtering for a given object class label. (I also tried to use the object-class-id "0" instead of a label string "person", but no difference).

This requires more in-depth debugging from the DLStreamer team I think.

brmarkus commented 1 month ago

When adding e.g. GST_DEBUG=gva*:5 in front of the gst-launch-1.0 command line then I get a few object-class related log messages, excerpt:

INFO      gva_base_inference gva_base_inference.cpp:813:gva_base_inference_update_object_classes:<gvadetect0> info: object classes update was not completed
gva_base_inference gva_base_inference.cpp:813:gva_base_inference_update_object_classes:<gvadetect0> info: empty inference instance: retry will be performed once instance will be acquired
... ...
INFO      gva_base_inference gva_base_inference.cpp:868:gva_base_inference_start:<gvadetect0> gvadetect0 inference parameters:
 -- Model: /home/dlstreamer/models/public/yolov8s/FP16/yolov8s.xml
 -- Model proc: /opt/intel/dlstreamer/samples/gstreamer/model_proc/public/yolo-v8.json
 -- Device: CPU
 -- Inference interval: 1
 -- Reshape: false
 -- Batch size: 0
 -- Reshape width: 0
 -- Reshape height: 0
 -- No block: false
 -- Num of requests: 0
 -- Model instance ID: (null)
 -- CPU streams: 0
 -- GPU streams: 0
 -- IE config:
 -- Allocator name: (null)
 -- Preprocessing type: ie
 -- Object class: person
*-- Labels: (null)

=> why is Labels: (null)??

The model-proc "/opt/intel/dlstreamer/samples/gstreamer/model_proc/public/yolo-v8.json" is used containing the labels.
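To compare such parameter dumps between runs, the "-- key: value" lines can be pulled out of a saved log. This is only a minimal Python sketch: the function name and the sample text are illustrative, and the line format is assumed from the excerpt above rather than from any documented log layout.

```python
import re
from typing import Dict, Optional

# Matches dump lines such as " -- Object class: person" or "*-- Labels: (null)".
PARAM_LINE = re.compile(r"^\s*\*?\s*--\s*(?P<key>[^:]+):\s*(?P<value>.*)$")


def parse_inference_parameters(log_text: str) -> Dict[str, Optional[str]]:
    """Collect the ' -- key: value' lines of a parameter dump into a dict."""
    params: Dict[str, Optional[str]] = {}
    for line in log_text.splitlines():
        match = PARAM_LINE.match(line)
        if match is None:
            continue  # timestamped INFO lines and anything else are ignored
        value = match.group("value").strip()
        # Empty values and the literal "(null)" are both treated as "not set".
        params[match.group("key").strip()] = None if value in ("", "(null)") else value
    return params


# Shortened copy of the dump above, used only to exercise the parser.
SAMPLE_LOG = """\
 -- Device: CPU
 -- IE config:
 -- Preprocessing type: ie
 -- Object class: person
*-- Labels: (null)
"""

params = parse_inference_parameters(SAMPLE_LOG)
print(params["Object class"], params["Labels"])
```

A value of None for "Labels" here only reflects what the log printed; it does not say whether labels from the model-proc were loaded elsewhere.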

pmalatyn commented 1 month ago

@brmarkus can you paste the content of the model-proc? I would like to understand how labels are defined. Second question: did you try providing labels via a separate file?

brmarkus commented 1 month ago

Yes, sure, I can provide the model-proc file - it's the original one inside the DL-Streamer Docker container "/opt/intel/dlstreamer/samples/gstreamer/model_proc/public/yolo-v8.json":

(from inside the Docker container)

$ cat /opt/intel/dlstreamer/samples/gstreamer/model_proc/public/yolo-v8.json
{
    "json_schema_version": "2.2.0",
    "input_preproc": [
      {
          "params": {
              "resize": "aspect-ratio",
              "range": [0.0, 1.0]
          }
      }
    ],
    "output_postproc": [
      {
        "converter": "yolo_v8",
        "labels": [
            "person",
            "bicycle",
            "car",
            "motorcycle",
            "airplane",
            "bus",
            "train",
            "truck",
            "boat",
            "trafficlight",
            "firehydrant",
            "stopsign",
            "parkingmeter",
            "bench",
            "bird",
            "cat",
            "dog",
            "horse",
            "sheep",
            "cow",
            "elephant",
            "bear",
            "zebra",
            "giraffe",
            "backpack",
            "umbrella",
            "handbag",
            "tie",
            "suitcase",
            "frisbee",
            "skis",
            "snowboard",
            "sportsball",
            "kite",
            "baseballbat",
            "baseballglove",
            "skateboard",
            "surfboard",
            "tennisracket",
            "bottle",
            "wineglass",
            "cup",
            "fork",
            "knife",
            "spoon",
            "bowl",
            "banana",
            "apple",
            "sandwich",
            "orange",
            "broccoli",
            "carrot",
            "hotdog",
            "pizza",
            "donut",
            "cake",
            "chair",
            "couch",
            "pottedplant",
            "bed",
            "diningtable",
            "toilet",
            "tv",
            "laptop",
            "mouse",
            "remote",
            "keyboard",
            "cellphone",
            "microwave",
            "oven",
            "toaster",
            "sink",
            "refrigerator",
            "book",
            "clock",
            "vase",
            "scissors",
            "teddybear",
            "hairdrier",
            "toothbrush"
        ]
      }
    ]
  }

Good idea with the labels file, will try this next, totally forgot about it!
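For reference, a labels file can be derived from the model-proc shown above instead of being typed by hand. The following is a minimal Python sketch; it assumes that a labels file is plain text with one label per line in class-id order, which is not confirmed in this thread, and the helper names are made up for illustration.

```python
import json


def labels_from_model_proc(model_proc_text: str) -> list:
    """Return the labels of the first output_postproc entry that lists them."""
    model_proc = json.loads(model_proc_text)
    for postproc in model_proc.get("output_postproc", []):
        labels = postproc.get("labels")
        if isinstance(labels, list):
            return labels
    return []


def to_labels_file_text(labels: list) -> str:
    """Assumed labels-file layout: one label per line, in class-id order."""
    return "".join(f"{label}\n" for label in labels)


# Shortened stand-in for yolo-v8.json; the real file lists all 80 labels.
SAMPLE_MODEL_PROC = """
{
  "json_schema_version": "2.2.0",
  "output_postproc": [
    {"converter": "yolo_v8", "labels": ["person", "bicycle", "car"]}
  ]
}
"""

labels = labels_from_model_proc(SAMPLE_MODEL_PROC)
print(labels.index("person"))  # position of "person" in the label list
print(to_labels_file_text(labels), end="")
```

The position of a label in this list is what the object-class-id "0" attempt mentioned earlier refers to for "person".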

brmarkus commented 1 month ago

Hmm, no luck, result is the same when using a labels file in the command line, metadata-json file is empty, no bounding boxes get drawn around persons.

The log-messages are the same:

$ GST_DEBUG=gva*:5 gst-launch-1.0 filesrc location=$VIDEO_EXAMPLE ! decodebin ! \
gvadetect model=/home/dlstreamer/models/public/yolov8s/FP16/yolov8s.xml \
model-proc=/opt/intel/dlstreamer/samples/gstreamer/model_proc/public/yolo-v8.json device=CPU \
pre-process-backend=ie labels-file=/home/dlstreamer/models/coco_80cl.txt \
inference-region=roi-list object-class=person ! gvametaconvert ! \
gvametapublish file-path=/home/dlstreamer/models/meta.json ! queue ! gvawatermark ! \
videoconvertscale ! gvafpscounter ! autovideosink sync=false

...
0:00:00.064245799   121 0x5d8982492cd0 INFO      gva_base_inference gva_base_inference.cpp:813:gva_base_inference_update_object_classes:<gvadetect0> info: object classes update was not completed
0:00:00.064249489   121 0x5d8982492cd0 INFO      gva_base_inference gva_base_inference.cpp:813:gva_base_inference_update_object_classes:<gvadetect0> info: empty inference instance: retry will be performed once instance will be acquired
...
0:00:00.109036928   121 0x5d8982492cd0 INFO      gva_base_inference gva_base_inference.cpp:868:gva_base_inference_start:<gvadetect0> gvadetect0 inference parameters:
 -- Model: /home/dlstreamer/models/public/yolov8s/FP16/yolov8s.xml
 -- Model proc: /opt/intel/dlstreamer/samples/gstreamer/model_proc/public/yolo-v8.json
 -- Device: CPU
 -- Inference interval: 1
 -- Reshape: false
 -- Batch size: 0
 -- Reshape width: 0
 -- Reshape height: 0
 -- No block: false
 -- Num of requests: 0
 -- Model instance ID: (null)
 -- CPU streams: 0
 -- GPU streams: 0
 -- IE config:
 -- Allocator name: (null)
 -- Preprocessing type: ie
 -- Object class: person
 -- Labels: /home/dlstreamer/models/coco_80cl.txt

Is there a different way of filtering object detection (only) with a given object-class-label?

Is inference-region=roi-list supposed to be used for the first gvadetect (no prior plugin providing a ROI-list) to filter for object-classes...?

pmalatyn commented 1 month ago

@brmarkus yeah, you are right. Try using https://dlstreamer.github.io/elements/gvaattachroi.html to generate the list of ROIs (sample: https://github.com/dlstreamer/dlstreamer/tree/master/samples/gstreamer/gst_launch/gvaattachroi) and then add the rest of the pipeline.

brmarkus commented 1 month ago

Inspired by the sample script from your link for gvaattachroi I first used this command line:

gst-launch-1.0 filesrc location=/home/dlstreamer/models/person-bicycle-car-detection.mp4 ! decodebin ! gvaattachroi roi=0,0,768,432 ! gvadetect inference-region=1 model=/home/dlstreamer/models//public/yolov8s/FP32/yolov8s.xml device=CPU pre-process-backend=ie ! queue ! gvawatermark ! gvametaconvert add-tensor-data=true ! gvametapublish file-format=json-lines file-path=output.json ! videoconvert ! gvafpscounter ! autovideosink sync=false

The whole video frame now gets a red frame (because I used the video resolution of 768x432). All objects are detected and bounding boxes with the proper labels get displayed, for all objects.

When adding object-class=person in order to filter for "persons", then still a red border around the video frame. But no object gets detected, no bounding box. The JSON file now counts only the one ROI spanning the whole video frame. Only lines like this:

{"objects":[{"detection":{"bounding_box":{"x_max":1.0,"x_min":0.0,"y_max":1.0,"y_min":0.0},"label":""},"h":432,"region_id":1520,"roi_type":"","tensors":[{"layout":"ANY","name":"detection","precision":"UNSPECIFIED"}],"w":768,"x":0,"y":0}],"resolution":{"height":432,"width":768},"timestamp":49166666666}
{"objects":[{"detection":{"bounding_box":{"x_max":1.0,"x_min":0.0,"y_max":1.0,"y_min":0.0},"label":""},"h":432,"region_id":1522,"roi_type":"","tensors":[{"layout":"ANY","name":"detection","precision":"UNSPECIFIED"}],"w":768,"x":0,"y":0}],"resolution":{"height":432,"width":768},"timestamp":49250000000}
{"objects":[{"detection":{"bounding_box":{"x_max":1.0,"x_min":0.0,"y_max":1.0,"y_min":0.0},"label":""},"h":432,"region_id":1524,"roi_type":"","tensors":[{"layout":"ANY","name":"detection","precision":"UNSPECIFIED"}],"w":768,"x":0,"y":0}],"resolution":{"height":432,"width":768},"timestamp":49333333333}

gst-launch-1.0 filesrc location=/home/dlstreamer/models/person-bicycle-car-detection.mp4 ! decodebin ! gvaattachroi roi=0,0,768,432 ! gvadetect inference-region=1 model=/home/dlstreamer/models//public/yolov8s/FP32/yolov8s.xml device=CPU pre-process-backend=ie object-class=person ! queue ! gvawatermark ! gvametaconvert add-tensor-data=true ! gvametapublish file-format=json-lines file-path=output.json ! videoconvert ! gvafpscounter ! ximagesink sync=false

Has the meaning of object-class changed in the meantime?

object-class : Filter for Region of Interest class label on this element input flags: readable, writable String. Default: ""

pmalatyn commented 1 month ago

where do you have definition of object-class now? I do not see model-proc or labels definition - but I might be blind :)

antoniomtz commented 1 month ago

I tried using gvaattachroi before gvadetect and I don't get any detections.

This query works fine without object-class, I get detections only within the ROI:

gst-launch-1.0 v4l2src device=/dev/video0 ! decodebin ! gvaattachroi roi=0,0,800,800 ! gvadetect model=/home/dlstreamer/intel/dl_streamer/models/public/yolov8s/FP32/yolov8s.xml model-proc=/opt/intel/dlstreamer/samples/gstreamer/model_proc/public/yolo-v8.json inference-region=1 device=CPU pre-process-backend=ie ! queue ! gvawatermark ! videoconvertscale ! gvafpscounter ! autovideosink sync=false

When adding object-class=person I don't get any detections inside or outside the ROI:

gst-launch-1.0 v4l2src device=/dev/video0 ! decodebin ! gvaattachroi roi=0,0,800,800 ! gvadetect model=/home/dlstreamer/intel/dl_streamer/models/public/yolov8s/FP32/yolov8s.xml model-proc=/opt/intel/dlstreamer/samples/gstreamer/model_proc/public/yolo-v8.json inference-region=1 object-class=person device=CPU pre-process-backend=ie ! queue ! gvawatermark ! videoconvertscale ! gvafpscounter ! autovideosink sync=false

antoniomtz commented 1 month ago

@pmalatyn @brmarkus any updates on this?

brmarkus commented 1 month ago

Not from my side, sorry.

pmalatyn commented 1 month ago

@antoniomtz working on it, should have some conclusion by the end of the day

pmalatyn commented 1 month ago

This is a working pipeline with object-class usage, based on your original one:

gst-launch-1.0 filesrc location=video-examples/People_On_The_Street.mp4 ! qtdemux ! vah264dec ! vapostproc ! video/x-raw(memory:VAMemory) ! gvaattachroi mode=1 file-path=roi_list.json ! gvadetect model=/mnt/dlstreamer/dest-dir/dlstreamer3/2022.1.43a3/public/yolov8s/FP32/yolov8s.xml model-proc=samples/gstreamer/model_proc/public/yolo-v8.json inference-region=1 device=CPU pre-process-backend=ie object-class=person ! queue ! gvawatermark ! videoconvertscale ! gvafpscounter ! queue ! gvawatermark ! videoconvertscale ! gvafpscounter ! vah264enc ! h264parse ! mp4mux ! filesink location=Output.mp4

A few remarks here:

1) The object-class parameter of the gvadetect element filters its input ROIs and selects only those whose label matches the one specified in the parameter. Only these ROIs are passed on to inference/detection; the rest are skipped.
2) To provide that list of ROIs, add a gvaattachroi element with a JSON file that defines the ROIs (see the file content below) to the pipeline before gvadetect, and set inference-region=roi-list and object-class on gvadetect.
3) In our example the file defines 2 ROIs covering parts of the full video frame, car and person:

[
  {
    "objects": [
      { "detection": { "label": "person" }, "x": 480, "y": 0, "w": 1440, "h": 1080 },
      { "detection": { "label": "car" }, "x": 0, "y": 0, "w": 480, "h": 1080 }
    ]
  }
]

4) These 2 ROIs (labeled person and car respectively) are then passed to gvadetect with inference-region=roi-list and object-class=person.
5) Because object-class in gvadetect is set to person, detection is done only on the ROI that gvaattachroi labeled person. Please note that all objects are detected within that ROI; please see the example below:

[image: example]
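The remarks above can be restated as a small Python model. This is not DL Streamer code, only a sketch of the described behaviour: select_input_rois is a hypothetical helper, and the reading that an unlabeled ROI is skipped is inferred from the remarks and from the empty "label" in the earlier output.json excerpt, not confirmed by the maintainers.

```python
import json

# ROI definitions as quoted in the remarks (content of roi_list.json).
ROI_LIST = [
    {
        "objects": [
            {"detection": {"label": "person"}, "x": 480, "y": 0, "w": 1440, "h": 1080},
            {"detection": {"label": "car"}, "x": 0, "y": 0, "w": 480, "h": 1080},
        ]
    }
]


def select_input_rois(rois: list, object_class: str) -> list:
    """Model of the described filter: keep input ROIs whose label matches."""
    if not object_class:
        return list(rois)  # default "" is modelled as "no filtering"
    return [roi for roi in rois if roi["detection"]["label"] == object_class]


# Text that could be saved as roi_list.json for gvaattachroi file-path=...
roi_list_json = json.dumps(ROI_LIST, indent=2)

objects = ROI_LIST[0]["objects"]
print([roi["detection"]["label"] for roi in select_input_rois(objects, "person")])

# An ROI attached without a label (label "" in the earlier output.json lines)
# does not match "person", so in this model nothing is left to run detection on.
unlabeled = [{"detection": {"label": ""}, "x": 0, "y": 0, "w": 768, "h": 432}]
print(select_input_rois(unlabeled, "person"))
```

In this model the filter decides which regions are inferred on, not which detected classes are emitted, which matches remark 5.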

brmarkus commented 1 month ago

Hmm, not sure I understand the intention behind the object-class=person...

With

object-class : Filter for Region of Interest class label on this element input flags: readable, writable String. Default: ""

I always read it as "use object-class to let gvadetect (or gvaclassify) output only those ROIs of the named classes, e.g. only get detections (or classifications) of persons".

pmalatyn commented 1 month ago

we will update the description to be more meaningful but it is working as designed
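Since object-class filters input ROIs rather than output detections, one way to keep only persons is to filter the metadata after the pipeline has run. Below is a minimal Python sketch for json-lines output from gvametapublish; the record layout (objects[].detection.label) is taken from the output lines shown earlier in this thread, while the function names and the sample values are illustrative assumptions.

```python
import json


def filter_record(record: dict, wanted_label: str) -> dict:
    """Return a copy of one frame record that keeps only the wanted label."""
    kept = [
        obj
        for obj in record.get("objects", [])
        if obj.get("detection", {}).get("label") == wanted_label
    ]
    return {**record, "objects": kept}


def filter_json_lines(lines, wanted_label: str):
    """Yield filtered json-lines records, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield json.dumps(filter_record(json.loads(line), wanted_label))


# Made-up sample in the same layout as the output.json lines above.
SAMPLE_LINES = [
    '{"objects":[{"detection":{"label":"person"},"x":10,"y":20,"w":50,"h":100},'
    '{"detection":{"label":"car"},"x":200,"y":40,"w":120,"h":60}],'
    '"resolution":{"height":432,"width":768},"timestamp":0}',
]

for filtered in filter_json_lines(SAMPLE_LINES, "person"):
    print(filtered)
```

To apply it to a real file, the same generator could be fed an open file handle, e.g. the output.json written by the pipelines above. This only filters the published metadata; it does not change what gvawatermark draws.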