Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0

[Question]Using multiple IP camera stream as source for detection #997

Closed mriamnobody closed 1 year ago

mriamnobody commented 1 year ago

I recently found this excellent repository through an article on the internet. I tried my hand at implementing what I need but have failed. I have these issues:

1) How can I use multiple IP camera streams as a source?
2) Is there a way to use a pre-trained model in .onnx format?

Please forgive me for my noobiness.

Edit:

One more thing: the command to convert a pre-trained model to .onnx format uses the term dummy_input:

torch.onnx.export(model, dummy_input, "yolo_nas_m.onnx")

What is it?

Thank you.

Louis-Dupont commented 1 year ago

Hi @mriamnobody, dummy_input in the torch.onnx.export command is a tensor of the same size as your model's expected input. It's used to trace the operations performed on the input, which is how the ONNX graph is constructed.

This dummy input doesn't contain any meaningful data; it's used only to mimic the shape and data type of the actual inputs your model expects.

In the case of YOLO-NAS, the expected input is (3, 640, 640) images:

import torch
from super_gradients.common.object_names import Models
from super_gradients.training import models

model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco")

model.prep_model_for_conversion(input_size=(640, 640))
onnx_input = torch.zeros((3, 640, 640))

torch.onnx.export(model, onnx_input, f="yolo_nas_l.onnx")

Concerning your first question, we only support single stream prediction out of the box, but you can write your own script to support it:

import cv2
from super_gradients.common.object_names import Models
from super_gradients.training import models

# Note that currently only YoloX, PPYoloE and YOLO-NAS are supported.
model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco")

# Here I am reading from the same camera for both captures, but you can change the sources.
cap1 = cv2.VideoCapture(cv2.CAP_ANY)
cap2 = cv2.VideoCapture(cv2.CAP_ANY)

while True:
    # Read frames from the cameras
    ret1, frame1 = cap1.read()
    ret2, frame2 = cap2.read()

    # Check if frames are successfully captured
    if not ret1 or not ret2:
        break

    # Running predict on the 2 sources at the same time will improve processing speed.
    predictions = model.predict([frame1, frame2])

    for i, predicted_frame in enumerate(predictions):
        cv2.imshow(f"Camera {i}: ", predicted_frame.draw())

    # Check for key press
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

# Release the video capture objects and close the windows
cap1.release()
cap2.release()
cv2.destroyAllWindows()
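
For actual IP cameras, the two cv2.CAP_ANY captures above would typically be replaced with the cameras' stream URLs, which cv2.VideoCapture accepts directly (the RTSP addresses below are hypothetical placeholders; the exact URL format depends on the camera vendor):

cap1 = cv2.VideoCapture("rtsp://user:password@192.168.1.10:554/stream1")  # hypothetical camera 1
cap2 = cv2.VideoCapture("rtsp://user:password@192.168.1.11:554/stream1")  # hypothetical camera 2
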
Louis-Dupont commented 1 year ago

@mriamnobody does that answer your question :)

mriamnobody commented 1 year ago

Now it makes some sense to me. Thank you @Louis-Dupont for your guidance and help.

mriamnobody commented 1 year ago

@Louis-Dupont, wow, the script is so intuitive, clean, and compact yet powerful. Does the line predictions = model.predict([frame1, frame2]) perform detection/prediction concurrently on the two streams? If I want, will the model be able to handle more than 2 IP cameras concurrently? If multiple IP cameras are used, will it affect the quality/precision/accuracy of detection/prediction?

Louis-Dupont commented 1 year ago

model.predict runs in batches of up to 32 images at once, so it is faster to run model.predict([frame1, frame2]) than to run model.predict([frame1]) and then model.predict([frame2]) sequentially, especially on GPU. That being said, running the 2 predicts together means that the fps of both streams will be the same. The alternative would be to run 1 process per stream, in parallel. I think this would decrease fps overall, but it might be worth trying.

mriamnobody commented 1 year ago

@Louis-Dupont, the script ran successfully. However, for the ONNX export snippet you shared earlier, I got the following error:

Traceback (most recent call last):
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/test.py", line 143, in <module>
    torch.onnx.export(model, onnx_input, f="yolo_nas_l.onnx")
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/onnx/utils.py", line 504, in export
    _export(
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/onnx/utils.py", line 1529, in _export
    graph, params_dict, torch_out = _model_to_graph(
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/onnx/utils.py", line 1111, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/onnx/utils.py", line 987, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/onnx/utils.py", line 891, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/jit/_trace.py", line 1184, in _get_trace_graph
    outs = ONNXTracedModule(f, strict, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/jit/_trace.py", line 127, in forward
    graph, out = torch._C._create_graph_by_tracing(
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/jit/_trace.py", line 118, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/super_gradients/training/models/detection_models/customizable_detector.py", line 84, in forward
    x = self.backbone(x)
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/super_gradients/modules/detection_modules.py", line 80, in forward
    x = getattr(self, layer)(x)
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/super_gradients/training/models/detection_models/yolo_nas/yolo_stages.py", line 138, in forward
    return self.conv(x)
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/super_gradients/modules/qarepvgg_block.py", line 182, in forward
    return self.se(self.nonlinearity(self.post_bn(self.rbr_reparam(inputs))))
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1182, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py", line 138, in forward
    self._check_input_dim(input)
  File "/mnt/c/Users/rosha/Downloads/Compressed/super-gradients-3.1.1/yolovnass/lib/python3.10/site-packages/torch/nn/modules/batchnorm.py", line 410, in _check_input_dim
    raise ValueError("expected 4D input (got {}D input)".format(input.dim()))
ValueError: expected 4D input (got 3D input)
mriamnobody commented 1 year ago

model.predict runs in batches of up to 32 images at once, so it is faster to run model.predict([frame1, frame2]) than to run model.predict([frame1]) and then model.predict([frame2]) sequentially, especially on GPU. That being said, running the 2 predicts together means that the fps of both streams will be the same. The alternative would be to run 1 process per stream, in parallel. I think this would decrease fps overall, but it might be worth trying.

Is the prediction on the frames done in real time? I have observed that the prediction lags far behind the IP camera. For example, if the time on the IP camera is 12:28:30 (HH:MM:SS), the displayed detection frame shows roughly 12:21:11 (HH:MM:SS), about 7 minutes behind.
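
A common cause of this kind of lag (an assumption, not something confirmed in this thread) is that cv2.VideoCapture keeps buffering frames while model.predict runs slower than the stream's frame rate, so the loop falls further and further behind. A minimal sketch of a reader thread that always serves only the newest frame:

import threading
import cv2

class LatestFrameReader:
    # Reads a stream in a background thread and keeps only the newest frame,
    # so a slow prediction loop does not accumulate a backlog of buffered frames.
    def __init__(self, source):
        self.cap = cv2.VideoCapture(source)
        self.lock = threading.Lock()
        self.ok, self.frame = False, None
        self.running = True
        threading.Thread(target=self._reader, daemon=True).start()

    def _reader(self):
        while self.running:
            ok, frame = self.cap.read()
            with self.lock:
                self.ok, self.frame = ok, frame

    def read(self):
        with self.lock:
            return self.ok, self.frame

    def release(self):
        self.running = False
        self.cap.release()

# Usage: replace cap1 = cv2.VideoCapture(...) with reader1 = LatestFrameReader(...)
# and call reader1.read() in the main loop instead of cap1.read().
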

mriamnobody commented 1 year ago

@Louis-Dupont Is there a way to only look for humans in the frame for detection?

mriamnobody commented 1 year ago

@Louis-Dupont, I found the problem with the ONNX export error above: the batch size was missing in onnx_input = torch.zeros((3, 640, 640)). As you mentioned, model.predict runs in batches of up to 32 images at once, so I used (32, 3, 640, 640) and the script successfully generated the .onnx file.
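
For reference, a minimal sketch of the corrected export with the batch dimension included. A batch of 1 also works; without dynamic_axes, whatever batch size is used in the dummy input is fixed in the exported graph:

import torch
from super_gradients.common.object_names import Models
from super_gradients.training import models

model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco")
model.eval()  # standard practice before tracing/export
model.prep_model_for_conversion(input_size=(640, 640))

# The backbone's BatchNorm layers expect a 4D tensor: (batch, channels, height, width)
onnx_input = torch.zeros((1, 3, 640, 640))
torch.onnx.export(model, onnx_input, f="yolo_nas_l.onnx")
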

mriamnobody commented 1 year ago

@Louis-Dupont, I have modified your script to add Telegram alerts, error handling, and logging. Please take a look; this may also be useful if someone else needs to implement Telegram alerts. The only thing left to implement is looking only for humans/persons in the frame. Harpreet on Discord shared a link, https://github.com/Deci-AI/super-gradients/issues/892, which has some related information, but it is too complex for me to understand and implement. That issue also mentions "Regarding filtering of classes - no, currently this is not supported." I'd be grateful if you could help and look into this.

import cv2
import time
import asyncio
import logging
from super_gradients.training import models
from super_gradients.common.object_names import Models
from telegram import Bot
from telegram.error import TelegramError

# Set up logging
logging.basicConfig(filename='app.log', 
                    filemode='w', 
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                    level=logging.INFO)
logger = logging.getLogger(__name__)

TOKEN = 'YOUR_BOT_TOKEN'
CHAT_ID = 'YOUR_CHAT_ID'

bot = Bot(token=TOKEN)

async def send_message(text):
    try:
        await bot.send_message(chat_id=CHAT_ID, text=text)
    except TelegramError as e:
        logger.error("Failed to send message through Telegram with error: %s", e)

# Create a loop to run the async function in
loop = asyncio.get_event_loop()
loop.run_until_complete(send_message("Detection Program Started"))

def release_resources():
    logger.info("Releasing video capture objects and closing windows")
    cap1.release()
    cap2.release()
    cap3.release()
    cap4.release()
    cap5.release()
    cv2.destroyAllWindows()

start_time = time.time()
logger.info("Script start time: %s", start_time)

try:
    logger.info("Starting to load model")
    model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco").cuda()
    logger.info("Detection Model loaded successfully")
    loop.run_until_complete(send_message("Detection Model loaded successfully"))
except Exception as e:
    logger.error("Detection Model loading failed with error: %s", e)
    loop.run_until_complete(send_message("Model loading failed with error: " + str(e)))

try:
    cap1 = cv2.VideoCapture('camstream1')
    cap2 = cv2.VideoCapture('camstream2')
    cap3 = cv2.VideoCapture('camstream3')
    cap4 = cv2.VideoCapture('camstream4')
    cap5 = cv2.VideoCapture('camstream5')
except Exception as e:
    logger.error("VideoCapture initialization failed with error: %s", e)
    loop.run_until_complete(send_message("VideoCapture initialization failed with error: " + str(e)))

while True:
    try:
        captures = [(cap1, 'cam_name1'), (cap2, 'cam_name2'), (cap3, 'cam_name3'), (cap4, 'cam_name4'), (cap5, 'cam_name5')]
        frames = []
        frame_names = []  # camera names kept aligned with the frames that were actually read
        logger.info("Reading frames from cameras")

        for cap, camera_name in captures:
            ret, frame = cap.read()
            if not ret:
                loop.run_until_complete(send_message(f"Failed to read frames from camera {camera_name}"))
            else:
                frames.append(frame)
                frame_names.append(camera_name)

        # Skip prediction on this iteration if no camera delivered a frame
        if not frames:
            continue

        logger.info("Predicting frames")
        predictions = model.predict(frames)

        # Zip against frame_names (not captures) so each window stays matched to its camera
        # even when some streams failed to return a frame
        for predicted_frame, camera_name in zip(predictions, frame_names):
            cv2.imshow(camera_name, predicted_frame.draw())

        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    except Exception as e:
        logger.error("An error occurred: %s", e)
        loop.run_until_complete(send_message("An error occurred: " + str(e)))
        time.sleep(5) # optional: wait before trying again
        continue

release_resources()

end_time = time.time()
logger.info("Script end time: %s", end_time)

execution_time = end_time - start_time
logger.info("Script executed in: %s seconds", execution_time)
print(f"Script executed in: {execution_time} seconds")
loop.run_until_complete(send_message(f"Script executed in: {execution_time} seconds"))
mriamnobody commented 1 year ago

I have also created and exported the model in .onnx format, but I'm not sure how to use it in the code.
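
A minimal sketch of loading the exported file with onnxruntime (an assumption: the onnxruntime package is installed; the raw graph outputs are not decoded or NMS-filtered the way model.predict results are, so they still need post-processing):

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("yolo_nas_l.onnx")
input_meta = session.get_inputs()[0]
print(input_meta.name, input_meta.shape)  # the batch size here must match what was used at export time

# Dummy batch assuming the model was exported with shape (1, 3, 640, 640);
# real frames must be resized and normalized the same way before being fed to the graph.
dummy = np.zeros((1, 3, 640, 640), dtype=np.float32)
raw_outputs = session.run(None, {input_meta.name: dummy})
print([o.shape for o in raw_outputs])
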

mriamnobody commented 1 year ago

@Louis-Dupont @NatanBagrov

Louis-Dupont commented 1 year ago

Hi @mriamnobody, as Harpreet said, we currently don't support it. I invite you to have a look at our implementation of the draw() method: https://github.com/Deci-AI/super-gradients/blob/a30fa8fdc623533df785831f7457967066fb2ebe/src/super_gradients/training/models/prediction_results.py#L44-L85. All you need to do is create your own def draw_human(predicted_frame: ImageDetectionPrediction) -> np.ndarray function, which iterates over predicted_frame.prediction the same way we do, just with a condition that predicted_frame.class_names[class_id] == "human" (make sure this is the exact name). Hoping this helps!

mriamnobody commented 1 year ago

@Louis-Dupont. I created a small function, but now nothing is being detected (including humans):

    def draw_human(self, box_thickness: int = 2, show_confidence: bool = True, color_mapping: Optional[List[Tuple[int, int, int]]] = None) -> np.ndarray:
        """Draw the predicted bboxes on the image for humans only.

        :param box_thickness:   Thickness of bounding boxes.
        :param show_confidence: Whether to show confidence scores on the image.
        :param color_mapping:   List of tuples representing the colors for each class.
                                Default is None, which generates a default color mapping based on the number of class names.
        :return:                Image with predicted bboxes. Note that this does not modify the original image.
        """
        image = self.image.copy()
        color_mapping = color_mapping or generate_color_mapping(len(self.class_names))

        for pred_i in range(len(self.prediction)):

            class_id = int(self.prediction.labels[pred_i])
            class_name = self.class_names[class_id]

            # Skip if detected class is not 'human'
            if class_name.lower() != 'human':
                continue

            score = "" if not show_confidence else str(round(self.prediction.confidence[pred_i], 2))

            image = draw_bbox(
                image=image,
                title=f"{class_name} {score}",
                color=color_mapping[class_id],
                box_thickness=box_thickness,
                x1=int(self.prediction.bboxes_xyxy[pred_i, 0]),
                y1=int(self.prediction.bboxes_xyxy[pred_i, 1]),
                x2=int(self.prediction.bboxes_xyxy[pred_i, 2]),
                y2=int(self.prediction.bboxes_xyxy[pred_i, 3]),
            )

        return image
mriamnobody commented 1 year ago

The edited prediction_results.py is now:

import os
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple, Iterator
from dataclasses import dataclass

import numpy as np

from super_gradients.training.models.predictions import Prediction, DetectionPrediction
from super_gradients.training.utils.media.video import show_video_from_frames, save_video
from super_gradients.training.utils.media.image import show_image, save_image
from super_gradients.training.utils.visualization.utils import generate_color_mapping
from super_gradients.training.utils.visualization.detection import draw_bbox

@dataclass
class ImagePrediction(ABC):
    """Object wrapping an image and a model's prediction.

    :attr image:        Input image
    :attr predictions:  Predictions of the model
    :attr class_names:  List of the class names to predict
    """

    image: np.ndarray
    prediction: Prediction
    class_names: List[str]

    @abstractmethod
    def draw(self, *args, **kwargs) -> np.ndarray:
        """Draw the predictions on the image."""
        pass

    @abstractmethod
    def draw_human(self, *args, **kwargs) -> np.ndarray:
        """Draw the predictions on the image."""
        pass

    @abstractmethod
    def show(self, *args, **kwargs) -> None:
        """Display the predictions on the image."""
        pass

    @abstractmethod
    def save(self, *args, **kwargs) -> None:
        """Save the predictions on the image."""
        pass

@dataclass
class ImageDetectionPrediction(ImagePrediction):
    """Object wrapping an image and a detection model's prediction.

    :attr image:        Input image
    :attr predictions:  Predictions of the model
    :attr class_names:  List of the class names to predict
    """

    image: np.ndarray
    prediction: DetectionPrediction
    class_names: List[str]

    def draw(self, box_thickness: int = 2, show_confidence: bool = True, color_mapping: Optional[List[Tuple[int, int, int]]] = None) -> np.ndarray:
        """Draw the predicted bboxes on the image.

        :param box_thickness:   Thickness of bounding boxes.
        :param show_confidence: Whether to show confidence scores on the image.
        :param color_mapping:   List of tuples representing the colors for each class.
                                Default is None, which generates a default color mapping based on the number of class names.
        :return:                Image with predicted bboxes. Note that this does not modify the original image.
        """
        image = self.image.copy()
        color_mapping = color_mapping or generate_color_mapping(len(self.class_names))

        for pred_i in range(len(self.prediction)):

            class_id = int(self.prediction.labels[pred_i])

            if self.class_names[class_id] != "human":
                continue

            score = "" if not show_confidence else str(round(self.prediction.confidence[pred_i], 2))

            image = draw_bbox(
                image=image,
                title=f"{self.class_names[class_id]} {score}",
                color=color_mapping[class_id],
                box_thickness=box_thickness,
                x1=int(self.prediction.bboxes_xyxy[pred_i, 0]),
                y1=int(self.prediction.bboxes_xyxy[pred_i, 1]),
                x2=int(self.prediction.bboxes_xyxy[pred_i, 2]),
                y2=int(self.prediction.bboxes_xyxy[pred_i, 3]),
            )

        return image

    def draw_human(self, box_thickness: int = 2, show_confidence: bool = True, color_mapping: Optional[List[Tuple[int, int, int]]] = None) -> np.ndarray:
        """Draw the predicted bboxes on the image for humans only.

        :param box_thickness:   Thickness of bounding boxes.
        :param show_confidence: Whether to show confidence scores on the image.
        :param color_mapping:   List of tuples representing the colors for each class.
                                Default is None, which generates a default color mapping based on the number of class names.
        :return:                Image with predicted bboxes. Note that this does not modify the original image.
        """
        image = self.image.copy()
        color_mapping = color_mapping or generate_color_mapping(len(self.class_names))

        for pred_i in range(len(self.prediction)):

            class_id = int(self.prediction.labels[pred_i])
            class_name = self.class_names[class_id]

            # Skip if detected class is not 'human'
            if class_name.lower() != 'human':
                continue

            score = "" if not show_confidence else str(round(self.prediction.confidence[pred_i], 2))

            image = draw_bbox(
                image=image,
                title=f"{class_name} {score}",
                color=color_mapping[class_id],
                box_thickness=box_thickness,
                x1=int(self.prediction.bboxes_xyxy[pred_i, 0]),
                y1=int(self.prediction.bboxes_xyxy[pred_i, 1]),
                x2=int(self.prediction.bboxes_xyxy[pred_i, 2]),
                y2=int(self.prediction.bboxes_xyxy[pred_i, 3]),
            )

        return image

    def show(self, box_thickness: int = 2, show_confidence: bool = True, color_mapping: Optional[List[Tuple[int, int, int]]] = None) -> None:
        """Display the image with predicted bboxes.

        :param box_thickness:   Thickness of bounding boxes.
        :param show_confidence: Whether to show confidence scores on the image.
        :param color_mapping:   List of tuples representing the colors for each class.
                                Default is None, which generates a default color mapping based on the number of class names.
        """
        image = self.draw(box_thickness=box_thickness, show_confidence=show_confidence, color_mapping=color_mapping)
        show_image(image)

    def save(self, output_path: str, box_thickness: int = 2, show_confidence: bool = True, color_mapping: Optional[List[Tuple[int, int, int]]] = None) -> None:
        """Save the predicted bboxes on the images.

        :param output_path:     Path to the output video file.
        :param box_thickness:   Thickness of bounding boxes.
        :param show_confidence: Whether to show confidence scores on the image.
        :param color_mapping:   List of tuples representing the colors for each class.
                                Default is None, which generates a default color mapping based on the number of class names.
        """
        image = self.draw(box_thickness=box_thickness, show_confidence=show_confidence, color_mapping=color_mapping)
        save_image(image=image, path=output_path)

@dataclass
class ImagesPredictions(ABC):
    """Object wrapping the list of image predictions.

    :attr _images_prediction_lst: List of results of the run
    """

    _images_prediction_lst: List[ImagePrediction]

    def __len__(self) -> int:
        return len(self._images_prediction_lst)

    def __getitem__(self, index: int) -> ImagePrediction:
        return self._images_prediction_lst[index]

    def __iter__(self) -> Iterator[ImagePrediction]:
        return iter(self._images_prediction_lst)

    @abstractmethod
    def show(self, *args, **kwargs) -> None:
        """Display the predictions on the images."""
        pass

    @abstractmethod
    def save(self, *args, **kwargs) -> None:
        """Save the predictions on the images."""
        pass

@dataclass
class VideoPredictions(ImagesPredictions, ABC):
    """Object wrapping the list of image predictions as a Video.

    :attr _images_prediction_lst:   List of results of the run
    :attr fps:                      Frames per second of the video
    """

    _images_prediction_lst: List[ImagePrediction]
    fps: float

    @abstractmethod
    def show(self, *args, **kwargs) -> None:
        """Display the predictions on the video."""
        pass

    @abstractmethod
    def save(self, *args, **kwargs) -> None:
        """Save the predictions on the video."""
        pass

@dataclass
class ImagesDetectionPrediction(ImagesPredictions):
    """Object wrapping the list of image detection predictions.

    :attr _images_prediction_lst:  List of the predictions results
    """

    _images_prediction_lst: List[ImageDetectionPrediction]

    def show(self, box_thickness: int = 2, show_confidence: bool = True, color_mapping: Optional[List[Tuple[int, int, int]]] = None) -> None:
        """Display the predicted bboxes on the images.

        :param box_thickness:   Thickness of bounding boxes.
        :param show_confidence: Whether to show confidence scores on the image.
        :param color_mapping:   List of tuples representing the colors for each class.
                                Default is None, which generates a default color mapping based on the number of class names.
        """
        for prediction in self._images_prediction_lst:
            prediction.show(box_thickness=box_thickness, show_confidence=show_confidence, color_mapping=color_mapping)

    def save(
        self, output_folder: str, box_thickness: int = 2, show_confidence: bool = True, color_mapping: Optional[List[Tuple[int, int, int]]] = None
    ) -> None:
        """Save the predicted bboxes on the images.

        :param output_folder:     Folder path, where the images will be saved.
        :param box_thickness:   Thickness of bounding boxes.
        :param show_confidence: Whether to show confidence scores on the image.
        :param color_mapping:   List of tuples representing the colors for each class.
                                Default is None, which generates a default color mapping based on the number of class names.
        """
        if output_folder:
            os.makedirs(output_folder, exist_ok=True)

        for i, prediction in enumerate(self._images_prediction_lst):
            image_output_path = os.path.join(output_folder, f"pred_{i}.jpg")
            prediction.save(output_path=image_output_path, box_thickness=box_thickness, show_confidence=show_confidence, color_mapping=color_mapping)

@dataclass
class VideoDetectionPrediction(VideoPredictions):
    """Object wrapping the list of image detection predictions as a Video.

    :attr _images_prediction_lst:   List of the predictions results
    :attr fps:                      Frames per second of the video
    """

    _images_prediction_lst: List[ImageDetectionPrediction]
    fps: int

    def draw(self, box_thickness: int = 2, show_confidence: bool = True, color_mapping: Optional[List[Tuple[int, int, int]]] = None) -> List[np.ndarray]:
        """Draw the predicted bboxes on the images.

        :param box_thickness:   Thickness of bounding boxes.
        :param show_confidence: Whether to show confidence scores on the image.
        :param color_mapping:   List of tuples representing the colors for each class.
                                Default is None, which generates a default color mapping based on the number of class names.
        :return:                List of images with predicted bboxes. Note that this does not modify the original image.
        """
        frames_with_bbox = [
            result.draw(box_thickness=box_thickness, show_confidence=show_confidence, color_mapping=color_mapping) for result in self._images_prediction_lst
        ]
        return frames_with_bbox

    def draw_human(self, box_thickness: int = 2, show_confidence: bool = True, color_mapping: Optional[List[Tuple[int, int, int]]] = None) -> List[np.ndarray]:
        """Draw the predicted bboxes on the images.

        :param box_thickness:   Thickness of bounding boxes.
        :param show_confidence: Whether to show confidence scores on the image.
        :param color_mapping:   List of tuples representing the colors for each class.
                                Default is None, which generates a default color mapping based on the number of class names.
        :return:                List of images with predicted bboxes. Note that this does not modify the original image.
        """
        frames_with_bbox = [
            result.draw(box_thickness=box_thickness, show_confidence=show_confidence, color_mapping=color_mapping) for result in self._images_prediction_lst
        ]
        return frames_with_bbox

    def show(self, box_thickness: int = 2, show_confidence: bool = True, color_mapping: Optional[List[Tuple[int, int, int]]] = None) -> None:
        """Display the predicted bboxes on the images.

        :param box_thickness:   Thickness of bounding boxes.
        :param show_confidence: Whether to show confidence scores on the image.
        :param color_mapping:   List of tuples representing the colors for each class.
                                Default is None, which generates a default color mapping based on the number of class names.
        """
        frames = self.draw(box_thickness=box_thickness, show_confidence=show_confidence, color_mapping=color_mapping)
        show_video_from_frames(window_name="Detection", frames=frames, fps=self.fps)

    def save(self, output_path: str, box_thickness: int = 2, show_confidence: bool = True, color_mapping: Optional[List[Tuple[int, int, int]]] = None) -> None:
        """Save the predicted bboxes on the images.

        :param output_path:     Path to the output video file.
        :param box_thickness:   Thickness of bounding boxes.
        :param show_confidence: Whether to show confidence scores on the image.
        :param color_mapping:   List of tuples representing the colors for each class.
                                Default is None, which generates a default color mapping based on the number of class names.
        """
        frames = self.draw(box_thickness=box_thickness, show_confidence=show_confidence, color_mapping=color_mapping)
        save_video(output_path=output_path, frames=frames, fps=self.fps)
mriamnobody commented 1 year ago

@Louis-Dupont

Louis-Dupont commented 1 year ago

Isn't the class name person instead of human? Did you try to debug the function to see the predicted classes? If you debug it, you will be able to see which classes are detected, which could help you understand whether the issue is that humans are not being detected, or that human is not a class name and it is instead person.
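
For reference, a quick way to check is to print the class names attached to a prediction; with COCO pretrained weights the label set is the 80 COCO classes, where the relevant name is "person":

# Inspect the exact label names attached to a prediction
predictions = model.predict([frame])
for image_prediction in predictions:
    print(image_prediction.class_names)        # COCO names: 'person', 'bicycle', 'car', ...
    print(image_prediction.prediction.labels)  # integer class ids of the detected boxes

# The filter in draw_human would then become:
#     if class_name.lower() != 'person':
#         continue
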

Louis-Dupont commented 1 year ago

Closing due to inactivity