Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0

Perform inference on an opencv image loaded from memory #1834

Closed M3nxudo closed 7 months ago

M3nxudo commented 7 months ago

💡 Your Question

Is it possible to perform inference on an image already in memory (obtained through OpenCV VideoCapture)? I have tried to do so but get no detections when performing inference in the following way:

  1. Defining a class that holds the model info and a method that performs the detection and returns the predictions:

    import super_gradients
    import cv2

    class objectNAS():
        def __init__(self, confThreshold):
            self.conf = confThreshold
            model_name = "yolo_nas_l"
            self.model = super_gradients.training.models.get(model_name, pretrained_weights="coco").cuda()

        def detect(self, input_image):
            model_prediction = self.model.predict(input_image, iou=0.5, conf=self.conf)
            prediction = model_prediction.prediction
            print('Debugging prediction')
            print(prediction.bboxes_xyxy[0])
            return prediction
  2. I then create an object of the aforementioned class and try to perform inference with it inside an OpenCV VideoCapture loop:

    self.detector = objectNAS(confThreshold=0.7)
    self.capVideo = cv2.VideoCapture(self.videofile[0])
    count = 0
    while self.capVideo.isOpened():
        ret, image = self.capVideo.read()
        if ret:
            # OpenCV reads frames as BGR, so convert to RGB first
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            height, width, channel = image.shape
            step = channel * width
            result = self.detector.detect(image)

In doing so and debugging the code, I get an error on the line print(prediction.bboxes_xyxy[0]) with the following text:

    File "D:\source\repos\libMuinen_UI\nas_interface.py", line 26, in detect
        print(prediction.bboxes_xyxy[0])
    IndexError: index 0 is out of bounds for axis 0 with size 0

If I understand correctly, this means the prediction results are empty so I can't access them, but from the input image I know I should be getting at least some detections. I'd love to know if I'm using the predict method incorrectly, or any tips to make my code work.

Versions

No response

BloodAxe commented 7 months ago

The model_prediction = self.model.predict(input_image, iou=0.5, conf=self.conf) line looks legit. You can pass your image to predict() as a numpy array in RGB order (not BGR, as OpenCV reads it). It could be that you are simply getting no detections (maybe your confThreshold is too high?), so if there are no boxes you would get an IndexError at bboxes_xyxy[0], which is expected.

A somewhat related issue where you can find a code snippet of printing all detections: https://github.com/Deci-AI/super-gradients/issues/1818
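
For reference, a minimal sketch of printing all detections without indexing into a possibly empty array, assuming the same result fields used elsewhere in this thread (prediction.bboxes_xyxy, prediction.labels, prediction.confidence, class_names); file name and thresholds are illustrative:

import cv2
import super_gradients

model = super_gradients.training.models.get("yolo_nas_l", pretrained_weights="coco").cuda()

image = cv2.imread("test.jpg")[:, :, ::-1]  # load any test image, convert BGR -> RGB
result = model.predict(image, iou=0.5, conf=0.25)
prediction = result.prediction

# Guard against the empty case instead of indexing bboxes_xyxy[0] directly
if len(prediction.bboxes_xyxy) == 0:
    print("No detections")
else:
    for box, label, conf in zip(prediction.bboxes_xyxy, prediction.labels, prediction.confidence):
        print(f"{result.class_names[int(label)]}: {conf:.2f} at {box}")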

M3nxudo commented 7 months ago

I double-checked that I'm feeding the correct image type, tested with a lower confidence (went from 0.7 down to 0.5), and even switched the loaded model to "yolo_nas_m" to make sure it wasn't the model itself, but I haven't been able to get any inference working. I'm checking the size of prediction.bboxes_xyxy before accessing it so the program doesn't crash, but every prediction is empty, so I'm still not sure what I'm doing wrong. Additionally, I checked issue #1818 as you suggested, but that works with image files, not images in memory.

M3nxudo commented 7 months ago

Update: I seem to have narrowed down the rogue line of code that is causing all my headaches.

model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco")

If I then run the inference and try to access the predictions, everything works as expected.

If instead I instantiate the model with GPU acceleration (with the following modification):

model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco").cuda()

This is when I always get empty predictions.


Any help on this matter would be appreciated @BloodAxe

BloodAxe commented 7 months ago

I don't see how this can be happening. Please double-check everything on your end. If you put this code in Colab with a GPU and run it, you will get predictions as expected:

import cv2
import super_gradients

model_name = "yolo_nas_l"
model = super_gradients.training.models.get(model_name, pretrained_weights="coco").cuda()

image = cv2.imread("test.jpg")  # any test image, loaded as BGR
image = image[:, :, ::-1]  # convert BGR -> RGB

model.predict(image).show()

Please share additional details: which GPU you have and which OS/Python version you are using. If possible, provide minimal yet complete code that reproduces your issue.

M3nxudo commented 7 months ago

I'm providing some code that opens a VideoCapture from the webcam, grabs individual frames, and tries to perform inference on both the CPU and GPU for comparison:

from super_gradients.training import models
from super_gradients.common.object_names import Models
import cv2

cap = cv2.VideoCapture(0)
cap.set(3, 640)  # CAP_PROP_FRAME_WIDTH
cap.set(4, 480)  # CAP_PROP_FRAME_HEIGHT

modelcpu = models.get(Models.YOLO_NAS_L, pretrained_weights="coco")
modelgpu = models.get(Models.YOLO_NAS_L, pretrained_weights="coco").cuda()
cv2.namedWindow("CPU inference")
cv2.namedWindow("GPU inference")
cv2.moveWindow("CPU inference", 0, 20)
cv2.moveWindow("GPU inference", 700, 20)

# Acquisition loop
while True:
    success, img = cap.read()
    if not success:
        break
    output_image_cpu = img.copy()
    output_image_gpu = img.copy()
    resultscpu = modelcpu.predict(img)
    resultsgpu = modelgpu.predict(img)
    # CPU results
    boxes_cpu = resultscpu.prediction.bboxes_xyxy
    label_names_cpu = resultscpu.class_names
    labels_cpu = resultscpu.prediction.labels
    confidence_cpu = resultscpu.prediction.confidence
    # GPU results
    boxes_gpu = resultsgpu.prediction.bboxes_xyxy
    label_names_gpu = resultsgpu.class_names
    labels_gpu = resultsgpu.prediction.labels
    confidence_gpu = resultsgpu.prediction.confidence

    # CPU result filtering loop
    if labels_cpu.size < 1:
        text = "No detections on CPU"
        print(text)
        output_image_cpu = cv2.putText(output_image_cpu, text, (260, 20), cv2.FONT_HERSHEY_SIMPLEX,
                                       0.5,(0, 0, 255), 2)
    else:
        for count, lab in enumerate(labels_cpu):
            # Extract info
            local_label = label_names_cpu[int(lab)]
            local_conf = confidence_cpu[count]
            label = f"{local_label} ({local_conf:.2f})"
            x1 = boxes_cpu[count, 0]
            y1 = boxes_cpu[count, 1]
            x2 = boxes_cpu[count, 2]
            y2 = boxes_cpu[count, 3]
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)  # convert to int values
            # Paint bboxes
            output_image_cpu = cv2.rectangle(output_image_cpu, (x1, y1), (x2, y2), (255, 0, 0), 2)
            output_image_cpu = cv2.putText(output_image_cpu, label, (x1 - 10, y1 - 10),cv2.FONT_HERSHEY_SIMPLEX,
                                         0.5,(255, 0, 0), 2)

    # GPU result filtering loop
    if labels_gpu.size < 1:
        text = "No detections on GPU"
        print(text)
        output_image_gpu = cv2.putText(output_image_gpu, text, (260, 20),cv2.FONT_HERSHEY_SIMPLEX,
                                       0.5,(0, 0, 255), 2)
    else:
        for count, lab in enumerate(labels_gpu):
            # Extract info
            local_label = label_names_gpu[int(lab)]
            local_conf = confidence_gpu[count]
            label = f"{local_label} ({local_conf:.2f})"
            x1 = boxes_gpu[count, 0]
            y1 = boxes_gpu[count, 1]
            x2 = boxes_gpu[count, 2]
            y2 = boxes_gpu[count, 3]
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)  # convert to int values
            # Paint bboxes
            output_image_gpu = cv2.rectangle(output_image_gpu, (x1, y1), (x2, y2), (255, 0, 0), 2)
            output_image_gpu = cv2.putText(output_image_gpu, label, (x1 - 10, y1 - 10),cv2.FONT_HERSHEY_SIMPLEX,
                                           0.5,(255, 0, 0), 2)

    cv2.imshow("CPU inference", output_image_cpu)
    cv2.imshow("GPU inference", output_image_gpu)
    if cv2.waitKey(1) == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Additional hardware information: (screenshot attached)

Package information (using a conda environment): (screenshot attached)

Hope that is enough to reproduce the behaviour @BloodAxe

EDIT: additional software info. OS: Windows 10 Enterprise LTSC. Python version: 3.10.13

BloodAxe commented 7 months ago

Thanks for the detailed snippet to reproduce. Unfortunately, I have not been able to reproduce the issue yet: on my 4090 it works fine, and the predictions on CPU & GPU are identical. I will try later on a 1070, which I happen to have, and will let you know how it goes.

Update: the code works well on both the 4090 and the 1070 🤷‍♂️

BloodAxe commented 7 months ago

OK, this is probably where it is all coming from: https://github.com/pytorch/pytorch/issues/58123

On our end we will introduce an fp16 argument that you can use to disable fp16 inference mode.
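
If you want to check whether your GPU is affected in the meantime, here is a rough diagnostic sketch (an assumption on my part, not taken from that issue) comparing full-precision and autocast fp16 outputs of a plain convolution:

import torch

# On GPUs affected by the linked PyTorch issue, fp16 outputs of
# convolutions may contain NaNs while the fp32 outputs are fine
x = torch.randn(1, 3, 640, 640, device="cuda")
conv = torch.nn.Conv2d(3, 16, kernel_size=3).cuda()

with torch.no_grad():
    y_fp32 = conv(x)
    with torch.cuda.amp.autocast():
        y_fp16 = conv(x)

print("NaNs in fp32 output:", torch.isnan(y_fp32).any().item())
print("NaNs in fp16 output:", torch.isnan(y_fp16).any().item())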

BloodAxe commented 7 months ago

We have a workaround PR that disables the mixed precision used in model.predict(), which hopefully should fix your issue. This will land in the next release of SG. But if you are really eager to try it out, you can install a development version of SG from the feature branch using this command:

pip install -U git+https://github.com/Deci-AI/super-gradients@feature/SG-000-introduce-fp16-flag-to-predict
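
For reference, a minimal sketch of what usage would look like with that branch installed, assuming the new fp16 keyword on predict():

import cv2
from super_gradients.training import models
from super_gradients.common.object_names import Models

model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco").cuda()

# Grab a single frame from the webcam, as in the repro script above
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
cap.release()

if ret:
    # fp16=False forces full-precision inference on GPUs where mixed
    # precision silently yields empty predictions
    result = model.predict(frame, fp16=False)
    print(result.prediction.bboxes_xyxy)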

M3nxudo commented 7 months ago

By changing line 24 of the snippet to resultsgpu = modelgpu.predict(img, fp16=False), everything is working as expected now: detections on GPU are finally up and running. Thanks for the solution, it's awesome to see the Deci team continuously improving the super-gradients repo.