Deci-AI / super-gradients

Easily train or fine-tune SOTA computer vision models with one open source training library. The home of Yolo-NAS.
https://www.supergradients.com
Apache License 2.0

predict using batch_size #1616

Closed: khalilxg closed this issue 1 year ago

khalilxg commented 1 year ago

🐛 Describe the bug

I'm using super-gradients 3.3.0rc27211. My laptop has an RTX 3060 Ti, an i7 12th gen, and 32 GB of RAM.

I want to run prediction on a one-hour traffic video (840×440), so I'm predicting with a batch size.

But it seems I'm running out of memory anyway, as if the batch_size param is not working!

media_predictions = model.predict(MEDIA_PATH, batch_size=32)

# Full code
import os
import math

import cv2
import numpy as np
import torch

from sort import Sort
from super_gradients.common.object_names import Models
from super_gradients.training import models

# Initialize class counters
class_counts_ns = [0] * 8  # Counters for north-to-south direction
class_counts_sn = [0] * 8  # Counters for south-to-north direction
tracked_vehicles = {}      # To keep track of tracked vehicle IDs and their positions

# Specify the path to your input video folder and output folder for text files
input_folder = 'input'
output_folder = 'output'

# Create the output folder if it doesn't exist
if not os.path.exists(output_folder):
    os.makedirs(output_folder)

# Load the YOLO model
# ...
model = models.get(Models.YOLO_NAS_S,
                   pretrained_weights='coco',
                   checkpoint_path='/home/ubuntu/Downloads/convids/Dataset/test2/ckpt_best.pth',
                   num_classes=8,
                   batch_size=32)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

# Define class names
class_names = ["motorcycle", "car", "bus", "small truck", "big truck", "CM1"]

# Confidence threshold for detections
confidence_threshold = 0.65

# Font settings for displaying class counts
font = cv2.FONT_HERSHEY_SIMPLEX
font_scale = 0.3       # Reduced font size
font_thickness = 1     # Reduced font thickness
font_color = (255, 255, 255)  # White text color

# Define table position and row height
row_height = 10  # Height of each row in the table

# Initialize SORT tracker
tracker = Sort(max_age=1, min_hits=1, iou_threshold=0.01)

# Create and open the results.txt file for appending
results_file_path = os.path.join(output_folder, 'results.txt')
results_file = open(results_file_path, 'a')

# Get the list of video files
video_files = sorted([f for f in os.listdir(input_folder) if f.endswith('.mp4')])
BATCH_SIZE = 32

# Process each video in the input folder in order
for video_file in video_files:
    MEDIA_PATH = os.path.join(input_folder, video_file)

    # Initialize VideoCapture for video width and height
    cap = cv2.VideoCapture(MEDIA_PATH)
    fourcc = int(cap.get(cv2.CAP_PROP_FOURCC))
    fps = int(cap.get(cv2.CAP_PROP_FPS))
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

    # Initialize VideoWriter
    out = cv2.VideoWriter(os.path.join(output_folder, f'output_{video_file}'), fourcc, fps, (frame_width, frame_height))

    # Process the video using the SuperGradients library
    media_predictions = model.predict(MEDIA_PATH, batch_size=32)
    for frame_index, frame_prediction in enumerate(media_predictions):
        frame = frame_prediction.image  # Get the frame
        labels = frame_prediction.prediction.labels
        confidences = frame_prediction.prediction.confidence
        bboxes = frame_prediction.prediction.bboxes_xyxy

        # You can perform frame-specific operations here as needed

        # Process the frame data as you did in your previous code

        detections = np.empty((0, 5))

        for (bbox_xyxy, confidence, cls) in zip(bboxes, confidences, labels):
            if confidence >= confidence_threshold:
                bbox = np.array(bbox_xyxy)
                x1, y1, x2, y2 = bbox[0], bbox[1], bbox[2], bbox[3]
                x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
                class_id = int(cls)
                class_name = class_names[class_id]
                conf = math.ceil((confidence * 100)) / 100
                currentArray = np.array([x1, y1, x2, y2, conf])
                detections = np.vstack((detections, currentArray))

        # Batched tracking
        resultsTracker = tracker.update(detections)

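        # NOTE: class_id below still holds whatever the last detection in the
        # loop above assigned; SORT does not carry per-detection class labels
        # through tracking.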
        for result in resultsTracker:
            x1, y1, x2, y2, id = result
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
            cx, cy = int((x1 + x2) / 2), int((y1 + y2) / 2)

            # Check if the vehicle is already tracked
            if id not in tracked_vehicles:
                tracked_vehicles[id] = {"prev_y": cy, "direction": None}

            # Check the direction based on vertical movement
            prev_y = tracked_vehicles[id]["prev_y"]
            direction = tracked_vehicles[id]["direction"]

            if cy < prev_y - 1:  # Vehicle is moving from south to north (upward)
                if direction != "SN":
                    class_counts_sn[class_id] += 1
                    direction = "SN"
            elif cy > prev_y + 1:  # Vehicle is moving from north to south (downward)
                if direction != "NS":
                    class_counts_ns[class_id] += 1
                    direction = "NS"

            tracked_vehicles[id]["prev_y"] = cy
            tracked_vehicles[id]["direction"] = direction

        # Draw bounding boxes and labels on the frame
        for (x1, y1, x2, y2, id) in resultsTracker:
            x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)
            cv2.rectangle(frame, (x1, y1), (x2, y2), (85, 54, 255), 3)
            label = f'{int(id)}:{class_names[class_id]}'
            t_size = cv2.getTextSize(label, font, font_scale, font_thickness)[0]
            c2 = x1 + t_size[0], y1 - t_size[1] - 3
            cv2.rectangle(frame, (x1, y1), c2, (255, 0, 255), -1, cv2.LINE_AA)
            cv2.putText(frame, label, (x1, y1 - 2), font, font_scale, font_color, font_thickness, cv2.LINE_AA)

        # Create a black background for the table
        table_height = (len(class_names) + 5) * row_height + 10  # extra rows for the header, blank row, and "Total" row, plus padding
        table_width = 120  # Adjust the width as needed, reduced from 350
        table_x = frame_width - table_width  # Position at the top-right corner
        table = np.zeros((table_height, table_width, 3), dtype=np.uint8)

        # Calculate the width of the first column
        first_column_width = 30  # Adjust the width as needed, reduced from 150

        # Adjust the horizontal spacing between columns
        column_spacing = 30  # Adjust the spacing as needed, reduced from 50

        # Add headers to the table
        cv2.putText(table, "Class", (0, 20), font, font_scale, (255, 255, 255), font_thickness, cv2.LINE_AA)
        cv2.putText(table, "SENS 2", (first_column_width + column_spacing, 20), font, font_scale, (255, 255, 255), font_thickness, cv2.LINE_AA)
        cv2.putText(table, "SENS 1", (first_column_width + 2 * column_spacing, 20), font, font_scale, (255, 255, 255), font_thickness, cv2.LINE_AA)

        # Draw class counts on the black table
        for class_id, class_name in enumerate(class_names):
            class_count_ns = class_counts_ns[class_id]
            class_count_sn = class_counts_sn[class_id]

            # Calculate the coordinates for the text
            text_y = 40 + class_id * row_height

            # Use the calculated coordinates to draw the text on the black table
            cv2.putText(table, class_name, (0, text_y), font, font_scale, (255, 255, 255), font_thickness, cv2.LINE_AA)
            cv2.putText(table, str(class_count_ns), (first_column_width + column_spacing, text_y), font, font_scale, (255, 255, 255), font_thickness, cv2.LINE_AA)
            cv2.putText(table, str(class_count_sn), (first_column_width + 2 * column_spacing, text_y), font, font_scale, (255, 255, 255), font_thickness, cv2.LINE_AA)

        # Calculate the total counts of NS and SN
        total_ns = sum(class_counts_ns)
        total_sn = sum(class_counts_sn)

        # Add a blank row
        blank_row_y = 40 + len(class_names) * row_height  # Position the blank row under all the class rows
        cv2.putText(table, "", (0, blank_row_y), font, font_scale, (255, 255, 255), font_thickness, cv2.LINE_AA)

        # Add the "Total" row
        total_row_y = blank_row_y + row_height  # Position the "Total" row under the blank row
        cv2.putText(table, "Total", (0, total_row_y), font, font_scale, (255, 255, 255), font_thickness, cv2.LINE_AA)
        cv2.putText(table, str(total_ns), (first_column_width + column_spacing, total_row_y), font, 0.2, (255, 255, 255), font_thickness, cv2.LINE_AA)
        cv2.putText(table, str(total_sn), (first_column_width + 2 * column_spacing, total_row_y), font, 0.2, (255, 255, 255), font_thickness, cv2.LINE_AA)

        # Add the black table to the frame
        frame[0:table_height, frame_width - table_width:] = table

        # Write the frame to the output video
        out.write(frame)

    # Release the VideoWriter
    out.release()

    # Release the video capture
    cap.release()

    # Append lines of predictions to the results.txt file
    results_file.write(f'{{"SENS": "SENS 2", "nom video": "{video_file}", "cars": {class_counts_ns[1]}, "motobikes": {class_counts_ns[0]}, "small_trucks": {class_counts_ns[3]}, "big_trucks": {class_counts_ns[4]}, "busses": {class_counts_ns[2]}, "constractions": {class_counts_ns[5]}}}\n')
    results_file.write(f'{{"SENS": "SENS 1", "nom video": "{video_file}", "cars": {class_counts_sn[1]}, "motobikes": {class_counts_sn[0]}, "small_trucks": {class_counts_sn[3]}, "big_trucks": {class_counts_sn[4]}, "busses": {class_counts_sn[2]}, "constractions": {class_counts_sn[5]}}}\n')

    # Reset class counters to zero after processing each video
    class_counts_ns = [0] * 8
    class_counts_sn = [0] * 8

# Close the results file
results_file.close()

# Close OpenCV windows
cv2.destroyAllWindows()

Versions

super-gradients 3.3.0rc27211

BloodAxe commented 1 year ago

Yes, it is a known limitation. We have it on our todo list. For now, I suggest iterating over the frames of your video sequence manually, running per-frame prediction, and processing the results according to your needs:

predict_pipeline = model._get_pipeline()
for frame in frames:
    predictions = predict_pipeline(frame)
    # ... do something with predictions
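
For a video file, a minimal sketch of that workaround using OpenCV follows; the BGR-to-RGB conversion is an assumption on my part, and the per-frame result is assumed to expose the same .prediction fields used in the script above:

import cv2

# Run the internal pipeline frame by frame instead of
# model.predict(video_path, ...), which holds all frames in memory.
predict_pipeline = model._get_pipeline()

cap = cv2.VideoCapture(MEDIA_PATH)
while True:
    ok, frame_bgr = cap.read()
    if not ok:
        break
    # OpenCV decodes to BGR; super-gradients is assumed to expect RGB
    frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    result = predict_pipeline(frame_rgb)
    # Assumed to mirror the per-frame fields used in the script above
    labels = result.prediction.labels
    confidences = result.prediction.confidence
    bboxes = result.prediction.bboxes_xyxy
    # ... per-frame tracking, counting, and drawing goes here
cap.release()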
khalilxg commented 1 year ago

@BloodAxe But this way, processing the video will take far too long, since it eliminates the batching speedup of super-gradients. This is a big problem.

BloodAxe commented 1 year ago

predict() was not designed for processing long videos in the first place, and it was not designed to be efficient for inference tasks either. As I said earlier, this is a known issue, and I suggested a workaround for now. Yes, in this version you won't get the batched inference speedup, but it won't crash. This may change in the future, but as of now this is how it is.

khalilxg commented 1 year ago

@BloodAxe Thank you for your support. I'll try to manage by editing the video quality, etc., so that it can be processed (change the codec to rawvideo, which doesn't compress videos but will make them very large, plus resize the video).
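
For what it's worth, a minimal sketch of that pre-processing idea with OpenCV; the paths, target size, and MJPG codec are assumptions, not from this thread (a truly uncompressed codec would make the file far larger):

import cv2

src = cv2.VideoCapture('input/traffic.mp4')   # hypothetical input path
fps = src.get(cv2.CAP_PROP_FPS)
new_size = (640, 360)                         # hypothetical target size
# MJPG keeps the file manageable; rawvideo-style output avoids compression
# entirely at the cost of very large files
dst = cv2.VideoWriter('input/traffic_small.avi', cv2.VideoWriter_fourcc(*'MJPG'), fps, new_size)
while True:
    ok, frame = src.read()
    if not ok:
        break
    dst.write(cv2.resize(frame, new_size))
src.release()
dst.release()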