CASIA-IVA-Lab / FastSAM

Fast Segment Anything

Can you help me pass a video for inference using the cv2 library, and also export the output as a video or get an output frame for every image inference? #32

Closed Abhishek-Quidich closed 1 year ago

an-yongqi commented 1 year ago

It would be a pleasure to help you. May I ask whether you simply want to run inference and produce an output for each frame, or whether you want to track objects across the video? The former is easy to implement and I can help you with it now, though the visualization colors change randomly from frame to frame; the latter (FastSAM for Tracking) we will also release soon.

an-yongqi commented 1 year ago

@Abhishek-Quidich Here is a reference case, you can make further modifications according to your needs. :blush:

import cv2
from ultralytics import YOLO
import numpy as np
import os
import torch

def show_mask_track(annotation, color_dict):
    # Sort masks by area (smallest drawn last) and colorize each one with a
    # fixed color from color_dict so mask colors stay consistent per index.
    num_masks = len(annotation)
    areas = torch.sum(annotation, dim=(1, 2))
    sorted_indices = torch.argsort(areas, descending=False)
    annotation = annotation[sorted_indices]
    colored_masks = annotation[..., None] * color_dict[:num_masks, None, None, :] * 255.0
    result = np.sum(colored_masks.cpu().numpy(), axis=0)
    return result.astype(np.uint8)

max_det = 300
video_path = 'your_video_path'
cap = cv2.VideoCapture(video_path)
model = YOLO("your_model.pt")
save_path = 'your_save_dir' + os.path.split(video_path)[-1][:-4]
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

if not os.path.exists(save_path):
    os.makedirs(save_path)

# Read the first frame to get the output video size.
ret, frame = cap.read()
h, w, _ = frame.shape
video = cv2.VideoWriter(save_path + '/result.mp4', cv2.VideoWriter_fourcc(*'mp4v'), 30, (w, h))
# One random color per possible detection, reused for every frame.
color_dict = torch.rand(max_det, 3, device=device)
while True:
    ret, frame = cap.read()
    if not ret:  # end of video or read failure
        break

    results = model(frame, device=device, retina_masks=True, iou=0.7, conf=0.25, imgsz=1024, max_det=max_det)

    if results[0].masks is None:  # nothing detected in this frame
        video.write(frame)
        continue
    masks = results[0].masks.data
    mask = show_mask_track(masks, color_dict)
    frame = cv2.addWeighted(frame, 1, mask, 0.7, 0)

    video.write(frame)

video.release()
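
The issue title also asks for an output frame for every image inference. If you want that in addition to the video, a minimal sketch (assuming the same save_path and loop as above; frame_idx is a hypothetical counter initialized to 0 before the loop) is to add this inside the loop, right after cv2.addWeighted:

cv2.imwrite(os.path.join(save_path, f'frame_{frame_idx:05d}.png'), frame)
frame_idx += 1
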
an-yongqi commented 1 year ago

Hi @Abhishek-Quidich ,

I noticed that the issue you reported seems to be resolved based on my last response.

I would like to close this issue for now to keep the issue tracker organized. However, if the problem persists or if you have any further questions, please feel free to comment here or open a new issue. We value your input and are happy to assist further.

Thank you for your understanding!

Best Regards, Yongqi An

mk622 commented 1 year ago

Is it possible to process RTSP (Real-Time Streaming Protocol) video with the code just described? If you can figure it out, I would appreciate your help.

Best Regards, mk622

an-yongqi commented 1 year ago

@mk622 Does this code work? Just replace the cap in the code above; nothing else needs to be modified.

import cv2
stream_url = "rtsp://your_rtsp_stream_url"
cap = cv2.VideoCapture(stream_url)
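
Before entering the loop it is worth verifying that the stream actually opened, since bad credentials or an unreachable host otherwise fail silently; a minimal check using the standard cv2 API:

if not cap.isOpened():
    raise RuntimeError(f"Could not open RTSP stream: {stream_url}")
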
mk622 commented 1 year ago

Thanks for your advice. I was able to write the RTSP stream to MP4 with the code below:

import cv2
from ultralytics import YOLO
import numpy as np
import os
import torch

user_id = "user_id"
user_pw = "user_pw"
host = "host"

stream_url = f"rtsp://{user_id}:{user_pw}@{host}/MediaInput/h264"
cap = cv2.VideoCapture(stream_url)

save_path = './output'

def show_mask_track(annotation, color_dict):
    num_masks = len(annotation)
    areas = torch.sum(annotation, dim=(1, 2))
    sorted_indices = torch.argsort(areas, descending=False)
    annotation = annotation[sorted_indices]
    colored_masks = annotation[..., None] * color_dict[:num_masks, None, None, :] * 255.0
    result = np.sum(colored_masks.cpu().numpy(), axis=0)
    return result.astype(np.uint8)

max_det = 300
model = YOLO("./weights/FastSAM-x.pt")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

if not os.path.exists(save_path):
    os.makedirs(save_path)

ret, frame = cap.read()
h, w, _ = frame.shape
video = cv2.VideoWriter(save_path + '/result.mp4', cv2.VideoWriter_fourcc(*'mp4v'), 30, (w, h))
color_dict = torch.rand(max_det, 3, device=device)

while True:
    ret, frame = cap.read()
    if not ret:
        break

    results = model(frame, device=device, retina_masks=True, iou=0.43, conf=0.25, imgsz=1024, max_det=max_det)

    if results[0].masks is None:  # nothing detected in this frame
        video.write(frame)
        continue
    masks = results[0].masks.data
    mask = show_mask_track(masks, color_dict)
    frame = cv2.addWeighted(frame, 1, mask, 0.3, 0)

    video.write(frame)

video.release()

But what I really want to achieve is to display the RTSP video in real time. True real time is difficult because of the segmentation processing, but I would like each processed frame to be displayed continuously as soon as it is ready. Sorry for asking something a bit different from the original question, but I would be glad if you could help me.
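
Concretely, I imagine something like this inside the loop, in place of (or alongside) video.write(frame); just a sketch using the standard cv2.imshow and cv2.waitKey API:

cv2.imshow('FastSAM', frame)
if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to stop
    break

with cv2.destroyAllWindows() called after the loop.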

Best Regards, mk622