deshwalmahesh / yolov7-deepsort-tracking

Modular and ready to deploy code to detect and track videos using YOLO-v7 and DeepSORT

deepsort tracking #25

Closed garytann closed 1 year ago

garytann commented 1 year ago

Hey there, I am new to DeepSORT tracking; can I get an understanding of how a simple application works? I am going through the demo notebook and found that there is a detection step which seems to be necessary before tracking, because it generates the bounding boxes. Is that right, even though the objects being tracked are not related to a single image?

import cv2
from PIL import Image

detector = Detector(classes = [0,17,32]) # it'll detect ONLY [person, horse, sports ball]. classes = None means detect all classes. Class list at: "data/coco.yaml"
detector.load_model('./yolov7x.pt') # pass the path to the trained weight file

# Pass in any image path or Numpy Image using 'BGR' format
result = detector.detect('./IO_data/input/images/horses.jpg', plot_bb = True) # plot_bb = False outputs the predictions as [x,y,w,h, confidence, class]

if len(result.shape) == 3:  # if the result is an image (not raw predictions), convert it for display; the detector returns a "BGR" image
    result = Image.fromarray(cv2.cvtColor(result, cv2.COLOR_BGR2RGB))

result
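Since the comment above says `plot_bb = False` returns predictions as `[x, y, w, h, confidence, class]`, here is a hypothetical sketch of consuming that output (the values below are made-up placeholders, and I am assuming the predictions come back as an (N, 6) NumPy array):

```python
import numpy as np

# Made-up stand-in for what detector.detect(..., plot_bb=False) might return,
# assuming an (N, 6) array of [x, y, w, h, confidence, class]:
preds = np.array([
    [100.0,  50.0, 40.0, 80.0, 0.92,  0.0],   # class 0 (person), high confidence
    [300.0, 120.0, 30.0, 30.0, 0.35, 32.0],   # class 32 (sports ball), low confidence
])

# Keep only boxes above a confidence threshold before passing them on to tracking
keep = preds[preds[:, 4] >= 0.5]
```

This kind of thresholding is typically done before the boxes are handed to the tracker, so weak detections don't spawn spurious tracks.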

I am trying to wrap this module into an API call that I could use. Could you share some insights and resources with me to help me understand the pipeline better?

Thank you!

deshwalmahesh commented 1 year ago

This is a multi-phase process. Below is a very broad, high-level approach. My suggestion: never try to build an MVP without UNDERSTANDING the process and algorithms, let alone building an API and deploying it.

  1. Train an "Object Detection" model of your choice. The fewer the classes, the better the results. This network detects a bounding box around the objects of your choice.
  2. Crop these bounding boxes from the image (or from frames in videos).
  3. Train a second Siamese Network which differentiates between two objects (Person 1 from Person 2, or Car 1 from Car 2).
  4. Use the above Siamese model as the ReID model. The cropped boxes from step 2 get fed to this model, and the model gives an image embedding for every crop.
  5. Send all the embeddings to the tracking algorithm or model (DeepSORT in our case), which says "okay, this box is supposed to be Person 1 from the last frame", etc.
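The steps above can be sketched very roughly in code. Note that `detect` and `embed` below are toy stand-ins for the YOLO-v7 detector and the ReID (Siamese) network, and the association step is plain cosine matching rather than DeepSORT's full Kalman-filter plus Hungarian matching:

```python
import numpy as np

def detect(frame):
    # Step 1 stand-in: a real detector (YOLO-v7) would predict these.
    # Returns bounding boxes as rows of [x, y, w, h].
    return np.array([[10, 10, 20, 20], [50, 50, 20, 20]])

def embed(frame, boxes):
    # Steps 2-4 stand-in: crop each box and map it to a feature vector.
    # A real ReID/Siamese network would produce the embedding instead of
    # these toy (mean, std) statistics.
    crops = [frame[y:y + h, x:x + w] for x, y, w, h in boxes]
    return np.array([[c.mean(), c.std()] for c in crops])

def associate(prev_embs, prev_ids, new_embs):
    # Step 5 (heavily simplified): assign each new embedding the ID of the
    # most similar embedding from the previous frame, by cosine similarity.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    ids = []
    for e in new_embs:
        sims = [cos(e, p) for p in prev_embs]
        ids.append(prev_ids[int(np.argmax(sims))])
    return ids

# Toy usage: two "frames" with the same objects keep the same track IDs
frame = np.arange(10000, dtype=float).reshape(100, 100)
boxes = detect(frame)
prev_embs = embed(frame, boxes)          # frame 1: assign IDs 1 and 2
ids = associate(prev_embs, [1, 2], embed(frame, detect(frame)))  # frame 2
```

The real pipeline also handles new objects entering, old tracks dying, and motion prediction, which is exactly what the DeepSORT paper covers.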

That's how it tracks the boxes from Frame 1 through Frame N. You'll have to read articles and papers to gain in-depth knowledge of these.

garytann commented 1 year ago

Hi @deshwalmahesh, thank you for the explanation. Are there any resources I can look at to gain better knowledge of this? Really keen learner here! Thank you!

deshwalmahesh commented 1 year ago

@garytann Start with reading about "Siamese Networks", then move on to "Fine-tuning a Siamese Network for images", and then proceed to the "DeepSORT" paper and blogs. A quick Google search will surface all of these.