SforAiDl / Playground

A python library consisting of pipelines for visual analysis of different sports using Computer Vision and Deep Learning.

Improving detector performance with tiny yolo by implementing a tracker module #20

Closed ashwinvaswani closed 4 years ago

ashwinvaswani commented 4 years ago

FPS issues were partially solved by using tiny YOLO, but this comes at the cost of detector performance (mAP). Check/implement tracking strategies to compensate for tiny YOLO's weaker detections.

jaygala commented 4 years ago

Aren't we using YOLO as a pre-trained model for object detection? So, to improve the detection, can't we try improving the model itself? I didn't quite understand what we have to do here.

ashwinvaswani commented 4 years ago

Basically, relying on per-frame detections alone is not the best approach, because tracking then depends entirely on detection accuracy. Since we are using tiny YOLO to improve speed at the cost of detector performance, we can use a tracking technique to make the pipeline more robust, in the sense that we can keep tracking a player even if the detector fails to detect a player it had previously detected.
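One way this could look in practice (a minimal sketch, not the project's actual tracker — `SimpleTracker` and its parameters are hypothetical): match each frame's detections to existing tracks by IoU, and when the detector misses a previously seen player, carry the last known box forward for a few frames instead of dropping the track.

```python
# Minimal IoU-based tracker sketch (hypothetical, for illustration only).
# Boxes are (x1, y1, x2, y2) tuples.

def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

class SimpleTracker:
    def __init__(self, iou_thresh=0.3, max_misses=5):
        self.tracks = {}          # track_id -> (box, consecutive misses)
        self.next_id = 0
        self.iou_thresh = iou_thresh
        self.max_misses = max_misses

    def update(self, detections):
        """Match detections to tracks; keep unmatched tracks alive briefly."""
        matched = set()
        for tid, (box, misses) in list(self.tracks.items()):
            best, best_iou = None, self.iou_thresh
            for i, det in enumerate(detections):
                if i in matched:
                    continue
                v = iou(box, det)
                if v > best_iou:
                    best, best_iou = i, v
            if best is not None:
                self.tracks[tid] = (detections[best], 0)
                matched.add(best)
            else:
                # Detector missed this player: reuse the last box for a while.
                if misses + 1 <= self.max_misses:
                    self.tracks[tid] = (box, misses + 1)
                else:
                    del self.tracks[tid]
        # Unmatched detections start new tracks.
        for i, det in enumerate(detections):
            if i not in matched:
                self.tracks[self.next_id] = (det, 0)
                self.next_id += 1
        return {tid: box for tid, (box, _) in self.tracks.items()}
```

The `max_misses` budget is what papers over tiny YOLO's dropped detections: a player stays boxed for up to `max_misses` consecutive missed frames before the track is discarded.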

ashwinvaswani commented 4 years ago

However, we'll have to check and explore how to go about it.

ashwinvaswani commented 4 years ago

Do you want me to assign it to you?

jaygala commented 4 years ago

I'll definitely try to work on it and solve it, but I think if someone else also has something to contribute to this issue, then it's better for the project.

samruddhibothara commented 4 years ago

I could try working on it too. I have something in mind but I'm not sure it'll improve the performance as much as expected.

ashwinvaswani commented 4 years ago

@samruddhibothara Can you share what you have in mind? If feasible, we can surely give it a try.

samruddhibothara commented 4 years ago

Since the player is likely to be in frame most of the time, we could run detect_player_image only every few frames (every 3 or so, depending on the video FPS and the required accuracy). In between those frames, the player's location would be a weighted mean (with weights depending on how many frames we skip) of the coordinates of the previous detection and the current detection. Boxes drawn from those interpolated coordinates, together with the corresponding frames from the original video, can be used to create the output video frames for the frames where we are not detecting.

ashwinvaswani commented 4 years ago

@samruddhibothara That sounds great. I'm assigning it to you. Kindly try this out and check skip sizes (how many frames you skip) vs. inference time for both YOLOv3 and tiny YOLO on each video (video.mp4 and video2.mp4). We can also use the weighted-means idea to mitigate the low-performance issue in tiny YOLO (only one player being detected in some frames), which would make it more robust. Great suggestion!
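The skip-size vs. inference-time comparison could be measured with a small harness like this (a sketch with a hypothetical `detector` callable standing in for the YOLOv3 / tiny YOLO inference call):

```python
import time

def benchmark_skip(detector, frames, skip):
    """Run `detector` on every `skip`-th frame; return (elapsed, boxes).

    Frames that are skipped get None, to be filled later by the
    weighted-mean interpolation. `detector` is a hypothetical
    callable: frame -> box.
    """
    start = time.perf_counter()
    boxes = []
    for i, frame in enumerate(frames):
        if i % skip == 0:
            boxes.append(detector(frame))
        else:
            boxes.append(None)
    return time.perf_counter() - start, boxes
```

Sweeping `skip` over, say, 1 to 5 for both models and both videos would give the inference-time/accuracy trade-off curve being asked for.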

jaygala commented 4 years ago

@samruddhibothara This is a very good idea. It'll surely help with the total time taken for detection. And as @ashwinvaswani said, it can solve the problem we face with tiny YOLO. I'd be happy to help you with this.