gaomingqi / Track-Anything

Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
MIT License

Can Track-Anything track all objects in a class in a long video? #98

Open tanphan07 opened 1 year ago

tanphan07 commented 1 year ago

Thank you for this excellent tracking project. I have a question: in this project, an object is tracked only if it appears in the first frame, but sometimes I want to track a new object that appears later in the video. For example, I have a 1:30-minute video in which the first object appears at the 5th second and the second object appears at the 10th second, and I want to track both. Can this project handle this case?

JaneYang07 commented 1 year ago

I'm also interested in this. I have been testing TAM, and it seems that tracking a new object that appears later than the "reference first frame" requires a new round of tracking. Please correct me if I'm wrong!

If that's the case, perhaps a solution could be multi-round tracking. For example: (1) track the first 9 seconds of the video in the first round, using the true first frame; (2) go to the 10th second of the video, where the second object appears, and use that frame as the "new first frame" for a second round of tracking; (3) use ffmpeg or another AV tool to concatenate the first-round result (cut at the point where the second round starts) with the second-round result.
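Step (3) could look roughly like the sketch below. It only builds the ffmpeg command lines and the list file for ffmpeg's concat demuxer; it does not call TAM at all. The function name `build_concat_plan`, the file names, and the intermediate `round1_head.mp4` are made up for illustration, and I'm assuming both tracking results were exported as ordinary video files.

```python
from pathlib import Path


def build_concat_plan(round1_result, round2_result, switch_sec, output, workdir="."):
    """Return (commands, list_text) for stitching two tracking results.

    round1_result / round2_result: exported result videos (hypothetical paths).
    switch_sec: time in seconds at which the second round starts.
    """
    work = Path(workdir)
    head = work / "round1_head.mp4"        # trimmed copy of the first result
    list_file = work / "concat_list.txt"   # input file for the concat demuxer

    # Keep only the part of round 1 before the second object appears.
    # No "-c copy" here, so ffmpeg re-encodes and the cut is frame-accurate.
    trim_cmd = ["ffmpeg", "-y", "-i", str(round1_result),
                "-t", str(switch_sec), str(head)]

    # The concat demuxer expects one "file '<path>'" line per clip.
    list_text = (
        f"file '{head.resolve()}'\n"
        f"file '{Path(round2_result).resolve()}'\n"
    )

    # Stream copy only works if both clips share codec parameters;
    # drop "-c copy" to re-encode if ffmpeg complains.
    concat_cmd = ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
                  "-i", str(list_file), "-c", "copy", str(output)]
    return [trim_cmd, concat_cmd], list_text


if __name__ == "__main__":
    commands, list_text = build_concat_plan("round1.mp4", "round2.mp4", 10, "merged.mp4")
    for cmd in commands:
        print(" ".join(cmd))
    print(list_text, end="")
```

To actually run it, write `list_text` to `concat_list.txt` in the working directory and pass each command to `subprocess.run(cmd, check=True)` in order.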

tanphan07 commented 1 year ago

But how can we tell which objects are still being tracked when going from the first round of tracking to the second?

JaneYang07 commented 1 year ago

From what I tested, it seems that if a mask is not drawn for an object on the "second first frame", that object won't be tracked. Ideally, the "second first frame" should be a frame that captures both objects of interest, so that both can be tracked in the second round.
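One way to keep the first object's identity across rounds might be to reuse its last mask from round 1 and add the new object on top, so that the "second first frame" carries both. The sketch below assumes the mask is an integer label map (0 for background, k for object k); that convention and the helper name `merge_into_template` are my assumptions, not something I have confirmed in TAM's code. It uses plain nested lists to stay dependency-free; real masks would normally be NumPy arrays.

```python
def merge_into_template(prev_labels, new_object):
    """Combine the last round-1 label map with a mask for a newly appeared object.

    prev_labels: 2D list of ints (0 = background, k = object k).
    new_object:  2D list of bools, True where the new object is.
    Returns (template, new_id), where new_id is the label given to the new object.
    """
    same_shape = len(prev_labels) == len(new_object) and all(
        len(a) == len(b) for a, b in zip(prev_labels, new_object)
    )
    if not same_shape:
        raise ValueError("masks must have the same shape")

    # The new object gets the next unused label.
    new_id = max((v for row in prev_labels for v in row), default=0) + 1

    # Paint only background pixels, so already tracked objects keep their labels.
    template = [
        [new_id if (is_new and old == 0) else old
         for old, is_new in zip(prev_row, new_row)]
        for prev_row, new_row in zip(prev_labels, new_object)
    ]
    return template, new_id
```

Because existing labels are never overwritten, object 1 in round 2 is the same object 1 as in round 1, which is what makes it possible to tell which objects carried over.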

Unfortunately, I'm still exploring and don't have a good solution for cases like this yet. I have also been testing whether prepending an artificially created "first frame" (one that captures all objects of interest) to the video would help with tracking, but I haven't gotten any good results.

CuriousTank commented 8 months ago

The demo example "Multiple Object Tracking and Segmentation (with [XMem](https://github.com/hkchengrex/XMem))", video_name: qingming.mp4, looks like exactly what you described.

JaneYang07 commented 7 months ago

Thanks for the recommendation! I will check this out.