gaomingqi / Track-Anything

Track-Anything is a flexible and interactive tool for video object tracking and segmentation, based on Segment Anything, XMem, and E2FGVI.
MIT License

Testing on video segmentation datasets #17

Open fanghaook opened 1 year ago

fanghaook commented 1 year ago

Can this project handle videos composed of image sequences? Does the segmentation map output by SAM contain category information? The paper reports quantitative results on DAVIS; how can I test the model on other video datasets, such as YouTube-VOS or YouTube-VIS, to obtain performance metrics on them?

memoryunreal commented 1 year ago

Yes, this project can handle videos composed of image sequences. You can edit the code here to replace the video input with an image-sequence input (for now, there is no Gradio component for inputting a list of images): https://github.com/gaomingqi/Track-Anything/blob/6d3925046a26cbb2219645ed6ecc1bae487da5f7/app.py#L75-L101

SAM does not output category information. You can try CLIP to predict the category.

Supporting image-sequence input is on our to-do list. We would appreciate a pull request implementing it if you are interested.
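As a rough illustration of the image-sequence replacement, here is a minimal sketch. It is not code from the repository: `list_frame_paths` and `load_frames` are hypothetical helper names, and it assumes the linked video-reading code produces a list of RGB frames as NumPy arrays, which you should confirm against `app.py` before using it. It also assumes frames live in one folder and are ordered by the numbers in their file names.

```python
import re
from pathlib import Path

# Assumption: frames are stored with one of these extensions.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def natural_key(path):
    """Sort key that orders frame2.jpg before frame10.jpg."""
    tokens = re.split(r"(\d+)", path.name)
    return [int(t) if t.isdecimal() else t.lower() for t in tokens]


def list_frame_paths(frame_dir):
    """Hypothetical helper: image files in frame_dir, in frame order."""
    paths = [p for p in Path(frame_dir).iterdir()
             if p.is_file() and p.suffix.lower() in IMAGE_EXTS]
    return sorted(paths, key=natural_key)


def load_frames(frame_dir):
    """Hypothetical helper: read every frame as an RGB NumPy array.

    Assumption: the rest of the pipeline expects RGB frames, so the
    BGR images returned by OpenCV are converted here.
    """
    import cv2  # imported lazily so the listing helper works without OpenCV

    frames = []
    for path in list_frame_paths(frame_dir):
        bgr = cv2.imread(str(path))
        if bgr is None:
            raise ValueError(f"Could not read image: {path}")
        frames.append(cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB))
    return frames
```

Something like `frames = load_frames("path/to/sequence")` could then stand in for the frames extracted from the uploaded video. For datasets such as YouTube-VOS, where each video is a folder of JPEG frames, this would be called once per video folder.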