Can we handle new objects?

Mark12Ding / SAM2Long

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree


Hi there, Thank you for your kind words! I'm happy to answer your questions:

Regarding the detection and tracking of new objects in a video: this process still requires prompting. At the frame where a new object appears, you would need to indicate whether that object should be tracked and segmented.
Extending the number of frames in memory during memory attention is indeed worth exploring. However, one obstacle we ran into when attempting this is that SAM2's codebase uses a fixed number of learnable temporal embeddings, so memory frames beyond that number have no embedding of their own. One possible approach would be to extend these embeddings using linear interpolation, which could be a promising direction.
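
To make the interpolation idea concrete, here is a minimal sketch of stretching a fixed table of temporal embeddings to more positions. It is not taken from the SAM2 or SAM2Long code: the function name `extend_temporal_embeddings`, the plain `(T, D)` NumPy layout, and the sizes 7, 64 and 13 are all assumptions for illustration, and a real checkpoint would need its own parameter reshaped to this layout first.

```python
import numpy as np

def extend_temporal_embeddings(emb: np.ndarray, new_len: int) -> np.ndarray:
    """Linearly resample a (T, D) table of temporal embeddings to (new_len, D).

    Hypothetical helper: the first and last rows are preserved, and the
    rows in between are blended from their two nearest original neighbours.
    """
    old_len, _ = emb.shape
    if old_len < 2 or new_len < 2:
        raise ValueError("need at least two positions to interpolate")
    # Fractional index into the old table for every new position.
    pos = np.linspace(0.0, old_len - 1, new_len)
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, old_len - 1)
    w = (pos - lo)[:, None]  # blend weight toward the upper neighbour
    return (1.0 - w) * emb[lo] + w * emb[hi]

# Stand-in for a learned table (random values, assumed sizes).
rng = np.random.default_rng(0)
old = rng.standard_normal((7, 64))
new = extend_temporal_embeddings(old, 13)
print(new.shape)
```

Whether interpolated embeddings work well without any fine-tuning is an open question; this only shows the mechanics of the resampling.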

Thanks again for your great questions!
