Mark12Ding / SAM2Long

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree
https://mark12ding.github.io/project/SAM2Long/

Can we handle new objects? #1

Closed nguyenquivinhquang closed 2 days ago

nguyenquivinhquang commented 3 weeks ago

Hi there, thank you for your wonderful work. I have a few questions:

  1. Is it possible to handle the appearance of a new object in a video? Specifically, can we detect a new object (one that is not part of the initial list of prompted objects) and continue to segment and track it?
  2. Have you experimented with increasing the number of frames kept in the memory bank when performing memory attention?

Thanks a lot.

Mark12Ding commented 2 weeks ago

Hi there, thank you for your kind words! I'm happy to answer your questions:

  1. Regarding the detection and tracking of new objects in a video: this still requires prompting. At the frame where a new object appears, you would need to provide a prompt indicating that the object should be tracked and segmented.

  2. Extending the number of frames in memory during memory attention is indeed worth exploring. However, one obstacle we ran into when attempting this comes from the current design of the SAM 2 codebase, which uses a fixed number of learnable temporal embeddings, so a longer memory has no embeddings for the extra slots. One possible approach would be to extend these embeddings using linear interpolation, which could be a promising direction.
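For readers curious what the linear-interpolation idea in point 2 might look like, here is a minimal, framework-free sketch. It is not code from SAM2Long or SAM 2: the function name `interpolate_temporal_embeddings` and the toy table are hypothetical, and it assumes the temporal embeddings can be treated as an ordered list of vectors, one per memory slot. Whether embeddings resampled this way work without fine-tuning is an open question.

```python
def interpolate_temporal_embeddings(embeddings, new_len):
    """Linearly resample an ordered list of embedding vectors to new_len entries.

    Hypothetical helper for illustration. Endpoints are preserved: the first
    and last output vectors equal the first and last input vectors.
    """
    old_len = len(embeddings)
    if old_len < 2 or new_len < 2:
        raise ValueError("need at least two embeddings on both sides")
    out = []
    for i in range(new_len):
        # Position of output slot i on the original index axis [0, old_len - 1].
        pos = i * (old_len - 1) / (new_len - 1)
        lo = min(int(pos), old_len - 2)  # left neighbour, clamped at the end
        frac = pos - lo                  # weight of the right neighbour
        a, b = embeddings[lo], embeddings[lo + 1]
        out.append([(1 - frac) * x + frac * y for x, y in zip(a, b)])
    return out


# Toy stand-in for a learned table: 7 slots with 2 channels each (assumed sizes).
old_table = [[float(t), 10.0 * t] for t in range(7)]
new_table = interpolate_temporal_embeddings(old_table, 13)
```

In a PyTorch setting the same resampling could be applied along the slot dimension of the embedding parameter before enlarging the memory bank; the list-based version above only illustrates the arithmetic.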

Thanks again for your great questions!