Closed by nguyenquivinhquang 2 days ago
Hi there, thank you for your kind words! I'm happy to answer your questions:
Regarding the detection and tracking of new objects in a video: this process still requires prompting. At the frame where a new object appears, you would need to indicate whether that object should be tracked and segmented.
Extending the number of frames in memory during memory attention is indeed worth exploring. However, one obstacle we ran into when attempting this comes from the current design of SAM2's codebase, which uses a fixed number of learnable temporal embeddings, so a longer memory has no embeddings for the extra frames. One possible approach would be to extend these embeddings to the new length using linear interpolation, which could be a promising direction.
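To make the interpolation idea concrete, here is a minimal sketch, not code from SAM2 or from our repository. It assumes the temporal embeddings can be viewed as a 2-D table of shape (number of memory frames, embedding dimension); the function name `extend_temporal_embeddings` and the toy values are hypothetical, and NumPy is used only to keep the example self-contained.

```python
import numpy as np


def extend_temporal_embeddings(emb: np.ndarray, new_len: int) -> np.ndarray:
    """Linearly resample a (old_len, dim) embedding table to (new_len, dim).

    Hypothetical helper: the 2-D table layout is an assumption made for
    illustration, not the layout used in any particular codebase.
    """
    old_len, dim = emb.shape
    if old_len < 2 or new_len < 2:
        raise ValueError("need at least two embeddings to interpolate")

    # Place the old and new embeddings on the same normalized time axis [0, 1],
    # so the first and last embeddings are preserved exactly.
    old_pos = np.linspace(0.0, 1.0, old_len)
    new_pos = np.linspace(0.0, 1.0, new_len)

    # np.interp works on 1-D data, so interpolate each channel separately.
    columns = [np.interp(new_pos, old_pos, emb[:, c]) for c in range(dim)]
    return np.stack(columns, axis=1).astype(emb.dtype)


# Toy example: stretch 3 temporal embeddings (dim 2) to 5.
toy = np.array([[0.0, 10.0], [2.0, 20.0], [4.0, 30.0]])
extended = extend_temporal_embeddings(toy, 5)
print(extended.shape)
```

In practice the interpolated table would only serve as an initialization for the longer memory; whether it works well without further fine-tuning is an open question.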
Thanks again for your great questions!
Hi there, thank you for your wonderful work. I have a few questions I would like to ask:
Thanks a lot.