PairZhu opened 6 days ago
Hey there! @PairZhu,
Thanks for bringing this up - I completely understand your frustration with the current CameraPredictor implementation.
You're spot on about this being from a third-party source rather than the official SAM2 repo.
Actually, the official SAM2 implementation uses `propagate_in_video` to process entire video chunks at once (if my memory serves me right 👀), which is great for batch processing but not so great for what we're trying to do with interactive annotations.

Here's what I'm thinking would work better: instead of using this third-party `CameraPredictor` (which is really intended for real-time camera applications), let users:
- Track any objects they're interested in on any frame
- Make adjustments whenever needed
- Have more control over the whole annotation process
I'm actively working on improving this software when I have time, but honestly, this is something that would really benefit from community input. The current version is just a basic implementation to get things working - kind of like a proof of concept.
Would love to hear your thoughts on this approach! Have you tried any other methods that might work better? I'm definitely open to suggestions and would be happy to discuss different solutions. Let's make this work better for everyone's annotation needs! 😊
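For reference, the `propagate_in_video` flow mentioned above looks roughly like the sketch below. The method names follow the official SAM2 repo as I recall them, but `StubPredictor` is a hypothetical stand-in (not SAM2 code) so the control flow can actually run without the library installed.

```python
class StubPredictor:
    """Hypothetical stand-in mimicking the SAM2VideoPredictor interface."""

    def init_state(self, video_path):
        # The real predictor loads and caches every frame of the clip here.
        return {"video_path": video_path, "prompts": {}}

    def add_new_points_or_box(self, state, frame_idx, obj_id, points, labels):
        # Register click prompts for one object on one frame.
        state["prompts"].setdefault(frame_idx, []).append((obj_id, points, labels))

    def propagate_in_video(self, state):
        # The real method propagates masks through the whole clip in one pass.
        for frame_idx in range(3):  # pretend the clip has 3 frames
            yield frame_idx, [o for o, _, _ in state["prompts"].get(0, [])], None


predictor = StubPredictor()
state = predictor.init_state("clip.mp4")
predictor.add_new_points_or_box(state, frame_idx=0, obj_id=1,
                                points=[(210, 350)], labels=[1])

# One shot over the whole clip: fine for batch jobs, but awkward for
# interactive annotation where users revisit arbitrary frames.
results = {f: ids for f, ids, _ in predictor.propagate_in_video(state)}
print(results)
```

This is why a whole-clip propagation pass fits batch processing but clashes with frame-by-frame corrections: every edit implies re-running the loop.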
My idea is to allow users to manually annotate multiple frames and then use the "Run (I)" button to run the model on a specific frame, or use the "Auto run all images at once" option for batch inference. Additionally, new annotations can be added at any time. This approach aligns with the intuitive nature of annotation tasks and is more convenient.
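The workflow described above could be sketched as a small controller like the one below. All class and method names here are hypothetical illustrations, not existing APIs of this project; the point is just the separation between per-frame runs, batch runs, and annotations that can be added at any time.

```python
class AnnotationSession:
    """Minimal sketch of a dual-mode annotation workflow (hypothetical names)."""

    def __init__(self, num_frames, predict_fn):
        self.num_frames = num_frames
        self.predict_fn = predict_fn     # model callback: (frame_idx, prompts) -> mask
        self.prompts = {}                # frame_idx -> list of user prompts
        self.masks = {}                  # frame_idx -> latest predicted mask

    def add_prompt(self, frame_idx, prompt):
        # Annotations may be added (or revised) on any frame at any time.
        self.prompts.setdefault(frame_idx, []).append(prompt)
        self.masks.pop(frame_idx, None)  # result is stale: force a re-run

    def run_frame(self, frame_idx):
        # The "Run (I)" button: infer just this one frame.
        self.masks[frame_idx] = self.predict_fn(frame_idx,
                                                self.prompts.get(frame_idx, []))
        return self.masks[frame_idx]

    def run_all(self):
        # "Auto run all images at once": batch inference over every frame.
        for i in range(self.num_frames):
            self.run_frame(i)
        return self.masks


# Toy predictor: the "mask" is just the prompt count, to keep the sketch runnable.
session = AnnotationSession(num_frames=3, predict_fn=lambda i, p: len(p))
session.add_prompt(0, {"point": (10, 20), "label": 1})
session.add_prompt(2, {"point": (30, 40), "label": 1})
print(session.run_frame(0))   # single-frame run
print(session.run_all())      # batch run over all frames
```

Keeping the predictor behind a plain callback like `predict_fn` is also what would make swapping `CameraPredictor` for `VideoPredictor` a local change rather than a rewrite.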
I have already made some adaptations in my local repository to support this, but I haven't yet switched from `CameraPredictor` to `VideoPredictor`. My previous attempts to modify `CameraPredictor` were unsuccessful, and I don't believe modifying the third-party library to adapt the code is the right approach.
@CVHub520 If possible, I would like to create a new branch to submit part of the code first. I may not have the bandwidth to quickly complete this feature in the near future, and assistance from other developers would be appreciated to finalize this change.
Thanks for the detailed explanation! Your approach makes a lot of sense - separating manual annotation and inference modes would indeed make the workflow more intuitive and flexible.
I really like your suggestion about the dual-mode operation: running the model on a specific frame with "Run (I)", or batch inference with "Auto run all images at once".
This design would give users more control while maintaining efficiency for batch operations. Since you've already started the implementation, creating a new branch would be great! Even partial progress would be valuable for the community to build upon.
Please feel free to submit your current work - incomplete features are welcome, and we can use the PR discussion to plan out the remaining tasks. We would greatly appreciate the support and collaboration of community members to help move this initiative forward.
You can either create a new branch in your fork, or let me know if you'd prefer I set up a feature branch in the main repository. Whatever works best for your workflow!
Thank you for the encouraging feedback! I’m glad to hear that the dual-mode operation idea resonates with you. I agree that separating manual annotation and batch inference will offer more flexibility and control for users.
Regarding the implementation, I plan to make several incremental submissions, and the code may not be fully functional after each commit. To avoid the risk of merging incomplete code into the main branch and to make it easier for other developers to collaborate, I believe it would be better to create a dedicated branch in the main repository.
This way, we can work on the feature collaboratively without affecting the stability of the main branch. Let me know if you’re able to set up the branch, or if you’d prefer me to proceed in another way.
I've created a new branch `dev-sam2-video` in the main repository for this feature development. This dedicated branch will indeed be perfect for incremental submissions while keeping the main branch stable.
Feel free to start pushing your changes to this branch whenever you're ready. As you mentioned, we can work on improving the feature collaboratively, and having a dedicated branch will make it easier for other developers to join in and contribute.
Let me know if you need any help getting started with the branch or have any questions about the next steps!
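For anyone joining in, working against the shared feature branch is the usual checkout/commit/push sequence. In the sketch below, `origin` is a throwaway local bare repository standing in for the main repo so the commands can run anywhere; with the real remote URL the same steps apply, and the commit message is only a placeholder.

```shell
set -e
work=$(mktemp -d)
git init --bare "$work/origin.git" >/dev/null

git clone "$work/origin.git" "$work/clone" 2>/dev/null
cd "$work/clone"
git config user.name "demo"
git config user.email "demo@example.com"

# Switch to the shared feature branch (freshly created here for the sandbox).
git checkout -b dev-sam2-video 2>/dev/null

# Incremental, possibly incomplete commits are fine on the feature branch.
git commit --allow-empty -m "WIP: move SAM2 video flow toward VideoPredictor" >/dev/null
git push origin dev-sam2-video 2>/dev/null

git rev-list --count origin/dev-sam2-video
```

Keeping such work-in-progress commits off the main branch is exactly what the dedicated `dev-sam2-video` branch is for.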
Search before asking
Question
I have been trying to modify the SAM2Video-related code to enable adding annotations on arbitrary frames. However, I was unable to find documentation on the `SAM2CameraPredictor`, leaving me to interpret its functionality based solely on the function names. During this process, I encountered several bugs. After further investigation, I found that this code doesn't seem to originate from the official SAM2 repository but instead comes from a third-party repository. The added functionality from this source appears to be immature and unstable. Despite multiple modifications to the code, I was still unable to achieve accurate predictions.

One crucial issue is that this code seems to be designed for real-time camera applications. However, for annotation tasks on non-real-time images, there's no need to use `SAM2CameraPredictor`. While it may have some benefit in reducing startup time for longer sequences, it should at the very least be an optional feature, not a default one.

Additional
No response