Status: Open. CURRY-AND-RICE opened this issue 2 months ago.
I opened a PR that runs directly on a video file without extracting all of the frames and loading them into memory at once, but it doesn't support video streams. Supporting a stream would most likely require a large refactor of this repository's codebase, but I know Hugging Face is working on adding the model to transformers, which may make it possible to run on a stream.
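The memory saving described above comes from reading frames lazily instead of decoding the whole video up front. A minimal sketch of that idea, assuming an OpenCV-style capture object whose `read()` returns an `(ok, frame)` pair and which has a `release()` method (as `cv2.VideoCapture` does). `iter_frames` is a hypothetical helper, not code from the PR or from SAM 2:

```python
from typing import Any, Iterator, Protocol, Tuple


class Capture(Protocol):
    """Anything with an OpenCV-style read()/release() pair, e.g. cv2.VideoCapture."""

    def read(self) -> Tuple[bool, Any]: ...
    def release(self) -> None: ...


def iter_frames(capture: Capture) -> Iterator[Any]:
    """Hypothetical helper: yield frames one at a time so only the current frame is held."""
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of file, or the device stopped delivering frames
                break
            yield frame
    finally:
        # Runs when the generator is exhausted or closed early, releasing the file/device.
        capture.release()
```

A consumer would then loop with something like `for idx, frame in enumerate(iter_frames(cv2.VideoCapture("video.mp4"))): ...`, so peak memory depends on one frame rather than the length of the video.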
Thank you for this important information! I found an issue on Hugging Face for adding SAMv2, which is currently in progress. I will keep exploring ways to achieve stream inference and will leave this issue open.
@CURRY-AND-RICE Did you find a good implementation?
@Joao-Pimenta I haven't been able to find an implementation that matches my needs yet. Maybe this will help: https://github.com/facebookresearch/segment-anything-2/issues/90
I have a basic example script that runs directly on videos (it should even work with webcams), though it isn't finalized and may be missing some features of the original video prediction implementation.
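Video files and webcams can share one input path, because OpenCV's `cv2.VideoCapture` accepts either a file path or an integer device index. A small sketch of that choice, assuming OpenCV (`opencv-python`) is installed. `parse_source` and `open_source` are hypothetical names, not functions from the script mentioned above:

```python
from typing import Union


def parse_source(source: str) -> Union[int, str]:
    """Hypothetical helper: a digit-only string is a webcam index, anything else a file path."""
    return int(source) if source.isdigit() else source


def open_source(source: str):
    """Open a video file or webcam with OpenCV (imported lazily so parsing works without it)."""
    import cv2  # assumption: opencv-python is installed

    capture = cv2.VideoCapture(parse_source(source))
    if not capture.isOpened():
        raise RuntimeError(f"could not open video source {source!r}")
    return capture
```

With this, `open_source("0")` would open the default webcam and `open_source("clip.mp4")` a file, and the rest of a script can read frames the same way in both cases.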
Edit: There's now also a UI version, which also works with a webcam:
As far as I can tell, video can currently only be input as separate frames stored in a directory. For online applications, however, we should be able to feed frames in sequentially as they arrive. Are there any existing solutions that support this? And are there plans to add such functionality in the future?
Thank you for amazing work!
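One way to picture the requested functionality is a push-style session: the caller hands over each frame as it arrives, and the session keeps only a bounded window of recent per-frame state instead of the whole video. SAM 2 does condition on a memory of past frames, but the class below is a hypothetical sketch, not SAM 2's API. `process_frame` stands in for whatever per-frame model call a real implementation would make, and `max_memory` is an assumed parameter:

```python
from collections import deque
from typing import Any, Callable, Deque, List


class StreamingSession:
    """Hypothetical push-based wrapper (not SAM 2's API): frames are added as they arrive."""

    def __init__(
        self,
        process_frame: Callable[[Any, List[Any]], Any],
        max_memory: int = 8,
    ) -> None:
        # process_frame(frame, memory) -> state, where memory holds the most recent states.
        self.process_frame = process_frame
        # deque with maxlen drops the oldest state automatically once it is full.
        self.memory: Deque[Any] = deque(maxlen=max_memory)
        self.frame_idx = 0

    def add_frame(self, frame: Any) -> Any:
        """Process one incoming frame using only the bounded memory of recent states."""
        state = self.process_frame(frame, list(self.memory))
        self.memory.append(state)
        self.frame_idx += 1
        return state
```

A caller would create one session per stream and call `session.add_frame(frame)` from its capture loop or frame callback. Bounding the memory keeps per-frame cost and memory use constant no matter how long the stream runs.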