mvoodarla opened 2 months ago
I was pretty amazed by SAM 2 when it came out, given all the work I do with video. My company works with it a ton, and we decided to take a crack at optimizing it; we made it run 2x faster than the original pipeline!

Unlike LLMs, video models are notorious for incredibly inefficient file reading, storage, and writing, which makes them much slower than they need to be.

We wrote a bit about our work here and thought we'd share it with the community: https://www.sievedata.com/blog/meta-segment-anything-2-sam2-introduction
Do you have any details on exactly what was done to make it perform faster?
A lot of it is related to how files are read, stored, and saved by the pipeline around the model, though there were also some compute- and system-level performance gains we made. Hopefully we'll do a separate blog post soon that focuses more on these technical details. Generally, though, we've found that most video models tend to be really inefficient in how they read frames and perform operations over them in memory.
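To make that kind of inefficiency concrete, here is an illustrative OpenCV sketch (not the actual Sieve pipeline) comparing the common anti-pattern of decoding a whole video into memory up front with streaming frames one at a time:

```python
import cv2
import numpy as np

def load_all_frames(video_path):
    """Anti-pattern: decode the whole video into memory before any model work starts."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    # A few minutes of 1080p video easily becomes tens of GB as a raw array here.
    return np.stack(frames)

def stream_frames(video_path):
    """Yield frames one at a time so decoding overlaps with inference and
    peak memory stays at roughly a single frame."""
    cap = cv2.VideoCapture(video_path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield frame
    finally:
        cap.release()

# Usage: process frames as they are decoded instead of materializing all of them.
# `run_model_on_frame` is a stand-in for whatever per-frame inference you run.
# for frame in stream_frames("input.mp4"):
#     masks = run_model_on_frame(frame)
```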
Do you use the exact model and .yaml provided by Meta? Or change some parameters?
No change in parameters! The exact model is what's being used.
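For reference, "exact model and config" means running the released checkpoint and .yaml as-is, roughly like the snippet below. This is a sketch based on the sam2 package's documented video-predictor API; the paths, frame directory, and prompt values are placeholders rather than anything Sieve-specific.

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

# Stock checkpoint and config from Meta's release; nothing is changed.
checkpoint = "./checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"
predictor = build_sam2_video_predictor(model_cfg, checkpoint)

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    # The reference loader expects a directory of per-frame JPEGs; decoding and
    # holding these frames is a big part of the pipeline cost around the model.
    state = predictor.init_state(video_path="./frames_of_my_video")

    # One positive click on frame 0 for a single object (illustrative values).
    _, object_ids, mask_logits = predictor.add_new_points(
        inference_state=state,
        frame_idx=0,
        obj_id=1,
        points=np.array([[210.0, 350.0]], dtype=np.float32),
        labels=np.array([1], dtype=np.int32),
    )

    # Propagate the prompt through the rest of the video.
    for frame_idx, object_ids, mask_logits in predictor.propagate_in_video(state):
        pass  # consume or save the per-frame masks here
```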
Thanks for sharing @mvoodarla! Indeed, there is a lot of room for optimization - looking forward to your blog post!
Hi @mvoodarla, thanks for sharing!
Any updates on the follow-up blog?