Feature Idea: Incorporate "Segment Anything"

M-Colley commented 1 year ago

Hello, it is great that you support out-of-the-box models like YOLOv7. Do you also plan to include the latest FAIR model, "Segment Anything"? I think that could be very helpful!

https://github.com/facebookresearch/segment-anything

Kind regards

timmermansjoy commented 1 year ago

@M-Colley since they support Hugging Face and Roboflow models, you could also just make the SAM model available there and then import it.

However, because this is such a strong model, they should add it to the built-in models imo.

nmanovic commented 1 year ago

@M-Colley, we are discussing how to do that. I agree that the model is very strong. Thanks for the heads up!

medphisiker commented 1 year ago

> @M-Colley, we are discussing how to do that. I agree that the model is very strong. Thanks for the heads up!

Thank you, that would be fantastic!

M-Colley commented 1 year ago

Very cool!

I came across this additional project that combines BLIP, GroundingDINO and stable-diffusion: https://github.com/IDEA-Research/Grounded-Segment-Anything

Might be worth also taking a look at :)

Kind regards

anuragxel commented 1 year ago

I wrote a simple labelling tool on top of SAM. I think CVAT really needs this as a feature; it'll help a lot of people. Feel free to attribute and borrow helpers from my tool if needed:

https://github.com/anuragxel/salt

bsekachev commented 1 year ago

Hi guys, we implemented the first prototype here: #6008

This should work well on GPU for a self-hosted solution. For our platform we are going to find a better solution, because the current architecture will not work there given the large number of customers.

modyngs commented 1 year ago

This one is also for Video: https://github.com/kadirnar/segment-anything-video

medphisiker commented 1 year ago

> Hi guys, we implemented the first prototype here: #6008
>
> This should work well on GPU for a self-hosted solution. For our platform we are going to find a better solution, because the current architecture will not work there given the large number of customers.

Thank you very much for integrating this neural network! It works like f-BRS, but is much more accurate. It's great that it supports inference on both CPU and GPU.

modyngs commented 1 year ago

@bsekachev Is there any plan to implement this in tracker mode? Thanks

medphisiker commented 1 year ago

> @bsekachev Is there any plan to implement in tracker mode? Thanks

Also, there is a very cool XMem model for tracking masks (link). Its video demonstrations look fantastic. I wrote about it in this issue (link).

descilla commented 1 year ago

First of all, thank you for the quick integration of SAM. SAM really seems to be a huge breakthrough.

Unfortunately, at the moment, only positive and negative points can be used. However, SAM also supports the use of bounding boxes and the combination of bounding boxes and points.

I played around with it a bit (adjusted the serverless function) and was able to use bounding boxes, with the following limitations:

- At least one additional point must always be set for the function to be "triggered".
- The bounding box is only ~~used~~ visible in the first iteration; it disappears when adding more points.

Of course, it could be that I have just misunderstood something, but I assume that these are limitations in the CVAT interface for serverless functions, as I could only find the three parameters min_pos_points, min_neg_points, and startswith_box.

Do you think there is hope that the CVAT interface can be adapted/expanded to make full use of SAM's capabilities? The use of (additional) bounding boxes seems to be able to significantly improve the results in my use case.
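To make the request concrete, here is a minimal sketch of how positive points, negative points, and an optional bounding box could be merged into one SAM-style prompt. The helper name `build_sam_prompt` and the dictionary layout are hypothetical and not part of CVAT or the serverless function; the only assumptions taken from SAM itself are that foreground clicks carry label 1, background clicks carry label 0, and the box is given as (x0, y0, x1, y1).

```python
from typing import Dict, List, Optional, Sequence


def build_sam_prompt(
    pos_points: Sequence[Sequence[float]],
    neg_points: Sequence[Sequence[float]],
    box: Optional[Sequence[float]] = None,
) -> Dict[str, Optional[List]]:
    """Hypothetical helper: merge clicks and an optional box into one prompt.

    Labels follow SAM's convention: 1 = foreground click, 0 = background click.
    """
    if not pos_points and not neg_points and box is None:
        raise ValueError("at least one point or a box is required")

    # Positive points first, then negative points, with matching labels.
    point_coords = [list(p) for p in pos_points] + [list(p) for p in neg_points]
    point_labels = [1] * len(pos_points) + [0] * len(neg_points)

    prompt: Dict[str, Optional[List]] = {
        "point_coords": point_coords or None,  # None when only a box is given
        "point_labels": point_labels or None,
        "box": None,
    }
    if box is not None:
        x0, y0, x1, y1 = box
        # Normalize so the box is top-left / bottom-right regardless of drag direction.
        prompt["box"] = [min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1)]
    return prompt


# With segment_anything installed, the fields would be converted to NumPy arrays
# and passed on, roughly as:
#   predictor.predict(point_coords=..., point_labels=..., box=...)
prompt = build_sam_prompt([(10, 20)], [(5, 5)], box=(100, 50, 30, 80))
```

The box stays in the prompt on every call, so it would keep constraining the mask as more points are added, and a box-only prompt is accepted without requiring an extra point.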

shortcipher3 commented 1 year ago

Track Anything would be super cool too: https://github.com/gaomingqi/Track-Anything

bsekachev commented 1 year ago

Hi @descilla

Thank you for reporting. Let's have a dedicated issue about bounding box support and why it is necessary.

bsekachev commented 1 year ago

Hi @shortcipher3

Let's also have another issue about SAM tracker if necessary.

cvat-ai / cvat

Feature Idea: Incorporate "Segment Anything" #5984