SysCV / sam-hq

Segment Anything in High Quality [NeurIPS 2023]
https://arxiv.org/abs/2306.01567
Apache License 2.0

Question: Is the SAM-HQ model applicable for predicting segmentation masks for input images without boxes, points, or labels? #106

Open mzg0108 opened 7 months ago

mzg0108 commented 7 months ago

If I understand it correctly, both SAM and SAM-HQ take boxes, input points, and per-point labels as prompts along with the input image. What about input images for which we don't have this information available?

If we want to take the human completely out of the loop and have the model take only the image as input and predict the masks, what changes do we need to make to the model?
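
For reference, the prompted path described above looks roughly like this with the `segment_anything`-style predictor API this repo ships; the checkpoint name, image path, and coordinates below are placeholders, not values from the repo:

```python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Model type and checkpoint path are placeholders.
sam = sam_model_registry["vit_h"](checkpoint="sam_hq_vit_h.pth")
predictor = SamPredictor(sam)

# Read the image as RGB (HWC, uint8).
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single foreground click (label 1 = foreground, 0 = background)
# plus an optional box prompt in XYXY pixel coordinates.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    box=np.array([400, 300, 700, 500]),
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]
```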

lkeab commented 6 months ago

We can use the everything mode as demonstrated here, which inputs uniformly sampled points on the image as prompts.
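
A minimal sketch of that everything mode with the `SamAutomaticMaskGenerator` from the `segment_anything`-style API; the model type, checkpoint name, and image path are placeholders:

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Model type and checkpoint path are placeholders.
sam = sam_model_registry["vit_h"](checkpoint="sam_hq_vit_h.pth")

# "Everything" mode: the generator prompts the model with a uniform grid of
# points, so no user-supplied boxes, points, or labels are needed.
mask_generator = SamAutomaticMaskGenerator(sam, points_per_side=32)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # list of dicts, one entry per mask
print(len(masks))
```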

jez-moxmo commented 6 months ago

If I'm not mistaken, you still need the prompt encoder to produce prompt embeddings before an image can be masked. "Automatic mask generator" is actually a bit misleading as a name: it just places point prompts on a regular grid (roughly every 20 pixels or so, depending on the grid settings). For each point prompt the embeddings are encoded, the candidate masks are scored by their predicted IoU, and the most probable one (or the top 3) is kept as the mask.
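
A sketch of that selection step, assuming the `masks` list returned by `SamAutomaticMaskGenerator.generate()` above; each entry carries the model's own quality estimate under `predicted_iou`, and the 0.9 threshold here is illustrative, not a repo default:

```python
# Keep only confidently predicted masks and rank them, mirroring the
# "most probable (or top 3)" selection described above.
good = [m for m in masks if m["predicted_iou"] > 0.9]
top3 = sorted(good, key=lambda m: m["predicted_iou"], reverse=True)[:3]

for m in top3:
    seg = m["segmentation"]  # boolean HxW array for this mask
    print(m["predicted_iou"], m["bbox"], seg.sum())
```

Note that the grid density is controlled by `points_per_side` (e.g. 32 x 32 points over the image) rather than a fixed pixel stride, and `pred_iou_thresh` / `stability_score_thresh` on the generator already do a first pass of this filtering internally.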