facebookresearch / segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0

Multimask output question #679

Open 25benjaminli opened 9 months ago

25benjaminli commented 9 months ago

Why is num_mask_tokens = num_multimask_outputs + 1? And why, when multimask_output is enabled, are the masks sliced with slice(1, None)?

MyFirst905 commented 8 months ago

Have you solved it? I would also like to know the answer to this question.

25benjaminli commented 8 months ago

@MyFirst905 I have not "solved it" but have a rough idea as to why this is the case. According to the paper:

"With one output, the model will average multiple valid masks if given an ambiguous prompt. To address this, we modify the model to predict multiple output masks for a single prompt (see Fig. 3). We found 3 mask outputs is sufficient to address most common cases (nested masks are often at most three deep: whole, part, and subpart). During training, we backprop only the minimum loss over masks. To rank masks, the model predicts a confidence score (i.e., estimated IoU) for each mask."

If I am interpreting this correctly, the three multimask outputs are meant to capture different levels of granularity (whole, part, and subpart) when the prompt is ambiguous.
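
To make the indexing concrete, here is a minimal sketch of how the selection seems to work. It assumes (this is my reading of the mask decoder, not something confirmed in this thread) that token 0 is reserved for the single-mask output and that tokens 1 to 3 are the multimask outputs, which would explain both the + 1 and the slice(1, None). The function select_masks and the placeholder mask names are hypothetical, and the real decoder slices a tensor along its mask dimension rather than a Python list.

```python
# Hypothetical illustration of the slicing logic; not the actual SAM code.
NUM_MULTIMASK_OUTPUTS = 3
# Assumption: one extra token is reserved for the single-mask output.
NUM_MASK_TOKENS = NUM_MULTIMASK_OUTPUTS + 1


def select_masks(masks, iou_scores, multimask_output):
    """Return the subset of per-token predictions for the requested mode."""
    if multimask_output:
        mask_slice = slice(1, None)  # drop token 0, keep the three multimask outputs
    else:
        mask_slice = slice(0, 1)  # keep only token 0
    return masks[mask_slice], iou_scores[mask_slice]


# Placeholder predictions, one per mask token (strings stand in for mask tensors).
masks = ["mask_0", "mask_1", "mask_2", "mask_3"]
iou_scores = [0.90, 0.80, 0.95, 0.70]  # made-up confidence scores

multi_masks, multi_scores = select_masks(masks, iou_scores, multimask_output=True)
single_masks, single_scores = select_masks(masks, iou_scores, multimask_output=False)

# Rank the multimask outputs by predicted IoU and keep the best one.
best_index = max(range(len(multi_scores)), key=lambda i: multi_scores[i])
best_mask = multi_masks[best_index]
```

With multimask_output=True the caller gets three candidate masks plus their scores and can pick the highest-scoring one; with multimask_output=False only the token 0 prediction comes back.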