25benjaminli opened this issue 9 months ago
Have you solved it? I also want to know the answer to this question.
@MyFirst905 I have not "solved it," but I have a rough idea of why this is the case. According to the paper:
"With one output, the model will average multiple valid masks if given an ambiguous prompt. To address this, we modify the model to predict multiple output masks for a single prompt (see Fig. 3). We found 3 mask outputs is sufficient to address most common cases (nested masks are often at most three deep: whole, part, and subpart). During training, we backprop only the minimum loss over masks. To rank masks, the model predicts a confidence score (i.e., estimated IoU) for each mask"
If I am interpreting this correctly, the extra multimask outputs are meant to capture different levels of mask granularity (whole, part, and subpart).
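To make the "backprop only the minimum loss over masks" part concrete, here is a minimal sketch of how that training trick could look. This is my own illustration, not the repo's actual training code (which isn't released); `mask_loss_fn` and the tensor shapes are assumptions:

```python
import torch

def min_over_masks_loss(pred_masks, gt_mask, mask_loss_fn):
    """Hypothetical sketch: pred_masks is (B, 3, H, W) logits for the three
    candidate masks, gt_mask is (B, H, W), and mask_loss_fn returns a
    per-example loss of shape (B,). Only the best-matching candidate per
    example receives gradient, so each output can specialize
    (whole / part / subpart) instead of averaging over valid masks."""
    per_mask = torch.stack(
        [mask_loss_fn(pred_masks[:, i], gt_mask) for i in range(pred_masks.shape[1])],
        dim=1,
    )  # (B, 3): one loss per candidate mask
    min_loss, _ = per_mask.min(dim=1)  # keep only the minimum loss per example
    return min_loss.mean()
```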
Why is `num_mask_tokens = num_multimask_outputs + 1`? And why, when `multimask_output` is used, does the code slice from `(1, None)`?
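For anyone landing here: the slicing in question is in `segment_anything/modeling/mask_decoder.py`. As I understand it, the `+1` exists because token 0 is a dedicated single-mask output (the paper's appendix describes an extra output token used when the prompt is unambiguous, e.g. multiple points), while tokens 1..3 are the whole/part/subpart candidates. Below is a simplified paraphrase of that bookkeeping, not a drop-in copy, so double-check against the source:

```python
import torch
from torch import nn

class MaskDecoderSketch(nn.Module):
    """Simplified paraphrase of the mask-token bookkeeping in SAM's MaskDecoder."""

    def __init__(self, transformer_dim: int = 256, num_multimask_outputs: int = 3):
        super().__init__()
        # +1: token 0 is the dedicated single-mask token; tokens 1..3 are the
        # three ambiguity-resolving candidates (whole / part / subpart).
        self.num_mask_tokens = num_multimask_outputs + 1
        self.mask_tokens = nn.Embedding(self.num_mask_tokens, transformer_dim)

    def select_outputs(self, masks: torch.Tensor, iou_pred: torch.Tensor,
                       multimask_output: bool):
        # masks: (B, num_mask_tokens, H, W); iou_pred: (B, num_mask_tokens)
        if multimask_output:
            mask_slice = slice(1, None)  # keep the 3 candidate masks (tokens 1..3)
        else:
            mask_slice = slice(0, 1)     # keep only the single-mask token's output
        return masks[:, mask_slice, :, :], iou_pred[:, mask_slice]
```

So the `+1` and the `(1, None)` slice are two sides of the same design: one extra token whose prediction is returned when `multimask_output=False`, and skipped otherwise.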