facebookresearch / segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0

Question of the final masks #765

Closed: zkjisj closed this issue 5 months ago

zkjisj commented 5 months ago

As I understand it, num_multimask_outputs is the number of final masks, which is 3 by default. I'm confused about the meaning of self.num_mask_tokens, which is num_multimask_outputs + 1. In the step that finally produces the masks, the shape of the output masks seems to be b × self.num_mask_tokens, and postprocess_masks doesn't change that shape. However, I have seen some implementations that output 3 masks in the end, matching the default num_multimask_outputs. They use the predictor, which follows the same process as SAM.

heyoeyo commented 5 months ago

The extra 4th mask (the 'zeroth' mask in the output) is used when multiple prompts are provided. You can see how it's used in the forward function of the mask decoder.

The paper explains this in a bit more detail in the appendix under the second paragraph of the section: Making the model ambiguity-aware (page 17).
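As a rough sketch of that selection step (not the repository's code: plain Python lists stand in for tensors, and select_masks is a hypothetical helper name), the decoder predicts num_mask_tokens masks and then keeps either masks 1 to 3 or only mask 0, depending on a multimask_output flag. This assumes the slicing works the way I read the forward function of the mask decoder.

```python
# Illustrative only: lists stand in for the (batch, num_mask_tokens, H, W) tensor.
NUM_MULTIMASK_OUTPUTS = 3                      # default discussed in this thread
NUM_MASK_TOKENS = NUM_MULTIMASK_OUTPUTS + 1    # one extra "zeroth" mask token


def select_masks(masks_per_image, multimask_output):
    """Hypothetical helper: keep masks 1..N for multimask output, else mask 0.

    masks_per_image is a list over the batch; each entry holds
    NUM_MASK_TOKENS masks (here just placeholder strings).
    """
    mask_slice = slice(1, None) if multimask_output else slice(0, 1)
    return [masks[mask_slice] for masks in masks_per_image]


# One image in the batch, four predicted masks (placeholders).
batch = [["mask0", "mask1", "mask2", "mask3"]]

multi = select_masks(batch, multimask_output=True)
single = select_masks(batch, multimask_output=False)

print(len(multi[0]))   # 3
print(len(single[0]))  # 1
```

So the decoder can produce 4 masks internally while the caller still sees 3 (or 1), which would explain the difference you noticed.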

zkjisj commented 5 months ago

@heyoeyo Thanks for your reply, it helps a lot!