The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Hi,
If I understand correctly, you're providing the mask prompt as a dense embedding and adding it element-wise to the image embeddings.
When this combined input is fed into the transformer in the mask decoder, is self-attention performed over the [image embedding + dense mask embedding]?
Thanks.
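For concreteness, here is a minimal sketch of the mechanism described in the question: a dense mask embedding added element-wise to the image embedding, with attention then running over the combined feature map. All shapes and names here are toy assumptions for illustration, not the actual SAM 2 code, and plain single-head self-attention stands in for the decoder's real attention pattern:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumptions, not taken from the SAM 2 code):
# C = embedding dim, H x W = spatial size of the encoder feature map.
C, H, W = 256, 8, 8

image_embedding = rng.standard_normal((C, H, W))         # from the image encoder
dense_prompt_embedding = rng.standard_normal((C, H, W))  # embedded mask prompt

# The dense (mask) prompt is added element-wise to the image embedding,
# so any downstream attention sees a single combined feature map.
src = image_embedding + dense_prompt_embedding

# Flatten to an (H*W, C) token sequence and run plain single-head
# self-attention over the combined tokens, purely for illustration.
tokens = src.reshape(C, H * W).T                  # (64, 256)
scores = tokens @ tokens.T / np.sqrt(C)           # (64, 64) attention logits
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)    # softmax over keys
out = weights @ tokens                            # (64, 256) attended tokens
print(out.shape)  # (64, 256)
```

Whether the actual decoder does full self-attention over these combined image tokens, or only uses them as keys/values in cross-attention with the output tokens, is exactly the point of the question above.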