Self attention in mask decoder

facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Apache License 2.0


Self attention in mask decoder #449

Open Tarna-deep opened 1 day ago

Tarna-deep commented 1 day ago

Hi, if I understand correctly, you provide the mask prompt as a dense embedding and add it to the image embeddings. When this combined input is fed into the transformer in the mask decoder, do you perform self-attention over the [image embedding + mask dense embedding] tokens? Thanks.
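To make the question concrete, here is a minimal pure-Python sketch of the operation being asked about: summing a dense mask embedding with an image embedding, flattening the result into tokens, and running self-attention over those tokens. Everything in it is an assumption for illustration. The function names (`add_dense_prompt`, `flatten_to_tokens`, `self_attention`), the toy shapes, and the identity Q/K/V projections are hypothetical and are not taken from the sam2 code; the sketch does not claim that the SAM 2 mask decoder actually does this.

```python
import math

def add_dense_prompt(image_emb, dense_emb):
    """Element-wise sum of two [C][H][W] nested lists (hypothetical helper)."""
    return [
        [[a + b for a, b in zip(row_i, row_d)] for row_i, row_d in zip(ch_i, ch_d)]
        for ch_i, ch_d in zip(image_emb, dense_emb)
    ]

def flatten_to_tokens(emb):
    """Reshape [C][H][W] into H*W tokens, each a length-C vector."""
    C, H, W = len(emb), len(emb[0]), len(emb[0][0])
    return [[emb[c][y][x] for c in range(C)] for y in range(H) for x in range(W)]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections.

    Illustrative assumption only: a real layer would use learned projections.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product score of this query against every token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Weighted average of the (identity-projected) values.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

# Toy inputs with C=2 channels on a 2x2 grid (made-up numbers).
image_emb = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]]]
dense_emb = [[[0.1, 0.1], [0.1, 0.1]], [[0.0, 0.2], [0.2, 0.0]]]

src = add_dense_prompt(image_emb, dense_emb)   # image embedding + mask dense embedding
tokens = flatten_to_tokens(src)                # 4 tokens of dimension 2
attended = self_attention(tokens)              # every token attends to every token
```

Whether the decoder applies this kind of self-attention to the combined image tokens, or only cross-attention between prompt tokens and image tokens, is exactly what the question asks.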