facebookresearch / segment-anything

The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Apache License 2.0

Visualize attention maps #433

Open alexcbb opened 1 year ago

alexcbb commented 1 year ago

Hello,

I'm searching for a way to visualize the attention maps of the pre-trained models, but I haven't found a solution yet. Has someone already done this successfully?

Thank you !

TemugeB commented 1 year ago

The attention map is calculated right here: https://github.com/facebookresearch/segment-anything/blob/6fdee8f2727f4506cfbbe553e23b895e27956588/segment_anything/modeling/image_encoder.py#L231 If you don't care about model speed, then you can simply add something like this right below `attn`:

```python
import numpy as np

attn_map = attn.detach().cpu().numpy()
np.savetxt('attn_map.dat', attn_map)
```

However, SAM uses global and local attention. You likely want to look at the global attention maps. In that case, the indices of the global attention are set here:

https://github.com/facebookresearch/segment-anything/blob/6fdee8f2727f4506cfbbe553e23b895e27956588/segment_anything/build_sam.py#L19
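For reference, the per-variant index lists that `build_sam.py` passes as `encoder_global_attn_indexes` can be collected into a small lookup (values copied from the linked file; verify them against your checkout):

```python
# Global-attention block indices per SAM variant, as passed via
# encoder_global_attn_indexes in build_sam.py (check against your checkout).
GLOBAL_ATTN_INDEXES = {
    "vit_h": [7, 15, 23, 31],
    "vit_l": [5, 11, 17, 23],
    "vit_b": [2, 5, 8, 11],
}

def uses_global_attention(model_type, block_idx):
    """True if the given encoder block attends over the full image
    rather than within a local window."""
    return block_idx in GLOBAL_ATTN_INDEXES[model_type]

print(uses_global_attention("vit_b", 2))   # True: a global block
print(uses_global_attention("vit_b", 3))   # False: a windowed block
```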

The save-to-disk snippet above will likely not work as written, because `np.savetxt` only accepts 1-D or 2-D arrays. So you need to reshape the attention map to 2-D first.
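One way around that (a sketch using a random array as a stand-in for the real `attn` tensor) is either to flatten each head's map into one row for `np.savetxt`, or to use `np.save`, which handles N-D arrays directly:

```python
import numpy as np

# Stand-in for attn.detach().cpu().numpy(); a real map from the encoder
# would have a shape like (batch * num_heads, tokens, tokens).
attn_map = np.random.rand(12, 196, 196).astype(np.float32)

# Option 1: flatten each head's map to one row so np.savetxt accepts it.
np.savetxt('attn_map.dat', attn_map.reshape(attn_map.shape[0], -1))

# Option 2: np.save keeps the full 3-D shape and round-trips cleanly.
np.save('attn_map.npy', attn_map)
restored = np.load('attn_map.npy')
print(restored.shape)  # (12, 196, 196)
```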

alexcbb commented 1 year ago

Hi, thank you for your answer. By the way, I was looking in more detail at this "global attention" in the referenced paper, "Exploring Plain Vision Transformer Backbones for Object Detection", but I'm not sure I really understand how it is done here. The original paper talks about using only the final feature maps, but the unofficial PyTorch code that SAM shares its implementation with (https://github.com/ViTAE-Transformer/ViTDet) uses window attention in different layers. Can you explain this part in more detail, or point me to the right source to understand it better? Thank you!
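As a rough intuition for the distinction being discussed (a toy NumPy sketch, not SAM's actual code): a global block computes one attention map over all tokens, while a windowed block only ever produces small per-window maps, which is why the global blocks are the interesting ones to visualize:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def global_attention(q, k):
    """One map over all tokens: shape (n_tokens, n_tokens)."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))

def windowed_attention(q, k, window):
    """Tokens attend only within their own window, so each map
    covers just (window, window) tokens, never the whole image."""
    return [softmax(q[s:s + window] @ k[s:s + window].T / np.sqrt(q.shape[-1]))
            for s in range(0, len(q), window)]

rng = np.random.default_rng(0)
q, k = rng.standard_normal((16, 8)), rng.standard_normal((16, 8))
print(global_attention(q, k).shape)       # (16, 16)
print(len(windowed_attention(q, k, 4)))   # 4 maps, each (4, 4)
```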

ahmadmustafaanis commented 1 year ago

Hi @alexcbb, did you find any way to do this?