fredzzhang / pvic

Official PyTorch implementation for ICCV2023 paper "Exploring Predicate Visual Context in Detecting Human-Object Interactions"
BSD 3-Clause "New" or "Revised" License

Could you please provide the code for QPIC's heatmap visualization? #49

Closed thunderbolt-fire closed 1 month ago

thunderbolt-fire commented 2 months ago

[screenshot of a QPIC attention heatmap] Like this. I tried to modify the code in the repository but failed.

fredzzhang commented 2 months ago

Hi @thunderbolt-fire,

The visualisation essentially calls the method pocket.advis.heatmap(image, attn, save_path). You can try to save the attention maps from QPIC. Then it's just one line of code to do the visualisation.

Fred.

thunderbolt-fire commented 2 months ago

> Hi @thunderbolt-fire,
>
> The visualisation essentially calls the method pocket.advis.heatmap(image, attn, save_path). You can try to save the attention maps from QPIC. Then it's just one line of code to do the visualisation.
>
> Fred.

I try this method.

    # Register a forward hook to capture the attention weights
    attn_weights = []
    # multihead_attn of the last decoder layer; output[1] holds the attention weights
    hook = model.transformer.decoder.layers[-1].multihead_attn.register_forward_hook(
        lambda module, input, output: attn_weights.append(output[1])
    )

    ...

    attn_weights = attn_weights[0]
    print(attn_weights.shape)
    #  torch.Size([1, 100, 950])
    pocket.advis.heatmap(img, 
                         attn_weights.cpu(), 
                         save_path=f"pair_avg_attn.png")

but the result img is this

[attached image: pair_2_avg_attn]

fredzzhang commented 2 months ago

Hi @thunderbolt-fire,

The heatmaps have a specific shape:

"""
heatmaps: Tensor
        Heatmap tensors of shape (N, H, W). For N>1, by default, different colour maps will
        be used for each heatmap.
"""

It should be num_query x height x width.

Based on the shape you printed (1x100x950), it contains attention weights for all 100 queries. The last dimension, i.e., 950, corresponds to the collapsed width and height dimensions. You need to recover these dimensions first to get attention maps with shape NxHxW, then you'll be able to visualise it.

Fred.
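A minimal sketch of the reshape described above, assuming a hypothetical 608x1600 input image so that ceil(608/32) * ceil(1600/32) = 19 * 50 = 950 matches the printed shape:

```python
import math
import torch

# Hypothetical image size; with a stride-32 backbone, 950 = 19 * 50
h, w = 608, 1600
fh, fw = math.ceil(h / 32), math.ceil(w / 32)   # feature-map height and width

attn_weights = torch.rand(1, 100, 950)          # stand-in for the hooked weights
# Drop the batch dim and recover the spatial dims: (100, 950) -> (100, 19, 50)
attn_maps = attn_weights[0].reshape(-1, fh, fw)
print(attn_maps.shape)  # torch.Size([100, 19, 50])
```

The resulting (N, H, W) tensor can then be passed to pocket.advis.heatmap directly.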

fredzzhang commented 2 months ago

In addition, it would be clearer to only visualise a small number of queries. Find the ones you want, normally those with high prediction scores, and only visualise those instead of the entire 100 queries stack.
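A sketch of selecting the highest-scoring queries before visualisation; the scores tensor here is a hypothetical stand-in for the model's per-query prediction scores:

```python
import torch

# Hypothetical per-query prediction scores (100 queries)
scores = torch.rand(100)
attn_maps = torch.rand(100, 19, 50)   # recovered (N, H, W) attention maps

# Keep only the top-5 highest-scoring queries for visualisation
topk = torch.topk(scores, k=5).indices
selected = attn_maps[topk]
print(selected.shape)  # torch.Size([5, 19, 50])
```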

thunderbolt-fire commented 2 months ago

> Hi @thunderbolt-fire,
>
> The heatmaps have a specific shape:
>
>     """
>     heatmaps: Tensor
>             Heatmap tensors of shape (N, H, W). For N>1, by default, different colour maps will
>             be used for each heatmap.
>     """
>
> It should be num_query x height x width.
>
> Based on the shape you printed (1x100x950), it contains attention weights for all 100 queries. The last dimension, i.e., 950, corresponds to the collapsed width and height dimensions. You need to recover these dimensions first to get attention maps with shape NxHxW, then you'll be able to visualise it.
>
> Fred.

Thanks. I read the related code in pvic, but I have a question about this line:

    attn_map = attn[0, :, ho_pair_idx].reshape(8, math.ceil(h / 32), math.ceil(w / 32))

What do math.ceil(h / 32) and math.ceil(w / 32) mean? What are h and w?

fredzzhang commented 1 month ago

It's just the height and width of the image. Because the image features have been downsampled by a factor of 32, the height and width are divided by 32 to get the actual size of the attention map.
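Concretely, with a hypothetical 608x1600 image, the spatial size of the attention map works out as follows (ceil handles image sizes not divisible by 32):

```python
import math

# Hypothetical input image size
h, w = 608, 1600
# The backbone downsamples by a factor of 32, so the attention map is
# ceil(h / 32) x ceil(w / 32)
fh, fw = math.ceil(h / 32), math.ceil(w / 32)
print(fh, fw, fh * fw)  # 19 50 950
```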