Closed thunderbolt-fire closed 1 month ago
Like this.
I tried to modify the code in the repository, but it failed.
Hi @thunderbolt-fire,
The visualisation essentially calls the method pocket.advis.heatmap(image, attn, save_path). You can try to save the attention maps from QPIC. Then it's just one line of code to do the visualisation.
Fred.
I tried this method:
# Register a hook to capture the attention weights
attn_weights = []
# multihead_attn of the last decoder layer
hook = model.transformer.decoder.layers[-1].multihead_attn.register_forward_hook(
    lambda module, input, output: attn_weights.append(output[1])
)
...
attn_weights = attn_weights[0]
print(attn_weights.shape)
# torch.Size([1, 100, 950])
pocket.advis.heatmap(img, attn_weights.cpu(), save_path="pair_avg_attn.png")
but the resulting image is this
Hi @thunderbolt-fire,
The heatmaps argument has a specific shape,
"""
heatmaps: Tensor
    Heatmap tensors of shape (N, H, W). For N>1, by default, different colour maps will
    be used for each heatmap.
"""
i.e., it should be num_query x height x width. Based on the shape you printed (1x100x950), it contains attention weights for all 100 queries. The last dimension, i.e., 950, corresponds to the collapsed height and width dimensions. You need to recover these dimensions first to get attention maps of shape NxHxW; then you'll be able to visualise them.
Fred.
In addition, it would be clearer to only visualise a small number of queries. Find the ones you want, normally those with high prediction scores, and visualise only those instead of the entire stack of 100 queries.
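The two steps above can be sketched as follows. This is a minimal, self-contained example: the image size (608 x 1600, which gives a 19 x 50 feature map and 19 * 50 = 950 flattened positions), the random stand-ins for the hooked attention weights, and the stand-in prediction scores are all hypothetical, chosen only so the shapes match the 1x100x950 tensor printed earlier.

```python
import math
import torch

# Hypothetical image size chosen so that ceil(608/32) * ceil(1600/32) = 19 * 50 = 950,
# matching the printed attention shape (1, 100, 950).
h, w = 608, 1600
h_feat, w_feat = math.ceil(h / 32), math.ceil(w / 32)  # 19, 50

# Stand-in for the attention weights captured by the forward hook.
attn_weights = torch.randn(1, 100, h_feat * w_feat)

# Recover the spatial dimensions: (num_query, H, W).
attn_maps = attn_weights[0].reshape(-1, h_feat, w_feat)

# Keep only a few queries, e.g. the three with the highest scores
# (random stand-in scores here; use the model's prediction scores in practice).
scores = torch.rand(100)
topk = scores.topk(3).indices
selected = attn_maps[topk]  # shape (3, 19, 50)

# The selected maps can then be passed to the visualiser, e.g.
# pocket.advis.heatmap(image, selected.cpu(), save_path="attn.png")
```

With the maps in (N, H, W) form and N kept small, each query's heatmap gets its own colour map and the result stays readable.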
Thanks, I read your related code in pvic, but I have a question. In
attn_map = attn[0, :, ho_pair_idx].reshape(8, math.ceil(h / 32), math.ceil(w / 32))
what are math.ceil(h / 32) and math.ceil(w / 32)? And what are h and w?
They're just the height and width of the image. Because the image features have been downsampled by a factor of 32, the dimensions are divided by 32 to get the actual size of the attention map.
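As a worked check of this arithmetic, an image size of 608 x 1600 pixels (a hypothetical value, picked only because the numbers line up with the 950 printed earlier) gives:

```python
import math

# Image height and width in pixels (hypothetical example values).
h, w = 608, 1600

# The backbone downsamples by a factor of 32, so the attention map covers
# a ceil(h/32) x ceil(w/32) grid of feature positions.
h_feat = math.ceil(h / 32)  # 19
w_feat = math.ceil(w / 32)  # 50

print(h_feat * w_feat)  # 950 — the flattened last dimension of the attention tensor
```

This is why the last dimension of the hooked tensor is 950: it is the flattened ceil(h/32) * ceil(w/32) grid, and reshaping with those two values recovers the spatial layout.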