baudm / parseq

Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)
https://huggingface.co/spaces/baudm/PARSeq-OCR
Apache License 2.0

Visualization of Attention Maps #127

Closed ShashankKrishnaV closed 9 months ago

ShashankKrishnaV commented 9 months ago

Hey @baudm ,

How do you visualize the attention maps (especially after the encoder phase of the model)? I tried approaches from various articles, but none of them worked for me.

baudm commented 9 months ago

Get the attention mask from the DecoderLayer: https://github.com/baudm/parseq/blob/8734d7323de479050ee2ef6e0268c944cab537bb/strhub/models/parseq/modules.py#L72C28-L72C28

ca_weights contains the cross-attention weights. You can upscale the 16x8 map to 128x32, then convert it to grayscale for visualization.
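
A minimal sketch of one way to do this, assuming you grab ca_weights with a PyTorch forward hook on the decoder layer's cross-attention module instead of editing modules.py. The attribute path (decoder.layers[0].cross_attn), the hook output format, and the 8x16 patch-grid ordering are assumptions; verify them against strhub/models/parseq/modules.py for your checkout.

```python
import torch
import torch.nn.functional as F
from PIL import Image

# Load a pretrained PARSeq model via torch.hub (as in the repo README).
parseq = torch.hub.load('baudm/parseq', 'parseq', pretrained=True).eval()

captured = []

def hook(module, inputs, output):
    # nn.MultiheadAttention returns (attn_output, attn_weights) when weights
    # are requested; the weights tensor corresponds to ca_weights.
    if isinstance(output, tuple) and output[1] is not None:
        captured.append(output[1].detach())

# Assumed module path; adjust if your decoder/attention attributes differ.
handle = parseq.decoder.layers[0].cross_attn.register_forward_hook(hook)

# Replace with a real image preprocessed to the model's 32x128 input size.
img = torch.rand(1, 3, 32, 128)

with torch.inference_mode():
    logits = parseq(img)

handle.remove()

# With autoregressive decoding the hook fires once per step; take the last call.
# Expected shape (assuming batch_first attention): (batch, num_queries, 128),
# where 128 = 16x8 encoder patch tokens.
ca = captured[-1][0]                      # (num_queries, 128)
char_idx = 0                              # which decoded position to visualize
attn_map = ca[char_idx].reshape(8, 16)    # patch grid: 8 rows x 16 cols (assumed row-major)

# Upscale the 16x8 map to the 128x32 input resolution.
attn_map = F.interpolate(attn_map[None, None], size=(32, 128),
                         mode='bilinear', align_corners=False)[0, 0]

# Normalize to [0, 1] and save as a grayscale image.
attn_map = (attn_map - attn_map.min()) / (attn_map.max() - attn_map.min() + 1e-8)
Image.fromarray((attn_map * 255).byte().cpu().numpy(), mode='L').save('attn_char0.png')
```

For a heatmap overlay, you can instead colorize the upscaled map (e.g. with a matplotlib colormap) and alpha-blend it onto the resized input image.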