Closed ShixuanGu closed 1 month ago
The attention map (of shape HW×HW) should be triangular; the square version has been transformed to align with the original token order (from top-left to bottom-right, row by row). You can refer to the code in analyze/utils.py or analyze/attnmap.py for details.
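To illustrate the reordering described above, here is a minimal NumPy sketch. It is not the code from analyze/utils.py or analyze/attnmap.py; the helper names (`reorder_to_row_major`, `is_lower_triangular`), the toy 3×4 grid, and the all-ones causal map are assumptions made for illustration only. It shows that a map which is lower-triangular in scan order stays triangular after re-indexing for a row-by-row scan, but not for a column-by-column scan.

```python
import numpy as np

def reorder_to_row_major(attn_scan, scan_to_row_major):
    """Re-index a scan-order map into row-major token order.

    attn_scan[i, j]      : weight from scan step i to scan step j.
    scan_to_row_major[k] : row-major index of the token visited at scan step k.
    (Hypothetical helper, not taken from the repository.)
    """
    out = np.zeros_like(attn_scan)
    # Entry (i, j) in scan order lands at (perm[i], perm[j]) in row-major order.
    out[np.ix_(scan_to_row_major, scan_to_row_major)] = attn_scan
    return out

def is_lower_triangular(m):
    return np.array_equal(m, np.tril(m))

H, W = 3, 4          # toy feature-map size (assumed)
n = H * W

# A causal scan can only look at earlier steps: lower-triangular in scan order.
causal = np.tril(np.ones((n, n)))

# Row-by-row scan: scan order already equals row-major order.
row_scan = np.arange(n)
# Column-by-column scan: visits (0,0), (1,0), (2,0), (0,1), ...
col_scan = np.arange(n).reshape(H, W).T.reshape(-1)

row_map = reorder_to_row_major(causal, row_scan)
col_map = reorder_to_row_major(causal, col_scan)

print(is_lower_triangular(row_map))  # True
print(is_lower_triangular(col_map))  # False
```

In the column-by-column case the permutation scatters the nonzero entries on both sides of the diagonal, which is why the re-indexed map no longer looks triangular even though the underlying scan is still causal.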
Many thanks for the nice work!
Could you kindly explain a little more about the transformed activation map in Fig 4 (c)?
Why is the activation map square for the column-by-column scan direction, while the row-by-row direction leads to a triangular activation map?