MzeroMiko / VMamba

VMamba: Visual State Space Models; code is based on Mamba

explanation for figure 4 (c) #208

Closed ShixuanGu closed 1 month ago

ShixuanGu commented 1 month ago

Many thanks for the nice work!

Could you kindly explain a little more about the transformed activation map in Fig 4 (c)?

Why is the activation map square for the column-by-column scan direction, while the row-by-row direction leads to a triangular activation map?

MzeroMiko commented 1 month ago

The attention map (which has shape HW×HW) should be triangular, while the square-looking version has been transformed to align with the original order (from top-left to bottom-right, row by row). You can refer to the code in analyze/utils.py or analyze/attnmap.py for details.
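
For intuition, here is a minimal sketch (not the repository code; the real maps come from analyze/utils.py / analyze/attnmap.py) of the re-alignment described above. The sizes and the random map are placeholders. The map is lower-triangular in its own scan order; reordering its rows and columns back to the row-by-row raster order destroys the triangular appearance, which is why the column-by-column panel in Fig. 4(c) looks square.

```python
import numpy as np

H, W = 4, 4  # toy feature-map size (assumption for illustration)

# Causal "attention" map in column-by-column scan order: lower-triangular,
# because each token only interacts with tokens earlier in the scan.
attn_scan = np.tril(np.random.rand(H * W, H * W))

# Permutation mapping raster (row-by-row) positions to column-by-column
# scan positions: the pixel at raster index r*W + c sits at scan index c*H + r.
perm = np.arange(H * W).reshape(W, H).T.reshape(-1)

# Reorder rows and columns so the map is indexed in the original
# top-left-to-bottom-right order used for plotting.
attn_raster = attn_scan[perm][:, perm]

# Triangularity is lost after re-alignment, so this prints False in general.
print(np.allclose(attn_raster, np.tril(attn_raster)))
```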