I used to think of the soft object regions as a kind of attention: a per-class attention map over the picture. But after I visualized "aux_out" (bs, 19, h, w), I was confused. Can you explain this? PS: I chose just 2 classes to show.
I'm surprised that these maps look like this. I would have expected them to attend to objects, but they look very noisy. I'm not sure I can explain this.
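One thing that might be worth checking is how the maps are normalized before plotting. Below is a minimal sketch, assuming "aux_out" holds unnormalized per-class scores of shape (bs, 19, h, w); the thread does not confirm this, and the function name `soft_region_maps` and the class indices are hypothetical. It applies a softmax over the class axis so each selected channel becomes a map in [0, 1], using a random array as a stand-in for real model output.

```python
import numpy as np

def soft_region_maps(aux_out, class_ids):
    """Return per-class probability maps in [0, 1] for the chosen classes.

    aux_out: array of shape (bs, num_classes, h, w), ASSUMED to be raw scores.
    class_ids: list of class indices to keep (hypothetical choice).
    """
    # Softmax over the class axis (axis=1), shifted for numerical stability.
    shifted = aux_out - aux_out.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return probs[:, class_ids]  # shape (bs, len(class_ids), h, w)

# Stand-in for a real aux_out: random values, not model output.
rng = np.random.default_rng(0)
aux_out = rng.normal(size=(2, 19, 8, 16))
maps = soft_region_maps(aux_out, [0, 13])
print(maps.shape)  # (2, 2, 8, 16)
```

A single map such as `maps[0, 0]` can then be shown with any image viewer, for example matplotlib's `imshow`. If the maps still look noisy after normalization, the noise is in the scores themselves and not an artifact of plotting raw values.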