Channel_Attention: rearrange(t, 'b (head d) (h ph) (w pw) -> b (h w) head d (ph pw)', ph=self.ps, pw=self.ps, head=self.heads)
Channel_Attention_grid: rearrange(t, 'b (head d) (h ph) (w pw) -> b (ph pw) head d (h w)', ph=self.ps, pw=self.ps, head=self.heads)
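For reference, both patterns can be reproduced with plain NumPy reshape/transpose. This is a minimal sketch, assuming the input is laid out as (b, c, H, W) with c = heads * d and H, W divisible by ps. The function names are hypothetical and not from the repo. Running it on a 6x6 map whose values encode pixel position shows which pixels end up in the same group:

```python
import numpy as np


def split_like_channel_attention(t, ps, heads):
    # Mirrors 'b (head d) (h ph) (w pw) -> b (h w) head d (ph pw)'.
    b, c, H, W = t.shape
    d, h, w = c // heads, H // ps, W // ps
    x = t.reshape(b, heads, d, h, ps, w, ps)   # axes: b head d h ph w pw
    x = x.transpose(0, 3, 5, 1, 2, 4, 6)       # axes: b h w head d ph pw
    return x.reshape(b, h * w, heads, d, ps * ps)


def split_like_channel_attention_grid(t, ps, heads):
    # Mirrors 'b (head d) (h ph) (w pw) -> b (ph pw) head d (h w)'.
    b, c, H, W = t.shape
    d, h, w = c // heads, H // ps, W // ps
    x = t.reshape(b, heads, d, h, ps, w, ps)   # axes: b head d h ph w pw
    x = x.transpose(0, 4, 6, 1, 2, 3, 5)       # axes: b ph pw head d h w
    return x.reshape(b, ps * ps, heads, d, h * w)


t = np.arange(36).reshape(1, 1, 6, 6)           # value = row * 6 + col
ca = split_like_channel_attention(t, ps=2, heads=1)
grid = split_like_channel_attention_grid(t, ps=2, heads=1)
print(ca.shape, grid.shape)                     # (1, 9, 1, 1, 4) (1, 4, 1, 1, 9)
print(ca[0, 0, 0, 0].tolist())                  # [0, 1, 6, 7]
print(grid[0, 0, 0, 0].tolist())                # [0, 2, 4, 12, 14, 16, 24, 26, 28]
```

In the first layout each group is one contiguous 2x2 patch. In the grid layout each group takes one fixed offset (ph, pw) inside every patch, i.e. every second pixel across the whole map.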
Why does line 2 split the feature into a uniform grid?
Assuming the feature shape is [B,3,6,6] (so C=3) and ph=pw=2, line 1 gives shape [Bx3x3,C,2,2] and line 2 gives shape [Bx2x2,C,3,3].
Is it possible to view the CA split as 2x2 non-overlapping patches, and the CA_grid split as 3x3 non-overlapping patches?
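On the second question: as far as I can tell from the einops semantics, the grid split is not a set of contiguous 3x3 patches. A quick pure-Python check, assuming a 6x6 map and ps=2 (the helper names are hypothetical), lists the pixel coordinates each group gathers and compares them with contiguous 3x3 windows:

```python
def grid_groups(H, W, ps):
    """Pixel coordinates gathered per group by '... (h ph) (w pw) -> ... (ph pw) ... (h w)'.

    A pixel at (row, col) has row = h * ps + ph and col = w * ps + pw, so the
    group with offset (ph, pw) collects every ps-th pixel across the whole map.
    """
    return {
        (ph, pw): [(h * ps + ph, w * ps + pw) for h in range(H // ps) for w in range(W // ps)]
        for ph in range(ps)
        for pw in range(ps)
    }


def patch_groups(H, W, k):
    """Contiguous, non-overlapping k x k windows (the line-1 layout with ph = pw = k)."""
    return {
        (i, j): [(i * k + r, j * k + c) for r in range(k) for c in range(k)]
        for i in range(H // k)
        for j in range(W // k)
    }


grid = grid_groups(6, 6, ps=2)
patches = patch_groups(6, 6, k=3)
print(grid[(0, 0)])     # [(0, 0), (0, 2), (0, 4), (2, 0), (2, 2), (2, 4), (4, 0), (4, 2), (4, 4)]
print(patches[(0, 0)])  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
# No grid group coincides with any contiguous 3x3 patch.
print({frozenset(g) for g in grid.values()} & {frozenset(p) for p in patches.values()})  # set()
```

Both splits give 4 groups of 9 pixels, which is why the shapes match [Bx2x2,C,3,3], but the membership differs: each grid group is strided with step 2 and spans the whole 6x6 map rather than one 3x3 corner.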
I have the same question, and it doesn't seem consistent with the diagram in the paper. It also feels non-local in the way it represents the global picture.