SHI-Labs / Neighborhood-Attention-Transformer

Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022
MIT License

Valid Padded Behavior #14

Closed mgriffin1994 closed 2 years ago

mgriffin1994 commented 2 years ago

First off, great paper! I've been looking for transformers with some of the same locality inductive biases as CNNs. Would you be able to add support for a valid-padding alternative, though? Rather than handling the edges with the altered behavior, just allow a reduced output size, as in valid-padded convolution layers. This is very important in the domain we work in, where we need to be equivariant to the specific crop of the image and to the input dimensions at inference time.
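For concreteness, the reduced output size I mean is the same as what an unpadded ("valid") convolution produces; a minimal sketch using standard torch.nn modules (the sizes are just examples):

import torch
import torch.nn as nn

kernel_size = 7
x = torch.randn(1, 3, 56, 56)  # B x C x H x W

# "Valid" (unpadded) convolution: output shrinks by kernel_size - 1 per spatial dim.
valid_conv = nn.Conv2d(3, 3, kernel_size, padding=0)
print(valid_conv(x).shape)  # torch.Size([1, 3, 50, 50])

# "Same"-style convolution: zero padding preserves the spatial size.
same_conv = nn.Conv2d(3, 3, kernel_size, padding=kernel_size // 2)
print(same_conv(x).shape)  # torch.Size([1, 3, 56, 56])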

PeiqinZhuang commented 2 years ago

I agree with this point. It is more natural to apply a padding operation so that the geometric relationship within the window is preserved even when the query pixel is at the edge. It is easy to perform padding and unfold operations in PyTorch (a rough sketch is below). I am also curious how this zero-padding solution would compare with the altered edge behavior.
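A minimal sketch of the pad-and-unfold idea, assuming a B x C x H x W input; this only shows gathering each query's kernel_size x kernel_size neighborhood, not the attention itself:

import torch
import torch.nn.functional as F

B, C, H, W, kernel_size = 2, 8, 14, 14, 7
x = torch.randn(B, C, H, W)

# Zero-pad so every pixel, including edge pixels, has a full-size neighborhood.
pad = kernel_size // 2
x_padded = F.pad(x, (pad, pad, pad, pad))

# unfold extracts the kernel_size x kernel_size patch centered on each pixel.
neighborhoods = F.unfold(x_padded, kernel_size)  # B x (C * k * k) x (H * W)
neighborhoods = neighborhoods.view(B, C, kernel_size ** 2, H * W)

# Each query pixel would then attend over its kernel_size ** 2 neighbors.
print(neighborhoods.shape)  # torch.Size([2, 8, 49, 196])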

alihassanijr commented 2 years ago

Hello, and thank you for your interest. I believe this can be done within the kernel, and it is on our to-do list as part of creating a general torch extension (with support for different paddings, strides, and dilations). For now, you can alternatively mask or crop the outputs to get the valid-padding behavior: take the output of NeighborhoodAttention, which is B x H x W x C just like the input, and crop out the expected region. The cropped region contains only queries whose full kernel_size x kernel_size neighborhood lies inside the input, so the altered edge behavior never affects them. It would look something like this:

# The output of neighborhood attention has the same B x H x W x C shape as the input.
B, H, W, C = x.shape
x = neighborhoodattn(x)

# Queries within `pad` pixels of the border use the altered edge behavior,
# so crop them away, exactly as a valid-padded convolution would drop them.
pad = kernel_size // 2
Hn, Wn = H - 2 * pad, W - 2 * pad
x = x[:, pad:pad + Hn, pad:pad + Wn, :]

# Sanity check: the remaining spatial size matches the valid output size.
B1, H1, W1, C1 = x.shape
assert H1 == Hn and W1 == Wn
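If you want this as a reusable layer, here is a minimal sketch of a wrapper (the attn argument is an assumption: any module mapping B x H x W x C to B x H x W x C, such as the NeighborhoodAttention layer above):

import torch.nn as nn

class ValidCroppedAttention(nn.Module):
    # Applies a B x H x W x C -> B x H x W x C attention layer, then crops the
    # border so the result mimics valid padding.
    def __init__(self, attn, kernel_size):
        super().__init__()
        self.attn = attn
        self.pad = kernel_size // 2

    def forward(self, x):
        x = self.attn(x)
        p = self.pad
        # Keep only queries whose full neighborhood fits inside the input.
        return x[:, p:x.shape[1] - p, p:x.shape[2] - p, :]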

I hope this helps. If you have any questions as to why this would be equivalent to a valid pad, please let me know.

alihassanijr commented 2 years ago

I'm closing this issue now due to inactivity, and because we're moving our extension to its own separate repository.

We do not currently have plans for supporting valid padding natively in our kernel.

Please feel free to reopen it if you still have questions, or open an issue in NATTEN if it's related to that.