SHI-Labs / Neighborhood-Attention-Transformer

Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022
MIT License

Question about the head dimension #11

Closed. PeiqinZhuang closed this issue 1 year ago.

PeiqinZhuang commented 2 years ago

Hi, I notice that the head dimension is currently fixed at 32 because of a constraint in the CUDA kernel. What would happen if I changed the head dimension to 64, since some codebases set it to 64?

alihassanijr commented 2 years ago

Hello, thank you for your interest.

We fixed the dimension for our purposes to get a minor speed improvement. If you're interested, you can modify the line `#define DIM 32`, change it to 64, and recompile. We plan to either release a separate version of the kernel with dynamic dims or merge that into the kernel directly.
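
For context, here is a minimal CUDA sketch of why a compile-time head dim can buy a minor speedup; this is not the actual NATTEN kernel, and the kernel name, file layout, and launch parameters are all assumptions. With `DIM` known at compile time, the per-head loop can be fully unrolled and the accumulator kept in registers.

```cuda
// Toy illustration only, NOT the NATTEN kernel: one dot product of
// length DIM per thread, with DIM fixed at compile time so the loop
// below can be fully unrolled. Change DIM to 64 and recompile to
// mimic the 64-dim-head case.
#include <cstdio>

#define DIM 32  // per-head dimension, fixed at compile time

__global__ void dot_per_head(const float* q, const float* k, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.f;
    #pragma unroll
    for (int d = 0; d < DIM; ++d)   // trip count is a compile-time constant
        acc += q[i * DIM + d] * k[i * DIM + d];
    out[i] = acc;
}

int main() {
    const int n = 1024;
    float *q, *k, *out;
    cudaMallocManaged(&q, n * DIM * sizeof(float));
    cudaMallocManaged(&k, n * DIM * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n * DIM; ++i) { q[i] = 1.f; k[i] = 1.f; }
    dot_per_head<<<(n + 255) / 256, 256>>>(q, k, out, n);
    cudaDeviceSynchronize();
    printf("out[0] = %.0f (expected %d)\n", out[0], DIM);
    cudaFree(q); cudaFree(k); cudaFree(out);
    return 0;
}
```

Compile with something like `nvcc -O3 toy_dim.cu -o toy_dim` (the filename is made up); the real kernel needs the same change-and-recompile step described above.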

XiaoyuShi97 commented 2 years ago

Hi, I want to double-check: regardless of the values of dim and num_heads, is the per-head dim always 32?

alihassanijr commented 2 years ago

Hi, no. We specifically kept the per-head dim at 32 for our four variants and extended the number of heads for the larger ones. That's why we kept it fixed in the kernel.
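
To make that scaling concrete, here is a tiny host-side sketch; the widths below are example numbers, not the official NAT variant configs. With the per-head dim pinned at 32, a wider embedding simply means more heads.

```cpp
// Illustrative only: with a fixed per-head dim of 32, widening the
// embedding scales the number of heads, not the dim of each head.
// The widths below are examples, not the official NAT configs.
#include <cstdio>

int main() {
    const int head_dim = 32;                   // fixed per-head dim
    const int widths[] = {64, 128, 256, 512};  // example embedding widths
    for (int dim : widths) {
        if (dim % head_dim != 0) continue;     // width must be a multiple of 32
        printf("dim = %3d -> num_heads = %2d (per-head dim stays %d)\n",
               dim, dim / head_dim, head_dim);
    }
    return 0;
}
```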

alihassanijr commented 2 years ago

Just an update: you can now use arbitrary dims per head as of v0.11 (PR #23).

@PeiqinZhuang If that resolves your question, feel free to close the issue.

PeiqinZhuang commented 1 year ago

> Just an update: you can now use arbitrary dims per head as of v0.11 (PR #23).
>
> @PeiqinZhuang If that resolves your question, feel free to close the issue.

Hi, I have one question: should I change the block size from 32 to 64 if I change the default dimension from 32 to 64?

alihassanijr commented 1 year ago

Sorry, what exactly are you referring to by block size?

alihassanijr commented 1 year ago

Closing this due to inactivity. If you still have questions, feel free to open it back up.