problem about the dim of Q of transformers QKV

SHI-Labs / Neighborhood-Attention-Transformer

Neighborhood Attention Transformer, arxiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arxiv 2022

MIT License


problem about the dim of Q of transformers QKV #12

Closed xiaochang129 closed 2 years ago

xiaochang129 commented 2 years ago

Location: Neighborhood-Attention-Transformer/classification/cuda/nattenav_cuda_kernel.cu, line 75.

Description: The entries in the last dim of Q beyond KERNEL_SIZE^2 are not used in Q*K.

Maybe we could directly multiply the whole Q by its neighborhood of K; the complexity would increase only by the dim of Q.

alihassanijr commented 2 years ago

Thank you for your interest. I'm not sure what the issue is. There is no Q in the referenced file, which implements the AV kernel, not the QK kernel.
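To make the QK / AV distinction concrete, here is a minimal pure-Python sketch of the two steps. It is an illustration under stated assumptions, not the repository's CUDA code: it is 1D rather than 2D (so the neighborhood has `kernel_size` entries where the 2D kernels would have KERNEL_SIZE^2), it assumes windows are shifted at the borders so they stay inside the sequence, it requires the sequence length to be at least `kernel_size`, and all function names are hypothetical. The point it shows is that the tensor whose last dim is the neighborhood size in the AV step is the attention weights, not Q, while the QK step already uses every channel of Q.

```python
import math

def neighborhood_start(i, length, kernel_size):
    """First index of the window around position i (assumed: shifted to stay in bounds)."""
    start = i - kernel_size // 2
    return max(0, min(start, length - kernel_size))

def qk(query, key, kernel_size):
    """QK step: query, key are [length][dim]; returns logits [length][kernel_size]."""
    length = len(query)
    logits = []
    for i in range(length):
        s = neighborhood_start(i, length, kernel_size)
        # The dot product runs over the full channel dim of Q and K.
        logits.append([sum(q * k for q, k in zip(query[i], key[s + j]))
                       for j in range(kernel_size)])
    return logits

def softmax(row):
    """Numerically stable softmax over one neighborhood."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def av(attn, value, kernel_size):
    """AV step: attn is [length][kernel_size], value is [length][dim]; returns [length][dim]."""
    length, dim = len(value), len(value[0])
    out = []
    for i in range(length):
        s = neighborhood_start(i, length, kernel_size)
        # The sum runs over the neighborhood; attn's last dim is kernel_size, not dim.
        out.append([sum(attn[i][j] * value[s + j][d] for j in range(kernel_size))
                    for d in range(dim)])
    return out

# Toy inputs: 4 positions, 2 channels, neighborhood of 3.
query = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
key = [[0.2, 0.1], [0.4, 0.3], [0.6, 0.5], [0.8, 0.7]]
value = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

attn = [softmax(row) for row in qk(query, key, 3)]
out = av(attn, value, 3)
```

Here `attn` has one weight per neighbor for each position, and `out` has the same shape as `value`.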

alihassanijr commented 2 years ago

Closing due to inactivity.