location: Neighborhood-Attention-Transformer/classification/cuda/nattenav_cuda_kernel.cu line 75
description: The part of Q's last dimension beyond index KERNEL_SIZE^2 is not used in the Q*K product.
Maybe we could directly multiply the whole Q by its neighborhood of K; the complexity would increase only by the dim of Q.
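
To make the suggestion concrete, here is a minimal CPU reference sketch in pure Python of neighborhood Q*K logits that sums over the whole channel dimension of Q. It is not the repository's CUDA kernel. The names `window_start` and `neighborhood_qk` are hypothetical, and shifting the window at the borders so that it always holds KERNEL_SIZE^2 neighbors is an assumption about how the neighborhood is chosen.

```python
def window_start(index, length, kernel_size):
    """Start of a kernel_size-wide window centred on index.

    Assumption: near the borders the window is shifted inwards so that it
    always contains exactly kernel_size positions.
    """
    half = kernel_size // 2
    return max(0, min(index - half, length - kernel_size))


def neighborhood_qk(q, k, kernel_size):
    """Reference neighborhood Q*K over the full channel dimension.

    q, k: nested lists of shape [H][W][D].
    Returns logits of shape [H][W][kernel_size ** 2].
    """
    height, width = len(q), len(q[0])
    logits = []
    for i in range(height):
        start_i = window_start(i, height, kernel_size)
        row = []
        for j in range(width):
            start_j = window_start(j, width, kernel_size)
            scores = []
            for ni in range(start_i, start_i + kernel_size):
                for nj in range(start_j, start_j + kernel_size):
                    # Dot product over the *whole* channel dimension D,
                    # not only its first kernel_size ** 2 entries.
                    scores.append(sum(a * b for a, b in zip(q[i][j], k[ni][nj])))
            row.append(scores)
        logits.append(row)
    return logits
```

In this sketch the work is H * W * KERNEL_SIZE^2 * D multiply-adds, so the cost grows linearly with the channel dimension D of Q.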