SHI-Labs / Neighborhood-Attention-Transformer

Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022.

How to build two NAT CUDA extensions #15

Closed XiaoyuShi97 closed 2 years ago

XiaoyuShi97 commented 2 years ago

Hi, I hope to build two NAT CUDA extensions with different head dimensions, but I find that the second build always overwrites the first one. How can I modify setup.py to distinguish them? Is there any other code that needs to be changed? Thanks!

alihassanijr commented 2 years ago

Hello and thank you for your interest.

There are actually two options. The better one is to make the dimensions dynamic, which may require more editing: you'd read the per-head dim from the tensor shapes, just as we already do for heads, batch size, height, and width, and then pass it to the kernels and modify their arguments accordingly.
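
As a rough sketch of that first route (the names below are illustrative, not the repo's actual API), the Python-side wrapper could read the per-head dim from the tensor shape and forward it explicitly; the CPP/CU launchers and kernels would likewise need to accept it as a runtime argument instead of a hard-coded constant:

```python
# Minimal sketch of the dynamic-dim route. `ext` stands for a compiled
# extension module and `ext.forward(attn, value, dim)` is a hypothetical
# signature, not the repository's actual one.
import torch


def natten_av(attn: torch.Tensor, value: torch.Tensor, ext):
    """Apply neighborhood attention weights to values via a compiled extension.

    `value` is assumed to be shaped (batch, heads, height, width, dim_per_head).
    """
    # Read the per-head dim from the shape, the same way the other shape
    # arguments (batch, heads, height, width) are derived from the tensors.
    batch, heads, height, width, dim = value.shape
    # Forward `dim` explicitly instead of assuming a fixed value such as 32.
    return ext.forward(attn, value, dim)
```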

The option you're going for, separating the two formats, would require two different versions. For that, we'd recommend changing the kernel names and all other method names in both the CPP and CU files. For instance, you'd keep the originals unchanged for DIM=32, make a copy of the two CPP and two CU files, and append 64 to the file names (natten.....64.cpp/.cu). Then you'd have to duplicate everything else for them as well: the autograd functions in natten.py, the imports in that same file, and also the tests in gradcheck.py.
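
A minimal sketch of what the duplicated Python side could look like, assuming the copied CPP/CU files expose a second extension under a renamed module (`nattenav_cuda_64`, the `_64` suffix, and the call signatures below are all illustrative, not the repo's exact API):

```python
# Sketch of duplicating one autograd wrapper in natten.py for a second build.
# Module and function names are placeholders; the copied .cpp/.cu files would
# carry the matching renamed symbols.
from torch.autograd import Function

import nattenav_cuda_64  # second extension built from the renamed copies


class NATTENAVFunction64(Function):
    """Copy of the original AV autograd function, bound to the 64-dim build."""

    @staticmethod
    def forward(ctx, attn, value):
        attn = attn.contiguous()
        value = value.contiguous()
        out = nattenav_cuda_64.forward(attn, value)  # hypothetical binding
        ctx.save_for_backward(attn, value)
        return out

    @staticmethod
    def backward(ctx, grad_out):
        attn, value = ctx.saved_tensors
        d_attn, d_value = nattenav_cuda_64.backward(  # hypothetical binding
            grad_out.contiguous(), attn, value)
        return d_attn, d_value
```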

If you are using ninja, which is the default in this repository, setup.py is not going to be called at all: upon import in natten.py, ninja compiles the extensions (if necessary), not setup.py.
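
To make the name clash concrete, here is a hedged sketch of keeping the two JIT builds separate with torch's ninja-based loader; the extension names and source paths are illustrative, not the repository's actual layout:

```python
# Sketch of JIT-building two copies side by side. Giving the second copy its
# own `name` and its own renamed source files keeps ninja from overwriting
# the first extension. Paths and names are placeholders.
from torch.utils.cpp_extension import load

# Original build, e.g. DIM=32.
nattenav_cuda = load(
    name="nattenav_cuda",
    sources=["src/nattenav_cuda.cpp", "src/nattenav_cuda_kernel.cu"],
    verbose=False,
)

# Second build from the renamed copies, registered under a distinct name.
nattenav_cuda_64 = load(
    name="nattenav_cuda_64",
    sources=["src/nattenav_cuda_64.cpp", "src/nattenav_cuda_kernel_64.cu"],
    verbose=False,
)
```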

I hope this answers your question. Please let me know if you need more help.

XiaoyuShi97 commented 2 years ago

This perfectly answers my question. Thanks a lot!