SHI-Labs / Neighborhood-Attention-Transformer

Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022
MIT License

pytorch 1.12.0 CUDA 11.6 Win10 VS2019 build error #43

Closed Ken1256 closed 1 year ago

Ken1256 commented 2 years ago
C:\Program Files\Python\Python37\lib\site-packages\torch\include\pybind11\cast.h(1429): error: too few arguments for template template parameter "Tuple"
          detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(1507): here

C:\Program Files\Python\Python37\lib\site-packages\torch\include\pybind11\cast.h(1503): error: too few arguments for template template parameter "Tuple"
          detected during instantiation of class "pybind11::detail::tuple_caster<Tuple, Ts...> [with Tuple=std::pair, Ts=<T1, T2>]"
(1507): here

2 errors detected in the compilation of "C:/pytorch/NAT/natten/src/nattenav_cuda_kernel.cu".
nattenav_cuda_kernel.cu
ninja: build stopped: subcommand failed.

alihassanijr commented 2 years ago

Hello, and thank you for your interest. We recommend using PyTorch 1.11. 1.12 is a very recent release and will likely require us to update the kernel. However, the error you shared does not appear to come from our code. Have you tried 1.11?

Ken1256 commented 2 years ago

A similar problem: https://github.com/facebookresearch/pytorch3d/issues/1127 Maybe it needs a specific Windows version.

alihassanijr commented 2 years ago

I seriously doubt that because, as I mentioned, the error points to pybind, not to our code (unless that's not the full error). But again, I'd recommend using 1.11; we haven't even tested our kernel on 1.12 yet.

alihassanijr commented 2 years ago

Edit: It appears to be an incompatibility issue with nvcc. I've seen multiple instances of this in other PyTorch CUDA extensions; perhaps these threads will help:

https://github.com/ashawkey/torch-ngp/issues/51#issuecomment-1111541658

https://github.com/facebookresearch/pytorch3d/issues/1024

https://github.com/bamsumit/slayerPytorch/issues/86

Ken1256 commented 2 years ago

Win10 VS2019, PyTorch 1.11.0, CUDA 11.3: pass
Win10 VS2019, PyTorch 1.12.0, CUDA 11.3: pass
Win10 VS2019, PyTorch 1.12.0, CUDA 11.6: fail

https://github.com/pytorch/pytorch/issues/69460

alihassanijr commented 2 years ago

Are those your CUDA toolkit versions or CUDA driver versions? Assuming they're toolkit versions, does that mean simply using 1.12 with an earlier toolkit resolved the issue?
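A minimal sketch for telling these versions apart, assuming `nvcc` and `nvidia-smi` are on PATH: `torch.version.cuda` is the CUDA version the PyTorch wheel was built against, `nvcc --version` reports the local toolkit that actually compiles the extension, and `nvidia-smi` reports the GPU driver. The helper names are hypothetical, and treating a major.minor mismatch between the wheel's CUDA and the toolkit as suspicious is a heuristic, not a rule PyTorch enforces.

```python
import re
import shutil
import subprocess
from typing import List, Optional


def parse_nvcc_release(nvcc_output: str) -> Optional[str]:
    """Pull 'major.minor' out of `nvcc --version` text, e.g. 'release 11.6'."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def same_major_minor(a: Optional[str], b: Optional[str]) -> bool:
    """Heuristic: treat two CUDA versions as matching if major.minor agree."""
    if a is None or b is None:
        return False
    return a.split(".")[:2] == b.split(".")[:2]


def run(cmd: List[str]) -> str:
    """Run a command and return its stdout, or '' if it is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return ""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def report() -> None:
    try:
        import torch
        torch_version, torch_cuda = torch.__version__, torch.version.cuda
    except ImportError:
        torch_version, torch_cuda = None, None
    toolkit = parse_nvcc_release(run(["nvcc", "--version"]))
    driver = run(["nvidia-smi", "--query-gpu=driver_version",
                  "--format=csv,noheader"]).strip() or None
    print("PyTorch version:        ", torch_version)
    print("PyTorch built with CUDA:", torch_cuda)
    print("CUDA toolkit (nvcc):    ", toolkit)
    print("GPU driver:             ", driver)
    print("Toolkit matches PyTorch:", same_major_minor(toolkit, torch_cuda))


if __name__ == "__main__":
    report()
```

Running this on the failing setup would be expected to show a cu116 wheel next to whichever toolkit `nvcc` resolves to, which is the comparison the question above is after.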

Ken1256 commented 2 years ago

Win10 21H2 (build 19044.1706)
VS2019 16.11.16 (Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30145 for x64)
GPU driver version 512.59

PyTorch 1.11.0 + CUDA 11.3, toolkit cuda_11.3.1_465.89_win10: pass
pip install https://download.pytorch.org/whl/cu113/torch-1.11.0%2Bcu113-cp37-cp37m-win_amd64.whl
pip install https://download.pytorch.org/whl/cu113/torchvision-0.12.0%2Bcu113-cp37-cp37m-win_amd64.whl
pip install https://download.pytorch.org/whl/cu113/torchaudio-0.11.0%2Bcu113-cp37-cp37m-win_amd64.whl

PyTorch 1.12.0 + CUDA 11.3, toolkit cuda_11.3.1_465.89_win10: pass
pip install https://download.pytorch.org/whl/cu113/torch-1.12.0%2Bcu113-cp37-cp37m-win_amd64.whl
pip install https://download.pytorch.org/whl/cu113/torchvision-0.13.0%2Bcu113-cp37-cp37m-win_amd64.whl
pip install https://download.pytorch.org/whl/cu113/torchaudio-0.12.0%2Bcu113-cp37-cp37m-win_amd64.whl

PyTorch 1.12.0 + CUDA 11.6, toolkit cuda_11.6.0_511.23_windows: fail
pip install https://download.pytorch.org/whl/cu116/torch-1.12.0%2Bcu116-cp37-cp37m-win_amd64.whl
pip install https://download.pytorch.org/whl/cu116/torchvision-0.13.0%2Bcu116-cp37-cp37m-win_amd64.whl
pip install https://download.pytorch.org/whl/cu116/torchaudio-0.12.0%2Bcu116-cp37-cp37m-win_amd64.whl
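The three configurations differ only in the CUDA tag and the package versions embedded in the wheel URLs. A small sketch that rebuilds those URLs, so switching from the failing cu116 wheels to the passing cu113 wheels is a one-argument change. It assumes the URL pattern shown in this thread (Windows, x86-64) and knows only the two torch/torchvision/torchaudio pairings listed here; `wheel_url`, `pip_commands`, and `COMPANIONS` are hypothetical names, and other versions or platforms may be named differently on the download server.

```python
from typing import List
from urllib.parse import quote

BASE_URL = "https://download.pytorch.org/whl"

# torch -> (torchvision, torchaudio) pairings taken from this thread.
COMPANIONS = {
    "1.11.0": ("0.12.0", "0.11.0"),
    "1.12.0": ("0.13.0", "0.12.0"),
}


def wheel_url(package: str, version: str, cuda_tag: str,
              py: str = "37", platform: str = "win_amd64") -> str:
    """Build a download.pytorch.org wheel URL in the pattern used in this thread."""
    # CPython 3.7 and older use the 'm' ABI suffix (cp37m); 3.8+ dropped it.
    abi = f"cp{py}m" if int(py) < 38 else f"cp{py}"
    # The '+' in the local version label is URL-encoded as %2B.
    local_version = quote(f"{version}+{cuda_tag}")
    return f"{BASE_URL}/{cuda_tag}/{package}-{local_version}-cp{py}-{abi}-{platform}.whl"


def pip_commands(torch_version: str, cuda_tag: str, py: str = "37") -> List[str]:
    """Return pip install commands for torch and its companion packages."""
    vision, audio = COMPANIONS[torch_version]
    pkgs = [("torch", torch_version), ("torchvision", vision), ("torchaudio", audio)]
    return ["pip install " + wheel_url(name, ver, cuda_tag, py) for name, ver in pkgs]


if __name__ == "__main__":
    # The combination reported as passing: PyTorch 1.12.0 built against CUDA 11.3.
    for cmd in pip_commands("1.12.0", "cu113"):
        print(cmd)
```

Keeping the three packages pinned together matters because torchvision and torchaudio wheels are built against a specific torch release.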

alihassanijr commented 2 years ago

So what is your actual CUDA version, though? Also, it's unclear: is the kernel working with the 11.3 toolkit?

Ken1256 commented 2 years ago

Yes. I downloaded it from here: https://developer.nvidia.com/cuda-11-3-1-download-archive

alihassanijr commented 2 years ago

So is the issue resolved?

Ken1256 commented 2 years ago

The HAT v0.11 issue is resolved. With HAT v0.12 there are other build errors:

Z:\py_test\NAT_v0_12\natten\src\nattenav_cuda_kernel.cu(881): error: expected an expression

Z:\py_test\NAT_v0_12\natten\src\nattenav_cuda_kernel.cu(911): error: expected an expression

Z:\py_test\NAT_v0_12\natten\src\nattenav_cuda_kernel.cu(911): error: expected an expression

Z:\py_test\NAT_v0_12\natten\src\nattenav_cuda_kernel.cu(1167): error: expected an expression

Z:\py_test\NAT_v0_12\natten\src\nattenav_cuda_kernel.cu(1167): error: expected an expression

Z:\py_test\NAT_v0_12\natten\src\nattenav_cuda_kernel.cu(1167): error: expected an expression

Z:\py_test\NAT_v0_12\natten\src\nattenav_cuda_kernel.cu(1167): error: expected an expression

Z:\py_test\NAT_v0_12\natten\src\nattenav_cuda_kernel.cu(1214): error: expected an expression

Z:\py_test\NAT_v0_12\natten\src\nattenav_cuda_kernel.cu(1214): error: expected an expression

9 errors detected in the compilation of "Z:/py_test/NAT_v0_12/natten/src/nattenav_cuda_kernel.cu".
nattenav_cuda_kernel.cu
ninja: build stopped: subcommand failed.

alihassanijr commented 2 years ago

Are you still on PyTorch v1.12 or 1.11?

Ken1256 commented 2 years ago

On PyTorch v1.11.

alihassanijr commented 2 years ago

Can you clear your compilation cache and try again? I just tried a fresh compile, and it works fine on multiple setups on my end. I'm not sure where the cache would be on Windows; on Linux it's $HOME/.cache/torch_extensions. Could you also confirm you're on the latest commit?
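A minimal sketch of clearing the JIT build cache from Python instead of hunting for the folder by hand. The documented knob is the `TORCH_EXTENSIONS_DIR` environment variable; the default location varies by platform and PyTorch release, so the sketch asks PyTorch for it. That lookup uses `torch.utils.cpp_extension.get_default_build_root`, which is an assumption about an undocumented helper that may be missing or behave differently in your version. The function names below are hypothetical.

```python
import os
import shutil
from pathlib import Path
from typing import List, Optional


def extensions_cache_root() -> Path:
    """Best-effort guess at the root of PyTorch's JIT extension build cache."""
    # Documented override: if set, TORCH_EXTENSIONS_DIR replaces the default root.
    override = os.environ.get("TORCH_EXTENSIONS_DIR")
    if override:
        return Path(override)
    try:
        # Assumption: this helper exists in your PyTorch version; it is not
        # part of the documented API.
        from torch.utils.cpp_extension import get_default_build_root
        return Path(get_default_build_root())
    except (ImportError, AttributeError, OSError):
        # Fallback: the Linux location mentioned in the thread.
        return Path.home() / ".cache" / "torch_extensions"


def clear_extension_cache(name: Optional[str] = None) -> List[Path]:
    """Delete cached builds: every directory under the root, or only dirs named `name`."""
    root = extensions_cache_root()
    if not root.is_dir():
        return []
    if name is None:
        targets = [p for p in root.iterdir() if p.is_dir()]
    else:
        # Some PyTorch versions nest builds as <root>/<py/cuda tag>/<name>,
        # so search recursively rather than only one level deep.
        targets = [p for p in root.rglob(name) if p.is_dir()]
    for path in targets:
        shutil.rmtree(path, ignore_errors=True)
    return targets


if __name__ == "__main__":
    print("Cache root:", extensions_cache_root())
    for removed in clear_extension_cache():
        print("Removed", removed)
```

Clearing everything only costs a rebuild of each extension on its next import, so it is the simpler choice when you don't know the extension's cache name.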

Ken1256 commented 2 years ago

After clearing the cache, there are still build errors. Did you test on Windows? Does HAT v0.12 use much less memory than HAT v0.11?

alihassanijr commented 2 years ago

I'm sorry to hear that. Unfortunately, no, we don't have a Windows environment, but the error is really strange. Based on the error you shared, it's possible that the build isn't loading a header file that is new in v0.12. But from what I'm seeing, it's probably an incompatibility somewhere in your environment (CUDA vs. CUDA toolkit vs. PyTorch version) that results in the compilation error. Again, I can't say for certain with the information I have.

And no, our NA extension generally uses less memory than SWSA (I can go into details if you want); memory usage hasn't changed in the new version. What the new version mainly does is make our models run a lot faster.

3a1b2c3 commented 2 years ago

Should PyTorch 1.11 work?

stevenwalton commented 2 years ago

Yes. 1.11 is the recommended version.

alihassanijr commented 1 year ago

Closing this due to inactivity. If you still have questions, feel free to reopen it.