SHI-Labs / Neighborhood-Attention-Transformer

Neighborhood Attention Transformer, arxiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arxiv 2022
MIT License

failure occurred in building with PyTorch 1.11.0 / CUDA 11.3 / Win10 / VS2019 error #51

Closed helonin closed 2 years ago

helonin commented 2 years ago

Thanks for your great work! But I was so sad since the failure occurred during building >_< Ninja cannot generate the file 'nattenav_cuda.obj'. Please help. Here is the error information:

[error screenshots]

alihassanijr commented 2 years ago

Thank you for your interest. Could you run these and share their outputs?

python3 -c "import torch; print(torch.__version__); print(torch.cuda.is_available()); print(torch._C._cuda_getCompiledVersion(), torch.version.cuda)"
nvcc --version

It's basically failing to even start compiling, so it's likely either a torch or CUDA issue. It's unlikely, but it could be ninja as well. Could you remove ninja and see if it builds?
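The two values the commands above print should agree; a small sketch of how to compare them (the helper name is mine, but the 1000 * major + 10 * minor integer encoding is the standard CUDA version convention):

```python
def cuda_versions_match(compiled: int, runtime: str) -> bool:
    """Compare torch._C._cuda_getCompiledVersion() (an int such as
    11030) with torch.version.cuda (a string such as '11.3').
    CUDA encodes versions as 1000 * major + 10 * minor."""
    major, minor = (int(x) for x in runtime.split(".")[:2])
    return compiled // 1000 == major and (compiled % 1000) // 10 == minor

# A mismatch usually means the installed toolkit (nvcc) differs from
# the one torch was built against, which breaks JIT extension builds.
```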

helonin commented 2 years ago

Thanks for your help! It is the output information. Snipaste_2022-08-09_09-19-11 And how can I remove the Ninja? I am a beginner on Python >_<

stevenwalton commented 2 years ago

Something seems wrong with the paths here. I notice the first post says D:\natten\nattencuda.py and that it is failing to find files. I suspect something is going on there, but I'm not very familiar with Windows path environments. Is this intended?

I don't think it is ninja. I'm pretty certain that this is either a CUDA issue or an environment issue (probably some intersection of the two), especially given the RuntimeError in the first post. The big issue here is that I don't know where Windows caches builds. According to StyleGAN3's troubleshooting guide it should be located at :\Users\<username>\AppData\Local\torch_extensions\torch_extensions\Cache, so you should clear any reference to natten there (it should be safe to clear everything).
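Clearing the cached natten builds can be scripted; a hedged sketch (the helper name and default path are mine — on Windows, pass the AppData cache path above explicitly):

```python
import os
import shutil

def clear_natten_cache(cache_dir=None):
    """Delete cached natten JIT-build directories.

    Defaults to TORCH_EXTENSIONS_DIR if set, else the Linux-style
    ~/.cache/torch_extensions; pass the Windows cache path explicitly.
    Returns the names of the entries removed.
    """
    if cache_dir is None:
        cache_dir = os.environ.get(
            "TORCH_EXTENSIONS_DIR",
            os.path.join(os.path.expanduser("~"), ".cache", "torch_extensions"),
        )
    removed = []
    if os.path.isdir(cache_dir):
        for entry in os.listdir(cache_dir):
            if "natten" in entry.lower():
                shutil.rmtree(os.path.join(cache_dir, entry), ignore_errors=True)
                removed.append(entry)
    return removed
```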

@helonin can you edit gradcheck.py, place the import torch line above the natten import (@alihassanijr we should change this too, btw; our imports should come last), and directly below it print your torch info? Like this:

import torch
print(f"torch {torch.__version__} and cuda {torch.version.cuda}")
from nattencuda import NATTENAVFunction, NATTENQKRPBFunction

This should verify that the file sees the correct torch and CUDA versions (I suspect it doesn't). Let's see the output of that.

But if you want to uninstall ninja you can just do so through pip.

helonin commented 2 years ago

I did everything following your guide but another error occurred. [error screenshot]

alihassanijr commented 2 years ago

Could you remove the cache directory that @stevenwalton mentioned (alternatively, you could set your TORCH_EXTENSIONS_DIR env variable to somewhere else), remove ninja (pip uninstall ninja), and try again?
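Redirecting the build cache can also be done in the script itself, before anything triggers a build; a minimal sketch (the target directory here is an example, not from the thread):

```python
import os

# Point torch's JIT extension build cache at a fresh directory.
# Must run before the natten import triggers a build.
os.environ["TORCH_EXTENSIONS_DIR"] = os.path.join(os.getcwd(), "torch_ext_fresh")
```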

stevenwalton commented 2 years ago

Is it possible that this is a Windows issue? I'm seeing that Python 3.8 only loads DLLs from trusted locations. @helonin, what version of Python are you using? Does this Stack Overflow link help?
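If the DLL search path is the culprit, os.add_dll_directory (added in Python 3.8, Windows-only) is the usual workaround; a hedged sketch (the helper and the CUDA path in the comment are assumptions, not from the thread — it is a no-op off Windows):

```python
import os

def add_cuda_dll_dirs(*dirs):
    """Register extra DLL search paths on Windows (Python 3.8+).

    Hypothetical helper; adjust the directories to your CUDA install.
    On platforms without os.add_dll_directory this does nothing.
    Returns the handles for the directories actually registered.
    """
    added = []
    for d in dirs:
        if hasattr(os, "add_dll_directory") and os.path.isdir(d):
            added.append(os.add_dll_directory(d))
    return added

# Example (path is an assumption):
# add_cuda_dll_dirs(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\bin")
```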

helonin commented 2 years ago

My Python version is 3.7.10. I have set the TORCH_EXTENSIONS_DIR env variable but the same problem occurred ><. I have given up trying on Windows and will try to install NAT on Ubuntu soon. Thank you all the same!

[error screenshot]

stevenwalton commented 2 years ago

This still looks like an environment variable issue. I think you should track down where TORCH_EXTENSIONS_DIR points, as well as where you're allowed to read files from (as per the Stack Overflow link).

For Ubuntu, note that TORCH_EXTENSIONS_DIR is at ~/.cache/torch_extensions. The path won't exist until you build something.

helonin commented 2 years ago

Everything finished successfully on Ubuntu! Thank you all the same!

stevenwalton commented 2 years ago

I'll close this issue for now, but feel free to reopen it. We do need to do more testing on Windows.