Closed: minuenergy closed this issue 1 year ago
These issues can be a bit tricky to solve sometimes (you'll see plenty of similar questions on the stylegan repo). In my experience there are three things to check first. I'll try to be as complete as possible.
This is under `~/.cache/torch_extensions/` on *nix machines. Here's what mine looks like:
```
(py310) λ ~/ ls .cache/torch_extensions
bias_act_plugin  fused  py37_cu113  upfirdn2d_plugin
(py310) λ ~/ ls .cache/torch_extensions/py37_cu113
fused  nattenav_cuda  nattenqkrpb_cuda  upfirdn2d
```
You can safely delete things in that folder. What's shown is specifically built from StyleNAT (`upfirdn2d`, `bias_act_plugin`, and `fused` are from StyleGAN, and anything with `natten` in the name is from neighborhood attention).
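If you'd rather clear the cache from the command line, something like this works (a sketch; `TORCH_EXTENSIONS_DIR` is the environment variable PyTorch checks before falling back to the default path):

```shell
# Remove cached extension builds; PyTorch will recompile them on the next run.
CACHE_DIR="${TORCH_EXTENSIONS_DIR:-$HOME/.cache/torch_extensions}"
# ":?" aborts instead of expanding to "/*" if the variable is somehow empty.
rm -rf "${CACHE_DIR:?}"/*
```

The first rebuild after this will be slow since every plugin compiles from scratch.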
While we have an option for natten to be built before runtime, I don't think Karras provided a clean way to do this. You can do it manually by calling the module's init function. Here's the section for `bias_act` (which appears to be the op failing to build). Just in case, here's the original version, which has a fallback (you'd need to do the same for other custom ops like `fma` and `upfirdn2d`, but this is not preferable). I believe this fallback was taken out in StyleGAN3.
Do you know how you installed everything? The currently listed command is `conda install pytorch torchvision pytorch-cuda=11.7 -c pytorch -c nvidia`. This is the same command I used, but their versioning changes (in fact, it has changed since this code was released). I always rely on torch's official website rather than any other source. Sometimes a fresh install can help (rebuild without the cache; sometimes try rebuilding the conda environment). Can you also try using CUDA 11.7?
`nvcc` is frequently an issue. Do you have multiple versions on your machine? When installing the cudatoolkit you get an `nvcc` version as well, and that one should probably be prioritized. I suggest checking first without loading your conda environment and then with it. See my example here:
```
(base) λ ~/ which nvcc
/usr/bin/nvcc
(base) λ ~/ conda activate py310
(py310) λ ~/ which nvcc
/home/users/swalton2/.anaconda3/envs/py310/bin/nvcc
```
You can also check what your system has. You could do this all at once by searching from `/`, but I suggest breaking it into multiple commands because that can be slow (searching just these two locations should be sufficient, though it might not be; `2> /dev/null` just throws away errors, such as not being able to access a location since you're not sudo):
```
$ find /usr -name nvcc 2> /dev/null
$ find /opt -name nvcc 2> /dev/null
```
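To see every `nvcc` those searches turn up along with its CUDA release in one pass, a small loop like this can help (a sketch; same locations as above plus your home directory, where conda usually lives):

```shell
# Print each nvcc found and its CUDA release so version mismatches stand out.
for nv in $(find /usr /opt "$HOME" -name nvcc -type f 2>/dev/null); do
    printf '%s -> %s\n' "$nv" "$("$nv" --version | grep -o 'release [0-9.]*')"
done
```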
To prioritize an `nvcc` version you can reorder your `PATH` variable. For example, I use `export PATH="${HOME}/.anaconda3/bin:$PATH"` (note that my conda location is different from yours, which is in `/opt`). This makes your system check anaconda for programs before it checks elsewhere (such as `/usr/bin`!). Verify with `echo $PATH` (or by looking at all your environment variables with `env`). This is a per-terminal-session thing, so it's best to place it in your shell's rc file (like `~/.bashrc` or `~/.zshrc`). The reason my `nvcc` location changes in the `which` output above is because of this `export`.
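Assuming the anaconda location from my example above (adjust the path to wherever your conda actually lives, e.g. under `/opt`), the change and a quick verification look like this:

```shell
# Prepend conda's bin directory so its nvcc is found before /usr/bin's.
export PATH="$HOME/.anaconda3/bin:$PATH"
# PATH entries are searched left to right; the first entry should now be conda's.
echo "$PATH" | cut -d: -f1
```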
Hopefully this fixes it! If not, there are still some things to check. You can `import os` into some of the `torch_utils` files and check that the proper version of `nvcc` is being loaded. Or it could be another environment variable.
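An equivalent quick check without editing those files is to ask a Python process which `nvcc` it resolves, since the JIT build runs in that same environment (a sketch, assuming `python3` is your environment's interpreter):

```shell
# Prints the nvcc path the Python process sees, or "None" if nothing is on PATH.
python3 -c 'import shutil; print(shutil.which("nvcc"))'
```

If this prints a different path than `which nvcc` in your shell, an environment variable is being changed between the two.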
If these things don't work, let me know and we'll dig in a bit further. It may also be useful to look up the issue pages on StyleGAN2 and StyleGAN3, as there are likely users hitting similar problems there.
This is being closed for now due to lack of activity.
Note that I just pushed some changes that may make the inference a bit easier.
My server is Ubuntu 18.04, CUDA toolkit 11.7, a 2080 Ti. I use a docker image with Ubuntu 18.04, CUDA toolkit 11.6, and Python 3.10. When I run `python main.py type=inference`, `include` issues come up. How can I fix this??