Closed atagen closed 1 year ago
Hello, atagen :wave:
Can you try launching another UI from this flake - InvokeAI - and see if it works? It's a more stable and user-friendly one, imo (and the one I'm using myself). It might give us some information about the origin of your issue.
Also, could you explain why you used additional env variables before invoking launch.py? I'm using an AMD GPU, so I don't have much knowledge about running this stuff with NVIDIA GPUs. Maybe you have links to the instructions you used.
hello :)
unfortunately, InvokeAI throws the same:
/nix/store/lvywargqhfhnmwhpk73zl2qy8qrbx0ql-python3.10-torch-1.12.1/lib/python3.10/site-packages/torch/cuda/__init__.py:83: UserWarning: HIP initialization: Unexpected error from hipGetDeviceCount(). Did you run some cuda functions before calling NumHipDevices() that might have already set an error? Error 101: hipErrorInvalidDevice (Triggered internally at ../c10/hip/HIPFunctions.cpp:110.)
return torch._C._cuda_getDeviceCount() > 0
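For anyone landing here, the warning above can be reproduced in isolation without going through a whole UI. A minimal sketch, assuming `python3` and torch are on `PATH` inside the dev shell:

```shell
# Rerun just the check that emits the HIP warning above; it prints True
# only if HIP/CUDA initialization actually succeeds. Any init error or
# warning is surfaced on stdout instead of being buried in UI startup logs.
python3 -c 'import torch; print(torch.cuda.is_available())' 2>&1 || true
```

If this prints `False` (or the same HIP warning), the problem is in the torch/driver setup rather than in either UI.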
re: the additional env variables - they're random tidbits I picked up searching around NixOS, SD, CUDA, etc. The former is trying to force Torch to see NixOS's CUDA, and I think the latter is actually a fix for ROCm? They don't seem to have any bearing on the result whether present or not.
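The exact variables aren't shown in this thread, but as a purely illustrative sketch, these are two that come up constantly in NixOS/CUDA and ROCm discussions (names and values here are hypothetical stand-ins, not taken from this issue):

```shell
# Illustrative only: common suspects from NixOS / ROCm threads, not
# necessarily the exact variables used in this issue.
# NixOS keeps GPU driver libraries under /run/opengl-driver/lib, which
# torch won't find without help:
export LD_LIBRARY_PATH=/run/opengl-driver/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
# ROCm workaround: report a supported gfx target to HIP:
export HSA_OVERRIDE_GFX_VERSION=10.3.0
echo "HSA_OVERRIDE_GFX_VERSION=$HSA_OVERRIDE_GFX_VERSION"
# python launch.py   # then launch as usual
```

Both must be set in the same shell before torch initializes the GPU for them to have any effect.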
just noticed this - seems the nvidia override wasn't getting called after all:
I've corrected this to overlay_nvidia to match the torch overlay that enables CUDA - it now appears to be chugging through a bunch of CUDA-related builds (whereas before it would drop me straight into the shell).
Good eye :eyes: Classic copy-paste error. With the lazy nature of Nix and the lack of functional tests, these errors can slip through. You can submit a PR if you want, or I'll fix it myself tomorrow.
I'll be happy to submit one; just waiting for the whole thing to compile so I can confirm it works - none of the caches I use have these versions for some reason, perhaps something to do with CUDA being unfree.
success! for sanity's sake, I might make a second change to switch the nvidia torch and torchvision to their binary counterparts too; it looks like this already happens for AMD anyway.
when attempting to launch the automatic1111 UI I'm met with the following:
nvidia-smi correctly shows my card from within the same shell:
launching with the torch CUDA test skipped does get the UI running, but leads to a plethora of errors while loading or attempting to generate anything; probably the only interesting one is:
my system is currently using the beta NVIDIA drivers (525 instead of 520), but I didn't have any better luck after switching back to stable.
please let me know if there's any further information I can provide/tests to run/etc to help.