AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Default rocm 5.4.2 is too outdated (Broken on Navi 2x) #15464

Open ckiee opened 7 months ago

ckiee commented 7 months ago

What happened?

Hi! On a NixOS system with s-d-nix and a recent-ish rocm-runtime-5.7.2, the default torch==2.0.1+rocm5.4.2 chosen for second-generation AMD Navi cards segfaults in rocr::AMD::hsa_amd_memory_lock_to_pool.

This is fixed when I borrow the command for Navi 3: export TORCH_COMMAND="pip install --pre torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm5.7" and stuff it in my webui-user.sh.

(and also run it wrapped in steam-run for NixOS compat, but that's irrelevant for this bug)

Trying to detect via GPU generation is wrong. The script should probe the system ROCm version and pick the appropriate torch package that way.
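
A minimal sketch of what probing the ROCm version could look like. This is not what webui.sh currently does. It assumes the system ROCm install records its version in `$ROCM_PATH/.info/version` (typical for `/opt/rocm` installs, likely not true on NixOS) and that a wheel index named `rocmMAJOR.MINOR` exists on download.pytorch.org for that version. The function names `detect_rocm_version` and `torch_index_for_rocm` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: pick the torch wheel index from the installed ROCm version
# instead of from the GPU generation.

# Print the system ROCm version (e.g. "5.7.1"), or nothing if none is found.
# Assumption: the version is stored in $ROCM_PATH/.info/version.
detect_rocm_version() {
    local info="${ROCM_PATH:-/opt/rocm}/.info/version"
    if [ -r "$info" ]; then
        head -n 1 "$info"
    fi
}

# Map "MAJOR.MINOR[.PATCH...]" to a wheel index URL.
# Assumption: an index with this name exists for the detected version.
torch_index_for_rocm() {
    local major rest minor
    major="${1%%.*}"     # text before the first dot
    rest="${1#*.}"       # text after the first dot
    minor="${rest%%.*}"  # text before the next dot
    echo "https://download.pytorch.org/whl/rocm${major}.${minor}"
}

rocm_version="$(detect_rocm_version)"
if [ -n "$rocm_version" ]; then
    # Only override TORCH_COMMAND when a system ROCm was actually found.
    export TORCH_COMMAND="pip install torch torchvision --index-url $(torch_index_for_rocm "$rocm_version")"
fi
```

When no system ROCm is found, the sketch leaves TORCH_COMMAND untouched, so the script's existing default would still apply.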

Console logs

#0  0x00007fff610f67c9 in rocr::AMD::hsa_amd_memory_lock_to_pool(void*, unsigned long, hsa_agent_s*, int, hsa_amd_memory_pool_s, unsigned int, void**) ()
   from /nix/store/pk2mnvmz69c5f3m8615yqbl44p2lksrl-rocm-runtime-5.7.1/lib/libhsa-runtime64.so

Asherathe commented 6 months ago

On OpenSUSE Tumbleweed, the install script gave me torch-2.3.0+rocm5.7 for my Navi21 card. I did have to export HCC_AMDGPU_TARGET=gfx1030. I don't have a system ROCm install, and don't really want one.

OttCS commented 2 weeks ago

A new version of ROCm PyTorch is out, and the current install script is broken because the download link is gone.

Fix: open webui.sh and replace every rocm5.7 with rocm6.2. Also, I am finding that HSA_OVERRIDE_GFX_VERSION=11.0.0 is required to launch on a 7800 XT.
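
For anyone who would rather script that workaround than edit by hand, here is a rough sketch. The helper name `bump_rocm_tag` is made up for illustration, and the export is only known to be needed on the 7800 XT mentioned above; where the export lives (for example webui-user.sh) is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: replace every "rocm5.7" with "rocm6.2" in the given file.
# A backup copy is kept next to it with a .bak suffix.
bump_rocm_tag() {
    sed -i.bak 's/rocm5\.7/rocm6.2/g' "$1"
}

# Reported as required to launch on a 7800 XT; other cards may need a
# different value or none at all.
export HSA_OVERRIDE_GFX_VERSION=11.0.0
```

Run `bump_rocm_tag webui.sh` from the checkout directory; the original file stays available as webui.sh.bak if the change needs to be reverted.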