AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

[Bug]: AMD RX 6800 XT - All NaNs or Black Square on Fresh Install #10296

Open hallucination-gallery opened 1 year ago

hallucination-gallery commented 1 year ago

Is there an existing issue for this?

What happened?

Followed each of the potential installation instructions to install on AMD: Automatic on Ubuntu 22.04, Arch with AMD, and Docker on Arch. No images can be generated; only NaN errors or black squares. The issue is the same on all fronts, so I'm using the Docker installation process as the example here. I am installing the program remotely via SSH on a gaming PC / server and accessing the UI over the local network.

Steps to reproduce the problem

  1. Fresh installation of Arch Linux using archinstall
  2. Install the AMD ROCm packages from the AUR
  3. export HSA_OVERRIDE_GFX_VERSION='10.3.0'
  4. Install Docker and start it as a service
  5. Follow the instructions for Running Inside Docker (the image installed is rocm5.5_ubuntu20.04_py3.8_pytorch_staging; a sketch of the invocation follows this list)
  6. On the first run, using REQS_FILE='requirements.txt' python launch.py --listen --precision full --no-half, the error "A tensor with all NaNs was produced in Unet" is raised
  7. On the second run, with REQS_FILE='requirements.txt' python launch.py --listen --precision full --no-half --disable-nan-check, processing appears to complete successfully but only produces a black square
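
For reference, a minimal sketch of the kind of docker run invocation the Running Inside Docker instructions describe (the device flags and mount path here are assumptions based on the standard ROCm container setup, not copied from this thread; adjust the image tag as needed):

    # Start the ROCm PyTorch container with GPU device access and a shared
    # /dockerx working directory (flags assumed from the usual ROCm setup)
    docker run -it --network=host --device=/dev/kfd --device=/dev/dri \
        --group-add=video --ipc=host --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        -v $HOME/dockerx:/dockerx \
        rocm/pytorch:rocm5.5_ubuntu20.04_py3.8_pytorch_staging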

What should have happened?

An image of any sort is generated from any prompt

Commit where the problem happens

https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/5ab7f213bec2f816f9c5644becb32eb72c8ffb89

What platforms do you use to access the UI?

Windows

What browsers do you use to access the UI?

Mozilla Firefox

Command Line Arguments

--listen --precision full --no-half --disable-nan-check

List of extensions

No

Console logs

root@arch-ai:/dockerx/stable-diffusion-webui# REQS_FILE='requirements.txt' python launch.py --listen --precision full --no-half
Python 3.8.16 (default, Mar  2 2023, 03:21:46)
[GCC 11.2.0]
Commit hash: 5ab7f213bec2f816f9c5644becb32eb72c8ffb89
Installing requirements
Launching Web UI with arguments: --listen --precision full --no-half
No module 'xformers'. Proceeding without it.
Loading weights [6ce0161689] from /dockerx/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors
Creating model from config: /dockerx/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(0):
Model loaded in 3.2s (load weights from disk: 0.2s, create model: 0.7s, apply weights to model: 1.1s, load VAE: 0.2s, move model to device: 0.7s, load textual inversion embeddings: 0.2s).
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 10.2s (import torch: 1.1s, import gradio: 1.2s, import ldm: 0.6s, other imports: 0.6s, load scripts: 0.5s, load SD checkpoint: 3.3s, create ui: 0.6s, gradio launch: 2.1s).
  0%|                                                                                            | 0/20 [00:04<?, ?it/s]
Error completing request
Arguments: ('task(pwx4v24cl7fynog)', 'a spaceship shaped like a dragonfly shooting a laser', '', [], 20, 0, False, False, 1, 1, 7, -1.0, -1.0, 0, 0, 0, False, 512, 512, False, 0.7, 2, 'Latent', 0, 0, 0, [], 0, False, False, 'positive', 'comma', 0, False, False, '', 1, '', [], 0, '', [], 0, '', [], True, False, False, False, 0) {}
Traceback (most recent call last):
  File "/dockerx/stable-diffusion-webui/modules/call_queue.py", line 57, in f
    res = list(func(*args, **kwargs))
  File "/dockerx/stable-diffusion-webui/modules/call_queue.py", line 37, in f
    res = func(*args, **kwargs)
  File "/dockerx/stable-diffusion-webui/modules/txt2img.py", line 56, in txt2img
    processed = process_images(p)
  File "/dockerx/stable-diffusion-webui/modules/processing.py", line 515, in process_images
    res = process_images_inner(p)
  File "/dockerx/stable-diffusion-webui/modules/processing.py", line 669, in process_images_inner
    samples_ddim = p.sample(conditioning=c, unconditional_conditioning=uc, seeds=seeds, subseeds=subseeds, subseed_strength=p.subseed_strength, prompts=prompts)
  File "/dockerx/stable-diffusion-webui/modules/processing.py", line 887, in sample
    samples = self.sampler.sample(self, x, conditioning, unconditional_conditioning, image_conditioning=self.txt2img_image_conditioning(x))
  File "/dockerx/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 377, in sample
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "/dockerx/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 251, in launch_sampling
    return func()
  File "/dockerx/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 377, in <lambda>
    samples = self.launch_sampling(steps, lambda: self.func(self.model_wrap_cfg, x, extra_args={
  File "/opt/conda/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/dockerx/stable-diffusion-webui/repositories/k-diffusion/k_diffusion/sampling.py", line 145, in sample_euler_ancestral
    denoised = model(x, sigmas[i] * s_in, **extra_args)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1488, in _call_impl
    return forward_call(*args, **kwargs)
  File "/dockerx/stable-diffusion-webui/modules/sd_samplers_kdiffusion.py", line 167, in forward
    devices.test_for_nans(x_out, "unet")
  File "/dockerx/stable-diffusion-webui/modules/devices.py", line 156, in test_for_nans
    raise NansException(message)
modules.devices.NansException: A tensor with all NaNs was produced in Unet. Use --disable-nan-check commandline argument to disable this check.

^CInterrupted with signal 2 in <frame at 0x67c9a380, file '/dockerx/stable-diffusion-webui/webui.py', line 266, code wait_on_server>
root@arch-ai:/dockerx/stable-diffusion-webui# REQS_FILE='requirements.txt' python launch.py --listen --precision full --no-half --disable-nan-check
Python 3.8.16 (default, Mar  2 2023, 03:21:46)
[GCC 11.2.0]
Commit hash: 5ab7f213bec2f816f9c5644becb32eb72c8ffb89
Installing requirements
Launching Web UI with arguments: --listen --precision full --no-half --disable-nan-check
No module 'xformers'. Proceeding without it.
Loading weights [6ce0161689] from /dockerx/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors
Creating model from config: /dockerx/stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(0):
Model loaded in 3.4s (load weights from disk: 0.3s, create model: 0.8s, apply weights to model: 1.1s, load VAE: 0.2s, move model to device: 0.7s, load textual inversion embeddings: 0.2s).
Running on local URL:  http://0.0.0.0:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 10.4s (import torch: 1.3s, import gradio: 1.4s, import ldm: 0.4s, other imports: 0.7s, load scripts: 0.5s, load SD checkpoint: 3.4s, create ui: 0.5s, gradio launch: 2.1s).
100%|███████████████████████████████████████████████████████████████████████████████████| 20/20 [00:04<00:00,  4.19it/s]
Total progress: 100%|███████████████████████████████████████████████████████████████████| 20/20 [00:09<00:00,  2.05it/s]
Total progress: 100%|███████████████████████████████████████████████████████████████████| 20/20 [00:09<00:00,  5.44it/s]

Additional information

I am able to use the unofficial Windows AMD installation process on the same machine without issue (aside from it being slow).

checksumfail commented 1 year ago

Same issue here, also an AMD RX 6800 XT on Arch Linux. I'm able to use Easy Diffusion out of the box just fine, but I've slammed my head against a wall for days trying to get A1111 to work, to no avail.

hallucination-gallery commented 1 year ago

I was able to get A1111 (and several other UIs) working by downgrading my torch packages:

pip install torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 --extra-index-url https://download.pytorch.org/whl/rocm5.2

I receive an error on startup saying that my torch version is unsupported, but the interface works and I'm able to generate images.
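
In case it helps anyone verify the downgrade, here is a quick check that the installed torch is the ROCm build and can see the GPU (ROCm builds of PyTorch expose the device through the torch.cuda interface, so this check applies there too):

    # Print torch version and GPU visibility; expect "1.13.1+rocm5.2 True"
    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"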

robonxt commented 1 year ago

For anything AMD, try running the directml fork (go to forks, and click on the first one).

hallucination-gallery commented 1 year ago

Tested that out today. I got the same NaN exceptions running the directml fork as I did on main with torch 2.0.0. The exceptions cleared up when I downgraded my torch packages.

robonxt commented 1 year ago

Oh, have you tried the --disable-nan-check option?

checksumfail commented 1 year ago

> Oh, have you tried the --disable-nan-check option?

For me, this doesn't stop the problem, it just suppresses the error; I still get black images or random noise.

robonxt commented 1 year ago

> For me, this doesn't stop the problem, it just suppresses the error; I still get black images or random noise.

You'll need --precision full --no-half as well, at least that's what I remember off the top of my head.

checksumfail commented 1 year ago

Yes, I tried those flags too; they made no difference.

> I was able to get A1111 (and several other UIs) working by downgrading my torch packages: pip install torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 --extra-index-url https://download.pytorch.org/whl/rocm5.2

Strange, I get this error and I'm not sure why:

ERROR: Could not find a version that satisfies the requirement torch==1.13.1+rocm5.2 (from versions: 1.13.0, 1.13.1, 2.0.0, 2.0.1)
ERROR: No matching distribution found for torch==1.13.1+rocm5.2

robonxt commented 1 year ago

Here is my setup:

--backend directml --disable-nan-check --no-download-sd-model --enable-insecure-extension-access --no-gradio-queue --medvram --always-batch-cond-uncond --no-half --precision full --upcast-sampling --use-cpu CLIP BLIP interrogate gfpgan bsrgan esrgan scunet codeformer --opt-split-attention --sub-quad-q-chunk-size 512 --sub-quad-kv-chunk-size 512 --sub-quad-chunk-threshold 80 --update-all-extensions --update-check --listen 

I also did a fresh install of the directml fork (lshqqytiger's stable-diffusion-webui-directml), as I broke stuff earlier this week.
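
For anyone replicating this, a sketch of where a flag list like that usually lives (assuming the stock webui-user.sh convention from the main repo; on Windows the equivalent is webui-user.bat, and the list is trimmed here):

    # webui-user.sh: launch flags are passed through COMMANDLINE_ARGS
    export COMMANDLINE_ARGS="--backend directml --disable-nan-check --medvram --no-half --precision full --upcast-sampling --use-cpu CLIP BLIP interrogate gfpgan bsrgan esrgan scunet codeformer"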

hallucination-gallery commented 1 year ago

> ERROR: Could not find a version that satisfies the requirement torch==1.13.1+rocm5.2 (from versions: 1.13.0, 1.13.1, 2.0.0, 2.0.1)
> ERROR: No matching distribution found for torch==1.13.1+rocm5.2

That is really strange; the command is taken straight from the PyTorch website's guide for installing previous versions. I only excluded the torchaudio package.

> --backend directml --disable-nan-check --no-download-sd-model --enable-insecure-extension-access --no-gradio-queue --medvram --always-batch-cond-uncond --no-half --precision full --upcast-sampling --use-cpu CLIP BLIP interrogate gfpgan bsrgan esrgan scunet codeformer --opt-split-attention --sub-quad-q-chunk-size 512 --sub-quad-kv-chunk-size 512 --sub-quad-chunk-threshold 80 --update-all-extensions --update-check --listen

I'll try adding some of these one by one later. I bet it's one of the --use-cpu flags that's letting your setup work; my guess is CLIP, because I'm able to generate noise in other UIs on 2.0.0.

checksumfail commented 1 year ago

> Here is my setup:
>
> --backend directml --disable-nan-check --no-download-sd-model --enable-insecure-extension-access --no-gradio-queue --medvram --always-batch-cond-uncond --no-half --precision full --upcast-sampling --use-cpu CLIP BLIP interrogate gfpgan bsrgan esrgan scunet codeformer --opt-split-attention --sub-quad-q-chunk-size 512 --sub-quad-kv-chunk-size 512 --sub-quad-chunk-threshold 80 --update-all-extensions --update-check --listen
>
> I also did a fresh install of the directml fork (lshqqytiger's stable-diffusion-webui-directml), as I broke stuff earlier this week.

Doesn't the --use-cpu flag just bypass the GPU entirely? Surely that's super slow?

robonxt commented 1 year ago

> ...I bet it's one of the --use-cpu flags that's letting your setup work; my guess is CLIP, because I'm able to generate noise in other UIs on 2.0.0.

Quite possible. I don't wanna touch my setup anymore since it's working, but I'll probably test more when there are breaking changes.

> Doesn't the --use-cpu flag just bypass the GPU entirely? Surely that's super slow?

Kinda, but it's about the only way for certain parts of Stable Diffusion to work properly; it's more of a failsafe, I think. Also, only CLIP BLIP interrogate gfpgan bsrgan esrgan scunet codeformer are running on the CPU.

I highly recommend reading this discussion for more details on setting up on AMD GPUs.

hallucination-gallery commented 1 year ago

Your setup didn't work for me, in full or in part. I'm not sure what the difference in our configs is, but I'm going to stick with the main fork and drop down to torch 1.13.1.

I don't need to use any args running on torch 1.13.1, none of the must-haves in the linked discussion. It just works: no black squares, and 512x512 images generate at 7-9 it/s.
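
For completeness, a sketch of the entire launch under that setup (only --listen is kept, and only because I access the UI over the local network):

    # With torch 1.13.1+rocm5.2, no precision or NaN workarounds are needed
    python launch.py --listen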

yangtou2000 commented 1 year ago

Same issue on Ubuntu 22.04 + kernel 5.19 with PyTorch 2.0.1 + ROCm 5.4.2.

QuantumRange commented 1 year ago

Same issue on kernel 6.3.8-arch1-1 with torch 2.0.1 + ROCm 5.4.2 (stable-diffusion-ui worked with HSA_OVERRIDE_GFX_VERSION='10.3.0').

dbarbeau commented 1 year ago

> ERROR: Could not find a version that satisfies the requirement torch==1.13.1+rocm5.2 (from versions: 1.13.0, 1.13.1, 2.0.0, 2.0.1)
> ERROR: No matching distribution found for torch==1.13.1+rocm5.2
>
> That is really strange; the command is taken straight from the PyTorch website's guide for installing previous versions. I only excluded the torchaudio package.
>
> --backend directml --disable-nan-check --no-download-sd-model --enable-insecure-extension-access --no-gradio-queue --medvram --always-batch-cond-uncond --no-half --precision full --upcast-sampling --use-cpu CLIP BLIP interrogate gfpgan bsrgan esrgan scunet codeformer --opt-split-attention --sub-quad-q-chunk-size 512 --sub-quad-kv-chunk-size 512 --sub-quad-chunk-threshold 80 --update-all-extensions --update-check --listen
>
> I'll try adding some of these one by one later. I bet it's one of the --use-cpu flags that's letting your setup work; my guess is CLIP, because I'm able to generate noise in other UIs on 2.0.0.

The PyTorch builds for 1.x don't include Python 3.11 wheels. You'll need to make sure you are running Python 3.10 at most; the available files are here: https://download.pytorch.org/whl/torch/. But you'll also need to downgrade ROCm etc... or instead build from source for Python 3.11...
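
A minimal sketch of pinning the interpreter before installing the downgraded wheels (this assumes python3.10 is available on the system; the pip command is the one from earlier in the thread):

    # Create a Python 3.10 venv so pip can resolve the +rocm5.2 wheels
    python3.10 -m venv venv
    source venv/bin/activate
    pip install torch==1.13.1+rocm5.2 torchvision==0.14.1+rocm5.2 \
        --extra-index-url https://download.pytorch.org/whl/rocm5.2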

anyputer commented 8 months ago

Same black images / NaN error encountered here with my RX 680M iGPU (4 GB VRAM) on Arch Linux, kernel 6.6.7-zen, python-pytorch-opt-rocm-2.1.2-1, rocm-hip-sdk-5.7.1-2. This is the command I'm currently using to reproduce it:

PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.6,max_split_size_mb:128 HSA_OVERRIDE_GFX_VERSION=10.3.0 ./webui.sh --always-batch-cond-uncond --opt-sub-quad-attention --lowvram --disable-nan-check

Notice that I'm not disabling half precision (which shows visible progress before anticlimactically running out of memory) or forcing CPU (which works, but is three times slower than that). This is beyond frustrating.

P.S. Taking my user out of the video group reduced equally strange "Memory access fault by GPU node-1" errors.
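
For anyone who wants to test that last observation, group membership can be checked and changed with the standard tools (a re-login is needed for the change to take effect):

    # List the current user's groups, then drop the video group
    groups $USER
    sudo gpasswd -d $USER video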

deefster commented 7 months ago

I just wanted to add that I recently experienced many of these issues (NaN errors, image artifacts, etc.) after upgrading to a newer ROCm/torch; I believe it was ROCm 6 and torch 2.1. I use Docker on Linux, and going back to a 5.4.2 image made things very stable again. I was able to upgrade Python to 3.10 and PyTorch to 2.0.2 with no issues, and the latest webui works fine. I haven't tried the 2.1 torch yet. (6800 XT)