AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: pytorch rocm 6.0 with 7600xt = HSA_STATUS_ERROR_INVALID_ISA #15434

Open neem693 opened 7 months ago

neem693 commented 7 months ago

What happened?

I have a 7600 XT now, and I am getting an error. The error occurs while starting Stable Diffusion web UI 1.8.0 with PyTorch 2.1.2 and ROCm 6.0. It is the same with PyTorch 2.4.0 and ROCm 6.0, and even with PyTorch 2.3.0.

The error is: rocdevice.cpp :2728: 0225198107 us: [pid:4807 tid:0x7d4fc8fff640] Callback: Queue 0x7d4de2e00000 aborting with error : HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid. code: 0x100f

But it works fine with PyTorch 2.2.0 and ROCm 5.7.

Of course, I have set export HSA_OVERRIDE_GFX_VERSION=11.0.0
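
A note on that setting: as far as I understand, the override value encodes a gfx target name as major.minor.stepping, so 11.0.0 corresponds to gfx1100, while rocminfo reports this card as gfx1102. Here is a minimal Python sketch of that mapping, assuming the last two characters of the target name are single hexadecimal digits. The helper name gfx_to_override is hypothetical and not part of ROCm or PyTorch.

```python
import re


def gfx_to_override(gfx_name: str) -> str:
    """Convert a target name such as 'gfx1102' to 'major.minor.stepping'."""
    # Assumption: the name is 'gfx' + decimal major + one hex digit (minor)
    # + one hex digit (stepping).
    match = re.fullmatch(r"gfx([0-9]+)([0-9a-f])([0-9a-f])", gfx_name)
    if match is None:
        raise ValueError(f"unrecognized gfx name: {gfx_name!r}")
    major, minor, stepping = match.groups()
    return f"{int(major)}.{int(minor, 16)}.{int(stepping, 16)}"


print(gfx_to_override("gfx1102"))  # the target rocminfo reports for this card
print(gfx_to_override("gfx1100"))  # the target the 11.0.0 override stands for
```

This only illustrates the naming convention. It does not explain why the override stops working with the ROCm 6.0 builds.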

Steps to reproduce the problem

With a 7600 XT or 7600, install ROCm and PyTorch with ROCm 6.0 in a Python venv environment.

Then run HSA_OVERRIDE_GFX_VERSION=11.0.0 webui.sh

What should have happened?

Please support the RX 7600 XT with the PyTorch ROCm 6 version.

What browsers do you use to access the UI?

Mozilla Firefox

Sysinfo

I will upload the sysinfo later. For now, I can upload the rocminfo output:

=====================

HSA System Attributes

=====================

Runtime Version: 1.1

System Timestamp Freq.: 1000.000000MHz

Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)

Machine Model: LARGE

System Endianness: LITTLE

Mwaitx: DISABLED

DMAbuf Support: YES

==========

HSA Agents

==========


Agent 1


Name: AMD Ryzen 5 5600 6-Core Processor

Uuid: CPU-XX

Marketing Name: AMD Ryzen 5 5600 6-Core Processor

Vendor Name: CPU

Feature: None specified

Profile: FULL_PROFILE

Float Round Mode: NEAR

Max Queue Number: 0(0x0)

Queue Min Size: 0(0x0)

Queue Max Size: 0(0x0)

Queue Type: MULTI

Node: 0

Device Type: CPU

Cache Info:

L1: 32768(0x8000) KB

Chip ID: 0(0x0)

ASIC Revision: 0(0x0)

Cacheline Size: 64(0x40)

Max Clock Freq. (MHz): 3500

BDFID: 0

Internal Node ID: 0

Compute Unit: 12

SIMDs per CU: 0

Shader Engines: 0

Shader Arrs. per Eng.: 0

WatchPts on Addr. Ranges:1

Features: None

Pool Info:

Pool 1

Segment: GLOBAL; FLAGS: FINE GRAINED

Size: 32775240(0x1f41c48) KB

Allocatable: TRUE

Alloc Granule: 4KB

Alloc Alignment: 4KB

Accessible by all: TRUE

Pool 2

Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED

Size: 32775240(0x1f41c48) KB

Allocatable: TRUE

Alloc Granule: 4KB

Alloc Alignment: 4KB

Accessible by all: TRUE

Pool 3

Segment: GLOBAL; FLAGS: COARSE GRAINED

Size: 32775240(0x1f41c48) KB

Allocatable: TRUE

Alloc Granule: 4KB

Alloc Alignment: 4KB

Accessible by all: TRUE

ISA Info:


Agent 2


Name: gfx1102

Uuid: GPU-XX

Marketing Name: AMD Radeon™ RX 7600 XT

Vendor Name: AMD

Feature: KERNEL_DISPATCH

Profile: BASE_PROFILE

Float Round Mode: NEAR

Max Queue Number: 128(0x80)

Queue Min Size: 64(0x40)

Queue Max Size: 131072(0x20000)

Queue Type: MULTI

Node: 1

Device Type: GPU

Cache Info:

L1: 32(0x20) KB

L2: 2048(0x800) KB

Chip ID: 29824(0x7480)

ASIC Revision: 0(0x0)

Cacheline Size: 64(0x40)

Max Clock Freq. (MHz): 2539

BDFID: 10240

Internal Node ID: 1

Compute Unit: 32

SIMDs per CU: 2

Shader Engines: 2

Shader Arrs. per Eng.: 2

WatchPts on Addr. Ranges:4

Coherent Host Access: FALSE

Features: KERNEL_DISPATCH

Fast F16 Operation: TRUE

Wavefront Size: 32(0x20)

Workgroup Max Size: 1024(0x400)

Workgroup Max Size per Dimension:

x 1024(0x400)

y 1024(0x400)

z 1024(0x400)

Max Waves Per CU: 32(0x20)

Max Work-item Per CU: 1024(0x400)

Grid Max Size: 4294967295(0xffffffff)

Grid Max Size per Dimension:

x 4294967295(0xffffffff)

y 4294967295(0xffffffff)

z 4294967295(0xffffffff)

Max fbarriers/Workgrp: 32

Packet Processor uCode:: 52

SDMA engine uCode:: 16

IOMMU Support:: None

Pool Info:

Pool 1

Segment: GLOBAL; FLAGS: COARSE GRAINED

Size: 16760832(0xffc000) KB

Allocatable: TRUE

Alloc Granule: 4KB

Alloc Alignment: 4KB

Accessible by all: FALSE

Pool 2

Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED

Size: 16760832(0xffc000) KB

Allocatable: TRUE

Alloc Granule: 4KB

Alloc Alignment: 4KB

Accessible by all: FALSE

Pool 3

Segment: GROUP

Size: 64(0x40) KB

Allocatable: FALSE

Alloc Granule: 0KB

Alloc Alignment: 0KB

Accessible by all: FALSE

ISA Info:

ISA 1

Name: amdgcn-amd-amdhsa--gfx1102

Machine Models: HSA_MACHINE_MODEL_LARGE

Profiles: HSA_PROFILE_BASE

Default Rounding Mode: NEAR

Default Rounding Mode: NEAR

Fast f16: TRUE

Workgroup Max Size: 1024(0x400)

Workgroup Max Size per Dimension:

x 1024(0x400)

y 1024(0x400)

z 1024(0x400)

Grid Max Size: 4294967295(0xffffffff)

Grid Max Size per Dimension:

x 4294967295(0xffffffff)

y 4294967295(0xffffffff)

z 4294967295(0xffffffff)

FBarrier Max Size: 32

Done
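
For anyone comparing their own card against this output, the relevant part is the ISA name near the end (amdgcn-amd-amdhsa--gfx1102 here). A small Python sketch for pulling that name out of rocminfo text is below. The function name isa_names and the shortened sample string are illustrative assumptions, not part of any ROCm tool.

```python
import re


def isa_names(rocminfo_text: str) -> list[str]:
    """Return the gfx targets found on ISA lines like 'amdgcn-amd-amdhsa--gfx1102'."""
    return re.findall(r"amdgcn-amd-amdhsa--(gfx[0-9a-f]+)", rocminfo_text)


# Shortened sample modeled on the output above; real text would come from
# running `rocminfo` and capturing its standard output.
sample = """
Name: gfx1102
Marketing Name: AMD Radeon RX 7600 XT
ISA 1
Name: amdgcn-amd-amdhsa--gfx1102
"""

print(isa_names(sample))  # ['gfx1102']
```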

Console logs

no module 'xformers'. Processing without...

no module 'xformers'. Processing without...

No module 'xformers'. Proceeding without it.

Loading weights [6ce0161689] from /stable_diffusion/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors

Running on local URL: http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.

Creating model from config: /stable_diffusion/stable-diffusion-webui/configs/v1-inference.yaml

Startup time: 11.0s (prepare environment: 3.4s, import torch: 4.0s, import gradio: 0.8s, setup paths: 1.1s, other imports: 0.5s, load scripts: 0.2s, create ui: 0.4s, gradio launch: 0.6s).

Applying attention optimization: Doggettx... done.

Model loaded in 13.5s (load weights from disk: 0.8s, create model: 0.3s, apply weights to model: 11.8s, calculate empty prompt: 0.4s).

:0:rocdevice.cpp :2728: 0960471930 us: [pid:7379 tid:0x7932b7dff640] Callback: Queue 0x793124600000 aborting with error : HSA_STATUS_ERROR_INVALID_ISA: The instruction set architecture is invalid. code: 0x100f

./webui.sh: 292: 7379 (core dumped) "${python_cmd}" -u "${LAUNCH_SCRIPT}" "$@"

Additional information

No response

derzahla commented 7 months ago

Same for me with an 8600G. I will try 5.7 though and see if that works. I went straight to ROCm 6, which is what I am using for ollama and localAI.

lovenemesis commented 6 months ago

Same for me with 780M on Fedora 40 with PyTorch 2.3.0 + ROCm 6.0.

On Fedora 39 with PyTorch 2.2.1 + ROCm 5.7, it works perfectly well.

I really consider this a regression.

SuperGoodGame commented 6 months ago

Do you have a way to solve this problem? I have the same situation and don't know how to deal with it.

lovenemesis commented 5 months ago

Honestly, there is no solution unless AMD fixes its own ROCm part for unlisted GPUs.

The exact same setup works pretty well with an RX 7800 XT.

remixmabix commented 4 months ago

+1