comfyanonymous / ComfyUI

The most powerful and modular stable diffusion GUI, api and backend with a graph/nodes interface.
GNU General Public License v3.0
40.48k stars 4.31k forks

[AMD ROCm] No reactions for 3 months or more #3609

Open mn1ei opened 4 weeks ago

mn1ei commented 4 weeks ago
AMD R4750G APU × 16GB UMA
Artix Linux with kernel 6.9.2
AMD ROCm 6.0.x on both system and PIP
pip install -U --break-system-packages --user --extra-index-url https://download.pytorch.org/whl/rocm6.0 \
                                       pip wheel torch torchvision torchaudio safetensors \
                                       diffusers transformers accelerate invisible_watermark \
                                       -r requirements.txt \
                                       -r custom_nodes/AutoCFG/requirements.txt \
                                       -r custom_nodes/ComfyMgt/requirements.txt
HSA_OVERRIDE_GFX_VERSION=10.3.0 python main.py \
--gpu-only --force-fp16 --auto-launch --dont-print-server \
--disable-cuda-malloc --preview-method latent2rgb
Exception in thread Thread-1 (<lambda>):
Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/aiohttp/connector.py", line 1203, in _create_direct_connection
    hosts = await self._resolve_host(host, port, traces=traces)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/aiohttp/connector.py", line 880, in _resolve_host
    return await asyncio.shield(resolved_host_task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/aiohttp/connector.py", line 917, in _resolve_host_with_throttle
    addrs = await self._resolver.resolve(host, port, family=self._family)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/aiohttp/resolver.py", line 33, in resolve
    infos = await self._loop.getaddrinfo(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 901, in getaddrinfo
    return await self.run_in_executor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/socket.py", line 963, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno -3] Temporary failure in name resolution

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/local/.8A/terminal/SDiffXL/ComfyUI/custom_nodes/ComfyMgt/glob/manager_server.py", line 1667, in <lambda>
    threading.Thread(target=lambda: asyncio.run(default_cache_update())).start()
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/opt/local/.8A/terminal/SDiffXL/ComfyUI/custom_nodes/ComfyMgt/glob/manager_server.py", line 1664, in default_cache_update
    await asyncio.gather(a, b, c, d, e)
  File "/opt/local/.8A/terminal/SDiffXL/ComfyUI/custom_nodes/ComfyMgt/glob/manager_server.py", line 1651, in get_cache
    json_obj = await core.get_data(uri, True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/local/.8A/terminal/SDiffXL/ComfyUI/custom_nodes/ComfyMgt/glob/manager_core.py", line 583, in get_data
    async with session.get(uri) as resp:
  File "/usr/lib/python3.12/site-packages/aiohttp/client.py", line 1197, in __aenter__
    self._resp = await self._coro
                 ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/aiohttp/client.py", line 581, in _request
    conn = await self._connector.connect(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/aiohttp/connector.py", line 544, in connect
    proto = await self._create_connection(req, traces, timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/aiohttp/connector.py", line 944, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/aiohttp/connector.py", line 1209, in _create_direct_connection
    raise ClientConnectorError(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host raw.githubusercontent.com:443 ssl:default [Temporary failure in name resolution]
got prompt
got prompt
got prompt
got prompt
got prompt
got prompt
got prompt
got prompt
got prompt
got prompt
got prompt
got prompt
got prompt
got prompt
got prompt
[GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
ATTENTION: default value of option mesa_glthread overridden by environment.
[GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
ATTENTION: default value of option mesa_glthread overridden by environment.
[GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
ATTENTION: default value of option mesa_glthread overridden by environment.
[GFX1-]: GFX: RenderThread detected a device reset in PostUpdate
ATTENTION: default value of option mesa_glthread overridden by environment.
ltdrdata commented 4 weeks ago

This appears to be a network issue, possibly related to your firewall or ISP. It is not an issue that can be resolved at the ComfyUI or custom nodes level.

All I can suggest is to check your firewall and antivirus software, and try using a VPN.

mn1ei commented 3 weeks ago

Fetching the GitHub raw URLs in a web browser works, so why does ComfyUI/SDiffXL fail to fetch the same ones? I have also always left the firewall and proxy off, and under the same ISP the same GitHub raw URLs work outside ComfyUI/SDiffXL but fail inside it. Doesn't that seem weird?

mn1ei commented 3 weeks ago

AMD R4750G APU × (14.9+46.9) GiB DDR4
Samsung 990PRO 3.63TiB nda0
Kingston NV2 3.63TiB nda1
Samsung 860QVO 3.63TiB ada0 (2.5’’)
Transcend 220Q 1.82TiB ada1 (M.2 SATA)

The most recent ComfyUI, but a rather old SDiffXL 1.0.x base model

ltdrdata commented 3 weeks ago

This issue is not related to the model at all. It is a network-related issue. At some stage, something outside of ComfyUI is interfering with domain name resolution.
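One way to check this outside of ComfyUI is to repeat the call that fails at the bottom of the first traceback, socket.getaddrinfo, from the same Python interpreter and the same shell. The following is only a minimal sketch; the helper name can_resolve is made up for illustration and is not part of ComfyUI or the manager extension.

```python
import socket


def can_resolve(host: str, port: int = 443) -> tuple[bool, str]:
    """Resolve `host` with the system resolver, like the failing call in the traceback.

    Hypothetical helper for illustration only.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Same exception type as in the log: socket.gaierror
        return False, f"{host}: resolution failed ({exc})"
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    addresses = sorted({info[4][0] for info in infos})
    return True, f"{host}: {', '.join(addresses)}"


if __name__ == "__main__":
    ok, message = can_resolve("raw.githubusercontent.com")
    print(message)
```

If this fails in the shell used to start ComfyUI while a browser can still open the same URL, the difference most likely lies in the system resolver configuration rather than in ComfyUI.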

doctorpangloss commented 2 weeks ago

try following the installation instructions here: https://github.com/hiddenswitch/ComfyUI?tab=readme-ov-file#installing

mn1ei commented 2 weeks ago

Maybe that network issue was just due to download overload. However, it still crashes, and this time it also freezes, with the same logs as for the download overload.

AngryLoki commented 2 weeks ago

@mn1ei ,

Regarding pip install:
1) Python is sensitive to environment variables (while other applications ignore them); check env | grep PROXY.
2) If you run it under Docker, it may just be a Docker-specific problem; recheck your Docker config.
3) Check your DNS settings anyway: if Chrome works, it does not mean that your system DNS is good (because Chrome prioritizes 8.8.8.8 over private DNS servers).
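The proxy check from point 1 can also be done from inside Python, which has the advantage of catching lowercase names such as http_proxy that a case-sensitive grep PROXY would miss. This is a rough sketch; proxy_variables is a hypothetical helper name, and urllib.request.getproxies() only shows what the standard library would pick up, which is not necessarily what every HTTP client does.

```python
import os
import urllib.request


def proxy_variables(environ=None) -> dict[str, str]:
    """Return every environment variable whose name contains "proxy", ignoring case.

    Hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ
    return {name: value for name, value in environ.items() if "proxy" in name.lower()}


if __name__ == "__main__":
    for name, value in sorted(proxy_variables().items()):
        print(f"{name}={value}")
    # What Python's standard library would use as proxies in this environment.
    print("urllib proxies:", urllib.request.getproxies())
```

Running it in the same shell that starts main.py shows whether a leftover proxy setting is visible to Python.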

Regarding the AMD Ryzen 4750G: if you are actually trying to use the APU's integrated graphics (not a discrete GPU), your architecture is gfx90c, which differs from gfx1030 in many ways, so don't use HSA_OVERRIDE_GFX_VERSION=10.3.0. Your best chance is HSA_OVERRIDE_GFX_VERSION=9.0.0; some people managed to use it here: https://www.gabriel.urdhr.fr/2022/08/28/trying-to-run-stable-diffusion-on-amd-ryzen-5-5600g/
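To try a different override without editing the shell command each time, the environment can be built in a small wrapper. The sketch below is only an illustration: rocm_launch_env is a made-up helper, the flags are copied from the command in the original post, and 9.0.0 is the value suggested above, not one verified on this hardware.

```python
import os
import shlex


def rocm_launch_env(gfx_override: str, base_env=None) -> dict[str, str]:
    """Copy the environment and set HSA_OVERRIDE_GFX_VERSION in the copy.

    Hypothetical helper for illustration only; the base environment is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_override
    return env


# Flags taken from the command in the original post.
COMMAND = [
    "python", "main.py",
    "--gpu-only", "--force-fp16", "--auto-launch", "--dont-print-server",
    "--disable-cuda-malloc", "--preview-method", "latent2rgb",
]

if __name__ == "__main__":
    env = rocm_launch_env("9.0.0")  # assumption: the value suggested in this thread
    # Print the equivalent shell line instead of launching anything.
    print(f"HSA_OVERRIDE_GFX_VERSION={env['HSA_OVERRIDE_GFX_VERSION']} {shlex.join(COMMAND)}")
    # To actually start ComfyUI from its directory, one could use:
    #   import subprocess; subprocess.run(COMMAND, env=env, check=True)
```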

Regarding 16GB UMA: I actually don't know if ComfyUI uses UMA memory, but if you see "not enough memory" after fixing all the other issues, you may try the latest Linux 6.10 kernel. See https://www.phoronix.com/news/Linux-6.10-AMDKFD-Small-APUs for an explanation.
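Whether the running kernel is already at 6.10 or newer can be checked from Python as well. This is a small sketch that only compares the major and minor numbers of the release string (the original post reports 6.9.2); the function names are invented for the example, and it does not check whether the change described in the linked article is actually present in a given distribution kernel.

```python
import platform
import re


def kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as "6.9.2-artix1-1"."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_at_least(release: str, minimum: tuple[int, int] = (6, 10)) -> bool:
    """True if the release is at or above the given (major, minor) version."""
    return kernel_version(release) >= minimum


if __name__ == "__main__":
    release = platform.release()
    try:
        print(f"{release}: 6.10 or newer = {is_at_least(release)}")
    except ValueError as exc:
        # Non-Linux systems may report a release string in another format.
        print(exc)
```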

doctorpangloss commented 1 week ago

Maybe that network issue was just due to download overload. However, it still crashes, and this time it also freezes, with the same logs as for the download overload.

are you trying to do this in a container?