crystian / ComfyUI-Crystools

A powerful set of tools for ComfyUI
MIT License
697 stars, 33 forks

AMD GPU issues / on both main and AMD branch #111

Closed. maurus56 closed this issue 2 weeks ago

maurus56 commented 2 weeks ago

Describe the bug
The GPU packages fail to initialize. Am I missing something? Everything was working perfectly until I tried installing this library, which probably messed up the package versions I had.

Error in console:

Traceback (most recent call last):
  File "/home/rick/Documents/flux/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 639, in _raw_device_count_amdsmi
    amdsmi.amdsmi_init()
    ^^^^^^
NameError: name 'amdsmi' is not defined

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/rick/Documents/flux/ComfyUI/main.py", line 229, in <module>
    cuda_malloc_warning()
  File "/home/rick/Documents/flux/ComfyUI/main.py", line 94, in cuda_malloc_warning
    device_name = comfy.model_management.get_torch_device_name(device)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rick/Documents/flux/ComfyUI/comfy/model_management.py", line 263, in get_torch_device_name
    return "{} {} : {}".format(device, torch.cuda.get_device_name(device), allocator_backend)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rick/Documents/flux/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 435, in get_device_name
    return get_device_properties(device).name
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rick/Documents/flux/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 467, in get_device_properties
    if device < 0 or device >= device_count():
                               ^^^^^^^^^^^^^^
  File "/home/rick/Documents/flux/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 842, in device_count
    nvml_count = _device_count_amdsmi() if torch.version.hip else _device_count_nvml()
                 ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rick/Documents/flux/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 764, in _device_count_amdsmi
    raw_cnt = _raw_device_count_amdsmi()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rick/Documents/flux/venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 640, in _raw_device_count_amdsmi
    except amdsmi.AmdSmiException as e:
           ^^^^^^
NameError: name 'amdsmi' is not defined
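
For readers unfamiliar with this kind of chained traceback, here is a minimal sketch of the pattern it shows. It is not torch's actual source: `fake_smi`, `raw_device_count`, and `run` are hypothetical names, and the failed optional import is an assumption about why the name is unbound. The call inside `try` raises a `NameError`, and evaluating the `except` clause raises a second one, which produces the "During handling of the above exception" section.

```python
# Assumption: an optional import failed earlier, so the name was never bound.
try:
    import module_that_is_not_installed_xyz as fake_smi  # hypothetical module
except ImportError:
    pass  # `fake_smi` stays undefined from here on


def raw_device_count() -> int:
    """Mimic the shape of the failing function in the traceback."""
    try:
        fake_smi.init()        # first NameError: the name is not defined
    except fake_smi.Error:     # evaluating this expression raises a second NameError
        return -1
    return 0


def run() -> NameError:
    """Trigger the failure and hand back the exception that propagates."""
    try:
        raw_device_count()
    except NameError as err:
        return err
    raise AssertionError("expected a NameError")


if __name__ == "__main__":
    err = run()
    print("propagated:", repr(err))
    print("raised while handling:", repr(err.__context__))
```

The second `NameError` is the one that propagates, with the first attached as its `__context__`, which matches the two-part traceback above.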

Versions:
** Python version: 3.11.9 (main, Nov 10 2011, 15:00:00) [GCC 13.2.0]
Total VRAM 16368 MB, total RAM 31303 MB
pytorch version: 2.4.0+rocm6.1
Set vram state to: LOW_VRAM
[Crystools INFO] Crystools version: 1.16.6
[Crystools INFO] CPU: AMD Ryzen 7 7700X 8-Core Processor - Arch: x86_64 - OS: Linux 6.9.3-76060903-generic
[Crystools ERROR] Could not init pynvml (Nvidia).NVML Shared Library Not Found
[Crystools ERROR] Could not pick default device.name 'amdsmi' is not defined
[Crystools WARNING] No GPU with CUDA detected.
[Crystools INFO] Crystools version: 1.16.4
[Crystools INFO] CPU: AMD Ryzen 7 7700X 8-Core Processor - Arch: x86_64 - OS: Linux 6.9.3-76060903-generic
[Crystools ERROR] Could not init pynvml (Nvidia).NVML Shared Library Not Found
cat: /sys/module/amdgpu/initstate: No such file or directory
[Crystools ERROR] Could not init pyrsmi (AMD).ROCm driver initilization failed
[Crystools ERROR] Could not pick default device.name 'amdsmi' is not defined
[Crystools WARNING] No GPU with CUDA detected.
### Loading: ComfyUI-Manager (V2.50.2)
### ComfyUI Revision: 2617 [2ca8f6e2] | Released on '2024-08-26'

t3dc commented 2 weeks ago

Hitting this same issue as well.

maurus56 commented 2 weeks ago

The issue seems to be related to the torch version (2.4.0+rocm6.1 in the log above). Downgrading should fix it:

pip install torch==2.3.0+rocm6.0 torchvision==0.18.0+rocm6.0 torchaudio==2.3.0+rocm6.0 --index-url https://download.pytorch.org/whl/rocm6.0
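
To confirm which build ended up in the venv after the downgrade, a small check like the one below may help. It is only a sketch: it reads the installed version from package metadata instead of importing torch, so it should not trigger the error above. `PINNED_TORCH`, `split_local_version`, `installed_version`, and `report` are names made up for this example.

```python
from importlib import metadata
from importlib.util import find_spec
from typing import Optional, Tuple

# Version pinned in the downgrade command above.
PINNED_TORCH = "2.3.0+rocm6.0"


def split_local_version(version: str) -> Tuple[str, str]:
    """Split a version like '2.4.0+rocm6.1' into the release and the build tag."""
    release, _, local = version.partition("+")
    return release, local


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report() -> None:
    torch_version = installed_version("torch")
    if torch_version is None:
        print("torch is not installed in this environment")
        return
    release, local = split_local_version(torch_version)
    print(f"torch release: {release}, build tag: {local or 'none'}")
    print(f"matches pinned {PINNED_TORCH}: {torch_version == PINNED_TORCH}")
    # amdsmi is the module named in the NameError; check whether Python can find it.
    print(f"amdsmi importable: {find_spec('amdsmi') is not None}")


if __name__ == "__main__":
    report()
```

Run it with the same venv's Python that ComfyUI uses, otherwise it reports on a different environment.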