ROCm / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

[Issue]: is scaled_dot_product_attention part of flash attention? #79

Open unclemusclez opened 2 months ago

unclemusclez commented 2 months ago

Problem Description

I often get these errors from various applications; this one is from ComfyUI.

Is scaled_dot_product_attention part of Flash Attention? I am using howiejay/navi_support, which enables Flash Attention support for the 7900 XT (gfx1100) on ROCm.

Downloading model to: /home/musclez/ComfyUI/models/CogVideo/CogVideo2B
Fetching 11 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:40<00:00,  3.66s/it]
latents.shape torch.Size([1, 13, 16, 60, 90])
latents.device cuda:0
  0%|                                                                                                                                                                                                                 | 0/50 [00:00<?, ?it/s]
!!! Exception during processing !!! HIP out of memory. Tried to allocate 35.31 GiB. GPU 0 has a total capacity of 19.94 GiB of which 13.47 GiB is free. Of the allocated memory 5.17 GiB is allocated by PyTorch, and 259.56 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_HIP_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Traceback (most recent call last):
  File "/home/musclez/ComfyUI/execution.py", line 317, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/execution.py", line 192, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/home/musclez/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/nodes.py", line 314, in process
    latents = pipeline["pipe"](
              ^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper/pipeline_cogvideox.py", line 584, in __call__
    noise_pred = self.transformer(
                 ^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 458, in forward
    hidden_states, encoder_hidden_states = block(
                                           ^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/diffusers/models/transformers/cogvideox_transformer_3d.py", line 131, in forward
    attn_hidden_states, attn_encoder_hidden_states = self.attn1(
                                                     ^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/diffusers/models/attention_processor.py", line 490, in forward
    return self.processor(
           ^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/diffusers/models/attention_processor.py", line 1934, in __call__
    hidden_states = F.scaled_dot_product_attention(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
torch.OutOfMemoryError: HIP out of memory. Tried to allocate 35.31 GiB. GPU 0 has a total capacity of 19.94 GiB of which 13.47 GiB is free. Of the allocated memory 5.17 GiB is allocated by PyTorch, and 259.56 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_HIP_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Got an OOM, unloading all loaded models.
Prompt executed in 70.36 seconds
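
The 35.31 GiB request is consistent with SDPA running a kernel that materializes the full attention-score matrix rather than a fused flash or memory-efficient kernel. As a rough check, the sketch below sizes that matrix from `latents.shape` above. It assumes CogVideoX-2B's configuration (30 attention heads, 2x2 spatial patches, 226 text tokens attended jointly with the video tokens), a batch of 2 from classifier-free guidance, and 2-byte fp16 scores. None of these values come from the log; batch 1 with 4-byte fp32 scores gives the same total.

```python
# Rough size of one attention-score matrix: batch x heads x seq_len x seq_len.
# All settings marked "assumed" are CogVideoX-2B guesses, not values from the log.
latent_frames, latent_h, latent_w = 13, 60, 90  # from latents.shape [1, 13, 16, 60, 90]
patch = 2            # assumed spatial patch size (2x2)
text_tokens = 226    # assumed text sequence length, concatenated with video tokens
heads = 30           # assumed number of attention heads
batch = 2            # assumed: classifier-free guidance runs cond + uncond together
bytes_per_elem = 2   # assumed fp16 scores

video_tokens = latent_frames * (latent_h // patch) * (latent_w // patch)
seq_len = video_tokens + text_tokens
score_bytes = batch * heads * seq_len * seq_len * bytes_per_elem
score_gib = score_bytes / 2**30

print(f"sequence length: {seq_len}")
print(f"attention scores: {score_gib:.2f} GiB")
```

A fused kernel computes attention block by block and never allocates this matrix, so getting SDPA onto a flash or memory-efficient backend would be expected to avoid this particular allocation.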

Operating System

WSL2 Ubuntu 22.04 Windows 11

CPU

AMD Ryzen 7 7800X3D

GPU

AMD Radeon RX 7900 XT

ROCm Version

ROCm 6.1.0

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

evshiron commented 2 months ago

SDPA implementations are managed by PyTorch itself and will not automatically use a Flash Attention implementation from an external library. If needed, you can monkey-patch SDPA to override it, like this:

* https://github.com/vladmandic/automatic/blob/master/modules/devices.py#L260

howiejay/navi_support only implements the forward pass, so you can't use it for training.
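
The monkey-patching suggestion can be sketched roughly as follows. This is not the code from the devices.py linked in this thread; it is a minimal illustration that assumes a flash-attention build exposing upstream flash-attn's `flash_attn_func(q, k, v, dropout_p=..., softmax_scale=..., causal=...)` with tensors in (batch, seq_len, heads, head_dim) layout, and PyTorch 2.1 or newer (where SDPA accepts `scale`). `install_sdpa_override` and `can_use_flash` are hypothetical names. Because that branch is forward-only, a patch like this only makes sense for inference.

```python
import functools


def install_sdpa_override(F, flash_fn, can_use_flash):
    """Route eligible scaled_dot_product_attention calls to flash_fn (sketch).

    F: module to patch, normally torch.nn.functional.
    flash_fn: assumed flash_attn_func-style callable, (batch, seq, heads, dim) layout.
    can_use_flash: hypothetical predicate, e.g. fp16/bf16 inputs, a supported
        head_dim, and equal q/k lengths when causal (flash-attn aligns the causal
        mask bottom-right, SDPA top-left).
    Returns the original function so the patch can be undone.
    """
    original = F.scaled_dot_product_attention

    @functools.wraps(original)
    def patched(query, key, value, attn_mask=None, dropout_p=0.0,
                is_causal=False, scale=None, **kwargs):
        # Masks and newer keyword arguments (e.g. enable_gqa) stay on PyTorch's path.
        if attn_mask is None and not kwargs and can_use_flash(query, key, value):
            # SDPA uses (batch, heads, seq, dim); flash_attn_func expects (batch, seq, heads, dim).
            out = flash_fn(query.transpose(1, 2), key.transpose(1, 2), value.transpose(1, 2),
                           dropout_p=dropout_p, softmax_scale=scale, causal=is_causal)
            return out.transpose(1, 2)  # back to SDPA's layout
        return original(query, key, value, attn_mask=attn_mask, dropout_p=dropout_p,
                        is_causal=is_causal, scale=scale, **kwargs)

    F.scaled_dot_product_attention = patched
    return original


# Usage (assumes torch and a flash-attn build are installed):
#   import torch
#   import torch.nn.functional as F
#   from flash_attn import flash_attn_func
#   install_sdpa_override(
#       F, flash_attn_func,
#       lambda q, k, v: q.dtype in (torch.float16, torch.bfloat16))
```

Patching the module attribute reaches the call in the traceback above because diffusers looks up `F.scaled_dot_product_attention` at call time; code that imported the function directly with `from torch.nn.functional import scaled_dot_product_attention` before the patch would keep the original.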

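If you are interested, you can try the branch in this PR, which updates AOTriton and supports Navi31:

* [[ROCm] Update to AOTriton 0.7b pytorch/pytorch#134498](https://github.com/pytorch/pytorch/pull/134498)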

unclemusclez commented 2 months ago

> SDPA implementations are managed by PyTorch itself and will not automatically use a Flash Attention implementation from an external library. If needed, you can monkey-patch SDPA to override it, like this:
>
> * https://github.com/vladmandic/automatic/blob/master/modules/devices.py#L260
>
> howiejay/navi_support only implements the forward pass, so you can't use it for training.
>
> If you are interested, you can try the branch in this PR, which updates AOTriton and supports Navi31:
>
> * [[ROCm] Update to AOTriton 0.7b pytorch/pytorch#134498](https://github.com/pytorch/pytorch/pull/134498)

Is this going to compile on WSL2 Linux? I have been banging my head against my desk trying to get this to compile for the past month. There have been multiple issues, but AOTriton was definitely one of them.

It would be really nice if the AMD repo had nightly wheels for Python 3.10, 3.11, and 3.12, including:

evshiron commented 2 months ago

@unclemusclez

Almost everything can be compiled against ROCm in WSL.

For PyTorch, you can follow the steps here:

* [Improve Backward Performance and Navi31 Support aotriton#39 (comment)](https://github.com/ROCm/aotriton/pull/39#issuecomment-2291683442)

* Switch to that branch in step 1

* Skip step 2

TorchVision and TorchAudio can easily be compiled once the corresponding PyTorch is installed.

For Flash Attention, the branch in https://github.com/ROCm/flash-attention/pull/76 is still at an early stage. You can make it work with some fixes and know-how, but I haven't been able to get impressive performance from it.

For xFormers, it should only work on CDNA GPUs at the moment, and I don't know of any repo that works for Navi3x.

For BitsAndBytes, the official repo has a branch, https://github.com/bitsandbytes-foundation/bitsandbytes/tree/multi-backend-refactor; you can build from it and it works just fine (only 4-bit?).

I have been training some LLMs using https://github.com/hiyouga/LLaMA-Factory on my RX 7900 XTX these past few weeks, with vanilla PyTorch and a custom BitsAndBytes build in WSL. While it works for single-GPU training, the training performance is about 50% of an RTX 4090 D.

unclemusclez commented 2 months ago

> @unclemusclez
>
> Almost everything can be compiled against ROCm in WSL.
>
> For PyTorch, you can follow the steps here:
>
> * [Improve Backward Performance and Navi31 Support aotriton#39 (comment)](https://github.com/ROCm/aotriton/pull/39#issuecomment-2291683442)
>
> * Switch to that branch in step 1
>
> * Skip step 2
>
> TorchVision and TorchAudio can easily be compiled once the corresponding PyTorch is installed.
>
> For Flash Attention, the branch in #76 is still at an early stage. You can make it work with some fixes and know-how, but I haven't been able to get impressive performance from it.
>
> For xFormers, it should only work on CDNA GPUs at the moment, and I don't know of any repo that works for Navi3x.
>
> For BitsAndBytes, the official repo has a branch, https://github.com/bitsandbytes-foundation/bitsandbytes/tree/multi-backend-refactor; you can build from it and it works just fine (only 4-bit?).
>
> I have been training some LLMs using https://github.com/hiyouga/LLaMA-Factory on my RX 7900 XTX these past few weeks, with vanilla PyTorch and a custom BitsAndBytes build in WSL. While it works for single-GPU training, the training performance is about 50% of an RTX 4090 D.

I'm having trouble configuring AOTriton to build only for '--target_gpus' 'Navi31'.

evshiron commented 2 months ago

@unclemusclez

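> I'm having trouble configuring AOTriton to build only for '--target_gpus' 'Navi31'.

Add a new line with -DTARGET_GPUS=Navi31 here:

* https://github.com/ROCm/pytorch/blob/xinyazhang/tensor_philox/cmake/External/aotriton.cmake#L22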

unclemusclez commented 2 months ago

Alright, I was trying to export the TARGET_GPUS variable, but it was not working. I will try this instead.

> @unclemusclez
>
> > I'm having trouble configuring AOTriton to build only for '--target_gpus' 'Navi31'.
>
> Add a new line with -DTARGET_GPUS=Navi31 here:
>
> * https://github.com/ROCm/pytorch/blob/xinyazhang/tensor_philox/cmake/External/aotriton.cmake#L22

unclemusclez commented 2 months ago
 ./startup.sh
[START] Security scan
[DONE] Security scan
## ComfyUI-Manager: installing dependencies done.
** ComfyUI startup time: 2024-09-02 15:55:53.891705
** Platform: Linux
** Python version: 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0]
** Python executable: /home/musclez/ComfyUI/.venv/bin/python
** ComfyUI Path: /home/musclez/ComfyUI
** Log path: /home/musclez/ComfyUI/comfyui.log

Prestartup times for custom nodes:
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/rgthree-comfy
   1.2 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Manager

/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py:703: UserWarning: Can't initialize amdsmi - Error code: 34
  warnings.warn(f"Can't initialize amdsmi - Error code: {e.err_code}")
Traceback (most recent call last):
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 332, in _lazy_init
    queued_call()
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 205, in _check_capability
    min_arch = min(
               ^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 206, in <genexpr>
    (_extract_arch_version(arch) for arch in torch.cuda.get_arch_list()),
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 177, in _extract_arch_version
    base = arch_string.split("_")[1]
           ~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/musclez/ComfyUI/main.py", line 90, in <module>
    import execution
  File "/home/musclez/ComfyUI/execution.py", line 13, in <module>
    import nodes
  File "/home/musclez/ComfyUI/nodes.py", line 21, in <module>
    import comfy.diffusers_load
  File "/home/musclez/ComfyUI/comfy/diffusers_load.py", line 3, in <module>
    import comfy.sd
  File "/home/musclez/ComfyUI/comfy/sd.py", line 5, in <module>
    from comfy import model_management
  File "/home/musclez/ComfyUI/comfy/model_management.py", line 143, in <module>
    total_vram = get_total_memory(get_torch_device()) / (1024 * 1024)
                                  ^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/comfy/model_management.py", line 112, in get_torch_device
    return torch.device(torch.cuda.current_device())
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 940, in current_device
    _lazy_init()
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 338, in _lazy_init
    raise DeferredCudaCallError(msg) from e
torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization with error: list index out of range

CUDA call was originally invoked at:

  File "/home/musclez/ComfyUI/main.py", line 87, in <module>
    import comfy.utils
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/musclez/ComfyUI/comfy/utils.py", line 20, in <module>
    import torch
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/__init__.py", line 1955, in <module>
    _C._initExtension(_manager_path())
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 264, in <module>
    _lazy_call(_check_capability)
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 261, in _lazy_call
    _queued_calls.append((callable, traceback.format_stack()))

This seems to be incompatible with WSL2's lack of amdsmi support, though I'm not sure; it could be something else.
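
Whatever triggered the capability check, the final `IndexError` is plain string handling: `_extract_arch_version` splits each entry of `torch.cuda.get_arch_list()` on `"_"` and takes index 1. CUDA names like `sm_86` have an underscore, but on ROCm builds the list is expected to hold gfx target names such as `gfx1100`, which do not. A minimal reproduction without PyTorch; both functions are hypothetical stand-ins, not PyTorch's own code:

```python
def parse_cuda_arch(arch_string):
    """Simplified version of the parsing shown in the traceback: 'sm_86' -> 86."""
    base = arch_string.split("_")[1]  # IndexError when the name has no "_"
    return int(base)


def parse_arch_safely(arch_string):
    """Hypothetical defensive variant: None for names without a '<prefix>_<digits>' form."""
    _prefix, sep, base = arch_string.partition("_")
    if not sep or not base.isdigit():
        return None
    return int(base)


for arch in ("sm_86", "gfx1100"):
    try:
        print(arch, "->", parse_cuda_arch(arch))
    except IndexError as exc:
        print(arch, "-> IndexError:", exc)
    print(arch, "-> safe:", parse_arch_safely(arch))
```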

unclemusclez commented 2 months ago

@evshiron, are you able to identify what I should look into for this issue?


/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py:703: UserWarning: Can't initialize amdsmi - Error code: 34
  warnings.warn(f"Can't initialize amdsmi - Error code: {e.err_code}")
Traceback (most recent call last):
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 332, in _lazy_init
    queued_call()
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 205, in _check_capability
    min_arch = min(
               ^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 206, in <genexpr>
    (_extract_arch_version(arch) for arch in torch.cuda.get_arch_list()),
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/cuda/__init__.py", line 177, in _extract_arch_version
    base = arch_string.split("_")[1]
           ~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range

evshiron commented 2 months ago

@unclemusclez

> are you able to identify what I should look into for this issue?

Try uninstalling amdsmi in your venv:

pip3 uninstall amdsmi
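
Before uninstalling, it can help to confirm which of the packages involved in pytorch/pytorch#133259 the venv can actually see. A minimal standard-library sketch, assuming the import names `amdsmi` and `pynvml`; `find_gpu_query_modules` is a hypothetical helper. `importlib.util.find_spec` locates a top-level module without executing it, so a broken amdsmi install should not crash the check itself. Run it with the venv's own `python`.

```python
import importlib.util
import sys


def find_gpu_query_modules(names=("amdsmi", "pynvml")):
    """Map each top-level module name to its file location, or None if not found."""
    found = {}
    for name in names:
        spec = importlib.util.find_spec(name)  # locates without importing
        found[name] = spec.origin if spec is not None else None
    return found


print("interpreter:", sys.executable)
for name, origin in find_gpu_query_modules().items():
    print(f"{name}: {'not installed' if origin is None else origin}")
```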

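See also:

* [Having `pynvml` installed and `cuda.hip == True` will crash on use of `amdsmi` which might not exist pytorch/pytorch#133259](https://github.com/pytorch/pytorch/issues/133259)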

unclemusclez commented 2 months ago

> @unclemusclez
>
> > are you able to identify what I should look into for this issue?
>
> Try uninstalling amdsmi in your venv:
>
> pip3 uninstall amdsmi
>
> See also:
>
> * [Having `pynvml` installed and `cuda.hip == True` will crash on use of `amdsmi` which might not exist pytorch/pytorch#133259](https://github.com/pytorch/pytorch/issues/133259)

That worked, and I recompiled PyTorch with some of the settings you linked.

PyTorch compiled, which is nice, but it's not working with anything.

Traceback (most recent call last):
  File "/home/musclez/ComfyUI/.venv/bin/accelerate", line 5, in <module>
    from accelerate.commands.accelerate_cli import main
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/accelerate/commands/accelerate_cli.py", line 19, in <module>
    from accelerate.commands.estimate import estimate_command_parser
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/accelerate/commands/estimate.py", line 34, in <module>
    import timm
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/timm/__init__.py", line 2, in <module>
    from .layers import is_scriptable, is_exportable, set_scriptable, set_exportable
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/timm/layers/__init__.py", line 8, in <module>
    from .classifier import ClassifierHead, create_classifier, NormMlpClassifierHead
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/timm/layers/classifier.py", line 15, in <module>
    from .create_norm import get_norm_layer
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/timm/layers/create_norm.py", line 14, in <module>
    from torchvision.ops.misc import FrozenBatchNorm2d
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torchvision/__init__.py", line 10, in <module>
    from torchvision import _meta_registrations, datasets, io, models, ops, transforms, utils  # usort:skip
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torchvision/_meta_registrations.py", line 163, in <module>
    @torch.library.register_fake("torchvision::nms")
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/library.py", line 795, in register
    use_lib._register_fake(op_name, func, _stacklevel=stacklevel + 1)
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/library.py", line 184, in _register_fake
    handle = entry.fake_impl.register(func_to_register, source)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/_library/fake_impl.py", line 31, in register
    if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I can't compile TorchVision or TorchAudio, and, as you mentioned, xFormers isn't compiling either.
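
One common cause of torchvision failing at import inside `torch.library.register_fake` is a torchvision build that does not match the installed PyTorch. That is an assumption here, since the log above stops before the actual exception. A quick standard-library check that reads both installed versions without importing either package; the helper names are hypothetical, and the pairing rule (torch 2.x goes with torchvision 0.(x+15)) is the one the official releases from 2.0 to 2.5 follow:

```python
from importlib import metadata


def installed_version(dist_name):
    """Installed version string for a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def expected_torchvision_minor(torch_version):
    """torch 2.x pairs with torchvision 0.(x + 15), e.g. 2.4 -> 0.19, 2.5 -> 0.20."""
    # Drop local build tags such as "+rocm6.1" or "+git1a2b3c" before parsing.
    major, minor = (int(part) for part in torch_version.split("+")[0].split(".")[:2])
    if major != 2:
        raise ValueError(f"pairing rule only covers torch 2.x, got {torch_version}")
    return minor + 15


torch_v = installed_version("torch")
vision_v = installed_version("torchvision")
print("torch:", torch_v, "| torchvision:", vision_v)
if torch_v and vision_v:
    try:
        want = expected_torchvision_minor(torch_v)
        have = int(vision_v.split(".")[1])
    except ValueError as exc:
        print("could not compare:", exc)
    else:
        print("versions match" if have == want else f"mismatch: expected torchvision 0.{want}.x")
```

For a self-built PyTorch, matching version numbers is necessary but not sufficient; TorchVision still has to be compiled after that PyTorch is installed.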

evshiron commented 2 months ago

@unclemusclez

What's the exception in that log?

If compiling TorchVision or TorchAudio doesn't work, you can post their logs here. I can't give advice without more detailed context.

unclemusclez commented 2 months ago

> @unclemusclez
>
> What's the exception in that log?
>
> If compiling TorchVision or TorchAudio doesn't work, you can post their logs here. I can't give advice without more detailed context.

      [1/1] c++ -MMD -MF /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/ops/cpu/interpolate_aa_kernels.o.d -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DWITH_HIP -I/home/musclez/rocm/vision/torchvision/csrc -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/TH -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/THC -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm-6.1.3/include -I/home/musclez/ComfyUI/.venv/include -I/usr/include/python3.11 -c -c /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp -o /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/ops/cpu/interpolate_aa_kernels.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
      FAILED: /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/ops/cpu/interpolate_aa_kernels.o
      c++ -MMD -MF /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/ops/cpu/interpolate_aa_kernels.o.d -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DWITH_HIP -I/home/musclez/rocm/vision/torchvision/csrc -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/TH -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/THC -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm-6.1.3/include -I/home/musclez/ComfyUI/.venv/include -I/usr/include/python3.11 -c -c /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp -o /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/ops/cpu/interpolate_aa_kernels.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
      In file included from /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/cpu/vec/vec256/vec256.h:8,
                       from /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/cpu/vec/vec.h:6,
                       from /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/cpu/vec/functional_base.h:6,
                       from /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/cpu/vec/functional.h:3,
                       from /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/native/UpSample.h:9,
                       from /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp:4:
      /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/cpu/vec/vec_base.h:1117: warning: ignoring ‘#pragma unroll ’ [-Wunknown-pragmas]
       1117 | # pragma unroll
            |
      In file included from /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/cpu/vec/vec_base.h:1152,
                       from /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/cpu/vec/vec256/vec256.h:8,
                       from /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/cpu/vec/vec.h:6,
                       from /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/cpu/vec/functional_base.h:6,
                       from /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/cpu/vec/functional.h:3,
                       from /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/native/UpSample.h:9,
                       from /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp:4:
      /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/cpu/vec/vec_n.h:59: warning: ignoring ‘#pragma unroll ’ [-Wunknown-pragmas]
         59 | #pragma unroll
            |
      /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/cpu/vec/vec_n.h:72: warning: ignoring ‘#pragma unroll ’ [-Wunknown-pragmas]
         72 | #pragma unroll
            |
      In file included from /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/cpu/vec/vec_base.h:1153,
                       from /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/cpu/vec/vec256/vec256.h:8,
                       from /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/cpu/vec/vec.h:6,
                       from /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/cpu/vec/functional_base.h:6,
                       from /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/cpu/vec/functional.h:3,
                       from /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/native/UpSample.h:9,
                       from /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp:4:
      /home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/ATen/cpu/vec/vec_mask.h:131: warning: ignoring ‘#pragma unroll ’ [-Wunknown-pragmas]
        131 | #pragma unroll
            |
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp: In lambda function:
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp:374:24: error: ‘scalar_t’ was not declared in this scope; did you mean ‘scale_t’?
        374 |             F<index_t, scalar_t>::compute_indices_weights(
            |                        ^~~~~~~~
            |                        scale_t
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp:374:32: error: template argument 2 is invalid
        374 |             F<index_t, scalar_t>::compute_indices_weights(
            |                                ^
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp: In lambda function:
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp:403:34: error: ‘scalar_t’ was not declared in this scope; did you mean ‘scale_t’?
        403 |       ti_cpu_upsample_generic_aa<scalar_t, index_t, out_ndims>(
            |                                  ^~~~~~~~
            |                                  scale_t
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp: In lambda function:
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp:409:38: error: ‘scalar_t’ was not declared in this scope; did you mean ‘scale_t’?
        409 |           ti_cpu_upsample_generic_aa<scalar_t, index_t, out_ndims>(
            |                                      ^~~~~~~~
            |                                      scale_t
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp: In instantiation of ‘void at::native::internal_upsample::_ti_separable_upsample_generic_Nd_kernel_impl_single_dim(at::Tensor&, const at::Tensor&, int, bool, const scale_type&, bool) [with index_t = long int; int out_ndims = 2; scale_type = std::vector<std::optional<double> >; F = at::native::internal_upsample::HelperInterpLinear]’:
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp:437:11:   required from ‘void at::native::internal_upsample::ti_separable_upsample_generic_Nd_kernel_impl(at::Tensor&, const at::Tensor&, bool, const scale_type&, bool) [with index_t = long int; int out_ndims = 2; scale_type = std::vector<std::optional<double> >; F = at::native::internal_upsample::HelperInterpLinear]’
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp:459:26:   required from here
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp:368:33: error: ‘AT_DISPATCH_FLOATING_TYPES_AND’ was not declared in this scope
        368 |   AT_DISPATCH_FLOATING_TYPES_AND(
            |   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        369 |       at::ScalarType::Byte,
            |       ~~~~~~~~~~~~~~~~~~~~~
        370 |       input_scalar_type,
            |       ~~~~~~~~~~~~~~~~~~
        371 |       "compute_indices_weights_generic",
            |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        372 |       [&] {
            |       ~~~~~
        373 |         indices_weights.emplace_back(
            |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        374 |             F<index_t, scalar_t>::compute_indices_weights(
            |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        375 |                 input.size(interp_dim),
            |                 ~~~~~~~~~~~~~~~~~~~~~~~
        376 |                 oshape[interp_dim],
            |                 ~~~~~~~~~~~~~~~~~~~
        377 |                 input.stride(interp_dim) * input.element_size(),
            |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        378 |                 input.dim(),
            |                 ~~~~~~~~~~~~
        379 |                 interp_dim,
            |                 ~~~~~~~~~~~
        380 |                 align_corners,
            |                 ~~~~~~~~~~~~~~
        381 |                 scales[interp_dim - 2],
            |                 ~~~~~~~~~~~~~~~~~~~~~~~
        382 |                 antialias,
            |                 ~~~~~~~~~~
        383 |                 interp_size));
            |                 ~~~~~~~~~~~~~~
        384 |       });
            |       ~~
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp:402:31: error: ‘AT_DISPATCH_FLOATING_TYPES’ was not declared in this scope
        402 |     AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "upsample_generic_Nd", [&] {
            |     ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        403 |       ti_cpu_upsample_generic_aa<scalar_t, index_t, out_ndims>(
            |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        404 |           iter, interp_size);
            |           ~~~~~~~~~~~~~~~~~~~
        405 |     });
            |     ~~
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp:407:35: error: ‘AT_DISPATCH_FLOATING_TYPES_AND’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]
        407 |     AT_DISPATCH_FLOATING_TYPES_AND(
            |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        408 |         at::ScalarType::Byte, iter.dtype(), "upsample_generic_Nd", [&] {
            |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        409 |           ti_cpu_upsample_generic_aa<scalar_t, index_t, out_ndims>(
            |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        410 |               iter, interp_size);
            |               ~~~~~~~~~~~~~~~~~~~
        411 |         });
            |         ~~
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp: In instantiation of ‘void at::native::internal_upsample::_ti_separable_upsample_generic_Nd_kernel_impl_single_dim(at::Tensor&, const at::Tensor&, int, bool, const scale_type&, bool) [with index_t = long int; int out_ndims = 2; scale_type = std::vector<std::optional<double> >; F = at::native::internal_upsample::HelperInterpCubic]’:
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp:437:11:   required from ‘void at::native::internal_upsample::ti_separable_upsample_generic_Nd_kernel_impl(at::Tensor&, const at::Tensor&, bool, const scale_type&, bool) [with index_t = long int; int out_ndims = 2; scale_type = std::vector<std::optional<double> >; F = at::native::internal_upsample::HelperInterpCubic]’
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp:474:25:   required from here
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp:368:33: error: ‘AT_DISPATCH_FLOATING_TYPES_AND’ was not declared in this scope
        368 |   AT_DISPATCH_FLOATING_TYPES_AND(
            |   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        369 |       at::ScalarType::Byte,
            |       ~~~~~~~~~~~~~~~~~~~~~
        370 |       input_scalar_type,
            |       ~~~~~~~~~~~~~~~~~~
        371 |       "compute_indices_weights_generic",
            |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        372 |       [&] {
            |       ~~~~~
        373 |         indices_weights.emplace_back(
            |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        374 |             F<index_t, scalar_t>::compute_indices_weights(
            |             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        375 |                 input.size(interp_dim),
            |                 ~~~~~~~~~~~~~~~~~~~~~~~
        376 |                 oshape[interp_dim],
            |                 ~~~~~~~~~~~~~~~~~~~
        377 |                 input.stride(interp_dim) * input.element_size(),
            |                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        378 |                 input.dim(),
            |                 ~~~~~~~~~~~~
        379 |                 interp_dim,
            |                 ~~~~~~~~~~~
        380 |                 align_corners,
            |                 ~~~~~~~~~~~~~~
        381 |                 scales[interp_dim - 2],
            |                 ~~~~~~~~~~~~~~~~~~~~~~~
        382 |                 antialias,
            |                 ~~~~~~~~~~
        383 |                 interp_size));
            |                 ~~~~~~~~~~~~~~
        384 |       });
            |       ~~
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp:402:31: error: ‘AT_DISPATCH_FLOATING_TYPES’ was not declared in this scope
        402 |     AT_DISPATCH_FLOATING_TYPES(iter.dtype(), "upsample_generic_Nd", [&] {
            |     ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        403 |       ti_cpu_upsample_generic_aa<scalar_t, index_t, out_ndims>(
            |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        404 |           iter, interp_size);
            |           ~~~~~~~~~~~~~~~~~~~
        405 |     });
            |     ~~
      /home/musclez/rocm/vision/torchvision/csrc/ops/cpu/interpolate_aa_kernels.cpp:407:35: error: ‘AT_DISPATCH_FLOATING_TYPES_AND’ was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation [-fpermissive]
        407 |     AT_DISPATCH_FLOATING_TYPES_AND(
            |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        408 |         at::ScalarType::Byte, iter.dtype(), "upsample_generic_Nd", [&] {
            |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        409 |           ti_cpu_upsample_generic_aa<scalar_t, index_t, out_ndims>(
            |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        410 |               iter, interp_size);
            |               ~~~~~~~~~~~~~~~~~~~
        411 |         });
            |         ~~
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2104, in _run_ninja_build
          subprocess.run(
        File "/usr/lib/python3.11/subprocess.py", line 569, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/home/musclez/rocm/vision/setup.py", line 466, in <module>
          setup(
        File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/setuptools/__init__.py", line 104, in setup
          return distutils.core.setup(**attrs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 184, in setup
          return run_commands(dist)
                 ^^^^^^^^^^^^^^^^^^
        File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
          dist.run_commands()
        File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/setuptools/dist.py", line 967, in run_command
          super().run_command(command)
        File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 91, in run
          _build_ext.run(self)
        File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 359, in run
          self.build_extensions()
        File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 868, in build_extensions
          build_ext.build_extensions(self)
        File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 479, in build_extensions
          self._build_extensions_serial()
        File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 505, in _build_extensions_serial
          self.build_extension(ext)
        File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/setuptools/command/build_ext.py", line 252, in build_extension
          _build_ext.build_extension(self, ext)
        File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/Cython/Distutils/build_ext.py", line 135, in build_extension
          super(build_ext, self).build_extension(ext)
        File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/setuptools/_distutils/command/build_ext.py", line 560, in build_extension
          objects = self.compiler.compile(
                    ^^^^^^^^^^^^^^^^^^^^^^
        File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 681, in unix_wrap_ninja_compile
          _write_ninja_file_and_compile_objects(
        File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 1784, in _write_ninja_file_and_compile_objects
          _run_ninja_build(
        File "/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/utils/cpp_extension.py", line 2120, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed cleaning build dir for torchvision
Failed to build torchvision
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (torchvision)

Very close with torchvision. I do not know how to fix this last error.

WITH_CUDA=1 USE_ROCM=1 PYTORCH_ROCM_ARCH=gfx1100 pip install --global-option=build_ext --global-option="-L/usr/local/cuda-12.1/include:/home/musclez/pytorch/aten/src/ATen" .

I also had to change the name of ATen/native/quantized/AffineQuantizer.h in some of the files, for example in torchvision/csrc/ops/quantized/cpu/qnms_kernel.cpp.

@evshiron I really appreciate the help; it's been so difficult to figure out how to do this.

evshiron commented 2 months ago

Steps to build PyTorch, TorchVision and TorchAudio in WSL with ROCm integration:

# in a new/clean shell session without PATH interference.

# build torch

git clone https://github.com/pytorch/pytorch
cd pytorch

git fetch origin pull/134498/head:pull/134498
git checkout pull/134498

python3 -m venv venv
source venv/bin/activate

pip3 install -r requirements.txt
pip3 install numpy==1.26.4 wheel

export PYTORCH_ROCM_ARCH=gfx1100
export AMDGPU_TARGETS=gfx1100
export HCC_AMDGPU_TARGET=gfx1100
export MAX_JOBS=8

echo 2.5.0 > version.txt
# add -DTARGET_GPUS=Navi31 option
vi cmake/External/aotriton.cmake

python3 tools/amd_build/build_amd.py
python3 setup.py bdist_wheel

# install torch
pip3 install dist/torch*.whl

cd ..

# build torchvision

git clone https://github.com/pytorch/vision
cd vision

echo 0.20.0 > version.txt

python3 setup.py bdist_wheel

# install torchvision
pip3 install dist/torchvision*.whl

cd ..

# build torchaudio

git clone https://github.com/pytorch/audio
cd audio

echo 2.5.0 > version.txt

python3 setup.py bdist_wheel

# install torchaudio
pip3 install dist/torchaudio*.whl

You can now install those .whl files into the venv of your application, and enjoy.

EDIT: this environment variable is needed to enable AOTriton for Navi31; set it before running your application:

export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
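If the application is started from a Python launcher rather than a shell, the same variable can be placed in the child process environment. This is a minimal sketch with a hypothetical helper name, `run_with_aotriton`; only the variable name comes from the comment above. Putting it in the environment of the process that will later import torch is the conservative choice, since the child sees it before any of its own code runs.

```python
import os
import subprocess
import sys
from typing import List, Mapping, Optional


def run_with_aotriton(cmd: List[str],
                      base_env: Optional[Mapping[str, str]] = None,
                      **run_kwargs) -> subprocess.CompletedProcess:
    """Run `cmd` with TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 in its environment.

    The variable is present from the child's first instruction, i.e. before
    the application has any chance to import torch.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL"] = "1"
    return subprocess.run(cmd, env=env, check=True, **run_kwargs)


if __name__ == "__main__":
    # Probe: a child Python process prints the variable it received.
    # For a real app, pass its command instead, e.g. [sys.executable, "main.py"].
    probe = [sys.executable, "-c",
             "import os; print(os.environ['TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL'])"]
    result = run_with_aotriton(probe, capture_output=True, text=True)
    print(result.stdout.strip())
```

Extra keyword arguments go straight to `subprocess.run`, so output is streamed normally unless you ask for `capture_output=True` as the probe does.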
unclemusclez commented 2 months ago
python3 setup.py bdist_wheel

@evshiron This was the original issue I had when trying to compile vision. The previous error I showed you was from the ROCm repo of vision. torchaudio did work.

Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] c++ -MMD -MF /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/io/image/hip/encode_jpegs_hip.o.d -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DWITH_HIP -DPNG_FOUND=1 -DJPEG_FOUND=1 -DNVJPEG_FOUND=1 -I/usr/include/libpng16 -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/TH -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/THC -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm-6.1.3/include -I/home/musclez/ComfyUI/.venv/include -I/usr/include/python3.11 -c -c /home/musclez/rocm/vision/torchvision/csrc/io/image/hip/encode_jpegs_hip.cpp -o /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/io/image/hip/encode_jpegs_hip.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -g0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=image -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
FAILED: /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/io/image/hip/encode_jpegs_hip.o
c++ -MMD -MF /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/io/image/hip/encode_jpegs_hip.o.d -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DWITH_HIP -DPNG_FOUND=1 -DJPEG_FOUND=1 -DNVJPEG_FOUND=1 -I/usr/include/libpng16 -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/TH -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/THC -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm-6.1.3/include -I/home/musclez/ComfyUI/.venv/include -I/usr/include/python3.11 -c -c /home/musclez/rocm/vision/torchvision/csrc/io/image/hip/encode_jpegs_hip.cpp -o /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/io/image/hip/encode_jpegs_hip.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -g0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=image -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
In file included from /home/musclez/rocm/vision/torchvision/csrc/io/image/hip/encode_jpegs_hip.cpp:2:
/home/musclez/rocm/vision/torchvision/csrc/io/image/hip/../hip/encode_jpegs_hip.h:9:10: fatal error: nvjpeg.h: No such file or directory
    9 | #include <nvjpeg.h>
      |          ^~~~~~~~~~
compilation terminated.
[2/3] c++ -MMD -MF /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/io/image/hip/decode_jpegs_hip.o.d -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DWITH_HIP -DPNG_FOUND=1 -DJPEG_FOUND=1 -DNVJPEG_FOUND=1 -I/usr/include/libpng16 -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/TH -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/THC -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm-6.1.3/include -I/home/musclez/ComfyUI/.venv/include -I/usr/include/python3.11 -c -c /home/musclez/rocm/vision/torchvision/csrc/io/image/hip/decode_jpegs_hip.cpp -o /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/io/image/hip/decode_jpegs_hip.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -g0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=image -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
FAILED: /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/io/image/hip/decode_jpegs_hip.o
c++ -MMD -MF /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/io/image/hip/decode_jpegs_hip.o.d -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DWITH_HIP -DPNG_FOUND=1 -DJPEG_FOUND=1 -DNVJPEG_FOUND=1 -I/usr/include/libpng16 -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/TH -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/THC -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm-6.1.3/include -I/home/musclez/ComfyUI/.venv/include -I/usr/include/python3.11 -c -c /home/musclez/rocm/vision/torchvision/csrc/io/image/hip/decode_jpegs_hip.cpp -o /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/io/image/hip/decode_jpegs_hip.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -g0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=image -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
In file included from /home/musclez/rocm/vision/torchvision/csrc/io/image/hip/decode_jpegs_hip.cpp:2:
/home/musclez/rocm/vision/torchvision/csrc/io/image/hip/../hip/decode_jpegs_hip.h:9:10: fatal error: nvjpeg.h: No such file or directory
    9 | #include <nvjpeg.h>
      |          ^~~~~~~~~~
compilation terminated.
[3/3] c++ -MMD -MF /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/io/image/image_hip.o.d -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DWITH_HIP -DPNG_FOUND=1 -DJPEG_FOUND=1 -DNVJPEG_FOUND=1 -I/usr/include/libpng16 -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/TH -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/THC -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm-6.1.3/include -I/home/musclez/ComfyUI/.venv/include -I/usr/include/python3.11 -c -c /home/musclez/rocm/vision/torchvision/csrc/io/image/image_hip.cpp -o /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/io/image/image_hip.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -g0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=image -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
FAILED: /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/io/image/image_hip.o
c++ -MMD -MF /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/io/image/image_hip.o.d -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -DWITH_HIP -DPNG_FOUND=1 -DJPEG_FOUND=1 -DNVJPEG_FOUND=1 -I/usr/include/libpng16 -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/TH -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/THC -I/home/musclez/ComfyUI/.venv/lib/python3.11/site-packages/torch/include/THH -I/opt/rocm-6.1.3/include -I/home/musclez/ComfyUI/.venv/include -I/usr/include/python3.11 -c -c /home/musclez/rocm/vision/torchvision/csrc/io/image/image_hip.cpp -o /home/musclez/rocm/vision/build/temp.linux-x86_64-cpython-311/torchvision/csrc/io/image/image_hip.o -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -fPIC -D__HIP_PLATFORM_AMD__=1 -DUSE_ROCM=1 -DHIPBLAS_V2 -g0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=image -D_GLIBCXX_USE_CXX11_ABI=1 -std=c++17
In file included from /home/musclez/rocm/vision/torchvision/csrc/io/image/hip/encode_decode_jpegs_hip.h:6,
                 from /home/musclez/rocm/vision/torchvision/csrc/io/image/image_hip.h:14,
                 from /home/musclez/rocm/vision/torchvision/csrc/io/image/image_hip.cpp:2:
/home/musclez/rocm/vision/torchvision/csrc/io/image/hip/../hip/decode_jpegs_hip.h:9:10: fatal error: nvjpeg.h: No such file or directory
    9 | #include <nvjpeg.h>
      |          ^~~~~~~~~~
compilation terminated.
evshiron commented 2 months ago

nvJPEG should not be used unless you have the CUDA toolkit installed. To work around it, you can unset CUDA_HOME or export TORCHVISION_USE_NVJPEG=0.

https://github.com/ROCm/vision is really old and you should not use it.
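The nvJPEG workaround can also be expressed as a build environment. This is a sketch with a hypothetical helper, `torchvision_rocm_build_env`, using gfx1100 as the default arch because that is the target used throughout this thread. It sets TORCHVISION_USE_NVJPEG=0 (the variant confirmed to work below) and can optionally drop CUDA_HOME, the alternative workaround mentioned above.

```python
import os
from typing import Dict, Mapping, Optional


def torchvision_rocm_build_env(base_env: Optional[Mapping[str, str]] = None,
                               arch: str = "gfx1100",
                               unset_cuda_home: bool = False) -> Dict[str, str]:
    """Copy of the environment for building torchvision against a ROCm PyTorch
    without the CUDA-only nvJPEG code paths."""
    env = dict(os.environ if base_env is None else base_env)
    env["TORCHVISION_USE_NVJPEG"] = "0"        # turn off the nvJPEG-based JPEG code
    if unset_cuda_home:
        env.pop("CUDA_HOME", None)             # the alternative workaround
    env.setdefault("PYTORCH_ROCM_ARCH", arch)  # keep an arch list you already exported
    return env


if __name__ == "__main__":
    # Inside a pytorch/vision checkout, the build could then be run as:
    #   import subprocess, sys
    #   subprocess.run([sys.executable, "setup.py", "bdist_wheel"],
    #                  env=torchvision_rocm_build_env(), check=True)
    env = torchvision_rocm_build_env()
    for key in ("TORCHVISION_USE_NVJPEG", "PYTORCH_ROCM_ARCH", "CUDA_HOME"):
        print(key, "=", env.get(key))
```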

unclemusclez commented 2 months ago
TORCHVISION_USE_NVJPEG=0 HCC_AMDGPU_TARGET=gfx1100 WITH_CUDA=1 USE_ROCM=1 PYTORCH_ROCM_ARCH=gfx1100  python setup.py bdist_wheel

It worked! You are a god.

Now I just have to try xFormers.

Testing the env:

Import times for custom nodes:
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/websocket_image_save.py
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Image-Selector
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-SD3-Powerlab
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Thumbnails
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_Noise
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/SD3-Scaling
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui-ollama
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-HQ-Image-Save
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-SAI_API
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_3dPoseEditor
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/sd-dynamic-thresholding
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-AutoTrimBG
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-SD3LatentSelectRes
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui-selector
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/stability-ComfyUI-nodes
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_Cutoff
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_ADV_CLIP_emb
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-SD3-nodes
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui-ollama-prompt-encode
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_JPS-Nodes
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_IPAdapter_plus
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Flowty-TripoSR
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Video-Matting
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-layerdiffuse
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/RES4LYF
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_essentials
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Custom-Scripts
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui-browser
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/steerable-motion
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-DynamiCrafterWrapper
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui-portrait-master
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-LuminaWrapper
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Flowty-CRM
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-KJNodes
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-GGUF
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui_bmad_nodes
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui-sound-lab
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-segment-anything-2
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui-dream-video-batches
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-TiledDiffusion
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-AnimateAnyone-Evolved
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-CCSR
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-0246
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/comfy-image-saver
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/rgthree-comfy
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Keyframed
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-SUPIR
   0.0 seconds (IMPORT FAILED): /home/musclez/ComfyUI/custom_nodes/comfyui-reactor-node
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_Comfyroll_CustomNodes
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-PhotoMaker-ZHO
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Inspire-Pack
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/facerestore_cf
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Impact-Pack
   0.1 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-AudioReactor
   0.1 seconds: /home/musclez/ComfyUI/custom_nodes/SeargeSDXL
   0.1 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-VideoHelperSuite
   0.2 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-PixelArt-Detector
   0.2 seconds: /home/musclez/ComfyUI/custom_nodes/StableZero123-comfyui
   0.2 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Manager
   0.2 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_StableAudio_Open
   0.2 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_InstantID
   0.6 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_Jags_VectorMagic
   0.7 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-TTools
   0.7 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-DeepFuze
   0.7 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_Primere_Nodes
   0.8 seconds: /home/musclez/ComfyUI/custom_nodes/anynode
   1.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-APISR-KJ
   1.1 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUi-Ollama-YN
   1.3 seconds: /home/musclez/ComfyUI/custom_nodes/was-node-suite-comfyui
   1.6 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui-mixlab-nodes
   2.9 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-AudioReactive
   3.2 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-StableAudioSampler
   3.7 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_OpenVoice
   4.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_Jags_Audiotools
   7.3 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-MotionDiff
Searge-SDXL v4.3.1 in /home/musclez/ComfyUI/custom_nodes/SeargeSDXL
[ReActor] - STATUS - Running v0.5.1-a6 in ComfyUI
Traceback (most recent call last):
  File "/home/musclez/ComfyUI/nodes.py", line 1993, in load_custom_node
    module_spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/musclez/ComfyUI/custom_nodes/comfyui-reactor-node/__init__.py", line 23, in <module>
    from .nodes import NODE_CLASS_MAPPINGS, NODE_DISPLAY_NAME_MAPPINGS
  File "/home/musclez/ComfyUI/custom_nodes/comfyui-reactor-node/nodes.py", line 27, in <module>
    from scripts.reactor_faceswap import (
  File "/home/musclez/ComfyUI/custom_nodes/comfyui-reactor-node/scripts/reactor_faceswap.py", line 14, in <module>
    from scripts.reactor_swapper import (
  File "/home/musclez/ComfyUI/custom_nodes/comfyui-reactor-node/scripts/reactor_swapper.py", line 25, in <module>
    from scripts.r_faceboost import swapper, restorer
  File "/home/musclez/ComfyUI/custom_nodes/comfyui-reactor-node/scripts/r_faceboost/restorer.py", line 17, in <module>
    from r_basicsr.utils.registry import ARCH_REGISTRY
  File "/home/musclez/ComfyUI/custom_nodes/comfyui-reactor-node/r_basicsr/__init__.py", line 3, in <module>
    from .archs import *
  File "/home/musclez/ComfyUI/custom_nodes/comfyui-reactor-node/r_basicsr/archs/__init__.py", line 5, in <module>
    from r_basicsr.utils import get_root_logger, scandir
  File "/home/musclez/ComfyUI/custom_nodes/comfyui-reactor-node/r_basicsr/utils/__init__.py", line 2, in <module>
    from .diffjpeg import DiffJPEG
  File "/home/musclez/ComfyUI/custom_nodes/comfyui-reactor-node/r_basicsr/utils/diffjpeg.py", line 19, in <module>
    y_table = nn.Parameter(torch.from_numpy(y_table))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: PyTorch was compiled without NumPy support

I guess I need to rebuild? I'm also unsure about flash-attn: do I reinstall that?

evshiron commented 2 months ago

xFormers will not work on Navi3x.

Only numpy<2 can be used in WSL with ROCm integration. You can try pip3 install 'numpy<2' or pip3 install numpy==1.26.4 and see if it's fixed.

If not, you have to install numpy before rebuilding PyTorch, which is described in the steps above.
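A quick pre-build check can confirm this in the venv you will build PyTorch in. This is a sketch with hypothetical helper names that uses only `importlib.metadata`; the `< 2` bound comes from the comment above. It matters because PyTorch's build enables NumPy support only if NumPy is available when the build is configured, which is consistent with the "PyTorch was compiled without NumPy support" error above.

```python
from importlib import metadata
from typing import Optional


def numpy_major(version: str) -> int:
    """Major version from a string such as "1.26.4" or "2.1.0rc1"."""
    return int(version.split(".", 1)[0])


def numpy_is_compatible(version: Optional[str]) -> bool:
    """True if NumPy is installed and older than 2.0."""
    return version is not None and numpy_major(version) < 2


def installed_numpy_version() -> Optional[str]:
    """Installed NumPy version in the current environment, or None."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_numpy_version()
    if numpy_is_compatible(found):
        print(f"numpy {found} is installed; OK to build PyTorch")
    else:
        print(f"numpy {found!r} found; run pip3 install 'numpy<2' before building PyTorch")
```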

There are various Flash Attention implementations in the ROCm ecosystem, but few of them have superior performance.

This branch has a battle-tested CK-based implementation that works with Navi3x, but only the forward pass is implemented, which means you can't use it for training:

Here is how to use it for better performance:

All other Triton-based implementations (including AOTriton) aren't going to perform better than the CK one for Navi3x, but some of them implement the backward pass. They do save a lot of VRAM, but may not even perform better than the Math implementation (the fallback one in PyTorch) in some cases.
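To connect this back to the original question: scaled_dot_product_attention is PyTorch's attention API, and Flash Attention, the memory-efficient kernels and the Math fallback are backends that compute the same result, softmax(Q K^T / sqrt(d)) V. The toy pure-Python sketch below contrasts a Math-style computation with the online-softmax tiling that FlashAttention-style kernels use. It illustrates the technique only; it is not the CK, Triton or AOTriton code, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def _dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def math_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d)) V, computing every score of a query row up front.

    A batched version stacks those rows into the full L x L score matrix,
    which is what drives memory use for long sequences.
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * _dot(qi, kj) for kj in k]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total
                    for c in range(len(v[0]))])
    return out


def online_softmax_attention(q: Matrix, k: Matrix, v: Matrix, block: int = 2) -> Matrix:
    """Same result, but keys/values are visited in blocks while keeping a
    running max (m), running normalizer (l) and a rescaled accumulator, so no
    full score row is ever stored."""
    scale = 1.0 / math.sqrt(len(q[0]))
    dv = len(v[0])
    out = []
    for qi in q:
        m = -math.inf
        l = 0.0
        acc = [0.0] * dv
        for start in range(0, len(k), block):
            scores = [scale * _dot(qi, kj) for kj in k[start:start + block]]
            m_new = max(m, max(scores))
            correction = math.exp(m - m_new)  # 0.0 on the first block (m = -inf)
            l *= correction
            acc = [a * correction for a in acc]
            for s, vj in zip(scores, v[start:start + block]):
                p = math.exp(s - m_new)
                l += p
                for c in range(dv):
                    acc[c] += p * vj[c]
            m = m_new
        out.append([a / l for a in acc])
    return out


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.5, -1.0]]
    k = [[0.3, 1.0], [-0.5, 0.2], [1.0, 1.0]]
    v = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
    a = math_attention(q, k, v)
    b = online_softmax_attention(q, k, v, block=2)
    print("max abs difference:", max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)))
```

Both functions return the same values up to floating-point rounding, which is why Flash Attention counts as "exact" attention: the saving comes from never holding all scores at once, not from approximating them.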

There is even a rocWMMA-based implementation by an unofficial developer. I haven't tried it, but if you are interested, follow the thread:

unclemusclez commented 2 months ago

Just to be clear, I already had every one of the packages we have discussed installed, and I have posted multiple issues about most of them on GitHub.

I don't know how numpy got uninstalled, but once I reinstalled it and recompiled (which was a treat), I'm back to my original issue:

 _extract_arch_version
    base = arch_string.split("_")[1]
           ~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
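The failing line splits the arch string on "_" and takes the second field. That works for CUDA-style names such as sm_86 but raises IndexError for a ROCm name such as gfx1100, which has no underscore, so one plausible reading is that a CUDA-style check ran against a ROCm build's arch list (later in this thread the cause turns out to be building in the wrong env). A sketch with hypothetical function names that re-creates only the line shown in the traceback:

```python
from typing import Optional


def extract_arch_version_unguarded(arch_string: str) -> int:
    """Hypothetical re-creation of only the failing line from the traceback."""
    base = arch_string.split("_")[1]  # IndexError when there is no "_"
    return int(base)


def extract_arch_version_guarded(arch_string: str) -> Optional[int]:
    """Numeric part of a CUDA-style name like "sm_86", else None."""
    parts = arch_string.split("_")
    if len(parts) < 2 or not parts[1].isdigit():
        return None  # e.g. ROCm names such as "gfx1100", or suffixed ones like "sm_90a"
    return int(parts[1])


if __name__ == "__main__":
    for arch in ("sm_86", "gfx1100"):
        try:
            print(arch, "->", extract_arch_version_unguarded(arch))
        except IndexError as exc:
            print(arch, "-> IndexError:", exc)
        print(arch, "-> guarded:", extract_arch_version_guarded(arch))
```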

I'll have to look into this more and I appreciate you taking your time.

> All other Triton-based implementations (including AOTriton) aren't going to perform better than the CK one for Navi3x, but some of them implement the backward pass. They do save a lot of VRAM, but may not even perform better than the Math implementation (the fallback one in PyTorch) in some cases.

Anything to save RAM would be great. I get OOMs very often, and I don't even max out the CPU. Something isn't right with the training, but now that you mention it, I was probably nerfing myself with flash attention.
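Why the Math fallback runs out of memory can be estimated with simple arithmetic: a non-fused attention materializes a score matrix of batch × heads × L × L elements, while FlashAttention-style kernels work through it in tiles and never store it whole. The sketch below only does that arithmetic, and the shapes in its usage are assumptions, not facts from this thread: CogVideoX-2B-like settings of 30 heads, 2 × 2 spatial patches, 226 text tokens, fp16 (2 bytes), and a batch of 2 from classifier-free guidance. Applied to the latent shape [1, 13, 16, 60, 90] from the original report, that gives 13 × 30 × 45 + 226 = 17,776 tokens.

```python
GIB = 1024 ** 3


def tokens_from_latent(frames: int, height: int, width: int,
                       patch: int = 2, text_tokens: int = 226) -> int:
    """Sequence length for a video transformer that splits each latent frame
    into patch x patch tiles and appends text tokens (assumed layout)."""
    return frames * (height // patch) * (width // patch) + text_tokens


def attention_scores_bytes(batch: int, heads: int, seq_len: int,
                           bytes_per_elem: int = 2) -> int:
    """Bytes for the full (seq_len x seq_len) score matrix of every head,
    as a non-fused (Math-style) attention would materialize it."""
    return batch * heads * seq_len * seq_len * bytes_per_elem


if __name__ == "__main__":
    # Assumed CogVideoX-2B-like settings; replace with your model's values.
    seq_len = tokens_from_latent(frames=13, height=60, width=90)
    size = attention_scores_bytes(batch=2, heads=30, seq_len=seq_len, bytes_per_elem=2)
    print(f"L = {seq_len}, scores = {size / GIB:.2f} GiB")
```

Under those assumptions the score matrix alone comes to about 35.31 GiB, which is consistent with the size of the failed allocation in the original OOM message. If the assumed shapes do not match your model, plug in your own values; the quadratic dependence on L is the part that carries over.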

I probably need to delete my venv and start fresh. I really wish they just gave us .whl files. I waste so much time trying to get this card to work, and it's still not really capable.

unclemusclez commented 2 months ago

Looks good now. I was in the wrong env when I tried to build it...

Import times for custom nodes:
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/websocket_image_save.py
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Image-Selector
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-SD3-Powerlab
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-SD3LatentSelectRes
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui-ollama
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui-selector
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-HQ-Image-Save
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_3dPoseEditor
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/SD3-Scaling
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/sd-dynamic-thresholding
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_Noise
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Thumbnails
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_Cutoff
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_ADV_CLIP_emb
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-AutoTrimBG
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/stability-ComfyUI-nodes
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-SAI_API
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-SD3-nodes
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_JPS-Nodes
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui-ollama-prompt-encode
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_IPAdapter_plus
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-CogVideoXWrapper
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Flowty-TripoSR
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-layerdiffuse
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Custom-Scripts
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Video-Matting
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-LuminaWrapper
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_UltimateSDUpscale
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/steerable-motion
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_essentials
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui-browser
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/RES4LYF
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-DynamiCrafterWrapper
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-GGUF
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui-portrait-master
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Flowty-CRM
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui-sound-lab
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui-dream-video-batches
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-KJNodes
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui_bmad_nodes
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-segment-anything-2
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/comfy-image-saver
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-0246
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-AnimateAnyone-Evolved
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-CCSR
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/rgthree-comfy
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-TiledDiffusion
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Keyframed
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-SUPIR
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_Comfyroll_CustomNodes
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-PhotoMaker-ZHO
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Inspire-Pack
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/facerestore_cf
   0.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-AudioReactor
   0.1 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Impact-Pack
   0.1 seconds: /home/musclez/ComfyUI/custom_nodes/SeargeSDXL
   0.1 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui-reactor-node
   0.1 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-VideoHelperSuite
   0.2 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-Manager
   0.2 seconds: /home/musclez/ComfyUI/custom_nodes/StableZero123-comfyui
   0.2 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-PixelArt-Detector
   0.2 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_Jags_Audiotools
   0.3 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_InstantID
   0.3 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_StableAudio_Open
   0.7 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_Jags_VectorMagic
   0.8 seconds: /home/musclez/ComfyUI/custom_nodes/anynode
   0.8 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-DeepFuze
   1.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_Primere_Nodes
   1.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-TTools
   1.4 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUi-Ollama-YN
   1.7 seconds: /home/musclez/ComfyUI/custom_nodes/was-node-suite-comfyui
   2.1 seconds: /home/musclez/ComfyUI/custom_nodes/comfyui-mixlab-nodes
   2.5 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-APISR-KJ
   3.4 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-AudioReactive
   3.4 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-StableAudioSampler
   4.6 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI_OpenVoice
   7.0 seconds: /home/musclez/ComfyUI/custom_nodes/ComfyUI-MotionDiff
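Since the failed build turned out to be a wrong-environment problem, a quick standard-library check of which interpreter is active and where packages resolve can save time before building. This is a minimal sketch; the function name and the default package list (`torch`, `flash_attn`) are assumptions. Note that `sys.prefix != sys.base_prefix` only detects venv/virtualenv environments, not conda ones.

```python
import importlib.util
import sys


def describe_env(packages=("torch", "flash_attn")):
    """Report the running interpreter, whether it is a venv,
    and where each named package would be imported from (None if missing)."""
    in_venv = sys.prefix != sys.base_prefix  # True inside venv/virtualenv only
    found = {}
    for name in packages:
        spec = importlib.util.find_spec(name)  # does not import the package
        found[name] = spec.origin if spec is not None else None
    return {
        "python": sys.executable,
        "prefix": sys.prefix,
        "in_venv": in_venv,
        "packages": found,
    }


if __name__ == "__main__":
    for key, value in describe_env().items():
        print(f"{key}: {value}")
```

Running this with the same `python` that will run the build shows immediately whether it points at the intended venv.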

unclemusclez commented 2 months ago

Do you know if it is possible to upgrade to ROCm 6.2 on WSL2? We need cooperative groups (https://rocm.docs.amd.com/projects/HIP/en/latest/how-to/cooperative_groups.html) for https://github.com/graphdeco-inria/diff-gaussian-rasterization.

evshiron commented 2 months ago

@unclemusclez

do you know if it is possible to upgrade to 6.2 ROCm on WSL2?

I guess it's not going to work.