[Issue]: "Attempting to use hipBLASLt on a unsupported architecture!"

nktice commented 1 month ago

Problem Description

I maintain this guide for using some AI tools on AMD cards : http://github.com/nktice/AMD-AI There was an option to stop HipBLASLt from loading : "TORCH_BLAS_PREFER_HIPBLASLT=0" That had been working ( ROCm 6.2.1 / PyTorch 2.4.x ) ... alas this appears to not be functioning with PyTorch 2.6.x.

Both Stable Diffusion and ComfyUI that were previously working are now crashing with the following error - "RuntimeError: Attempting to use hipBLASLt on a unsupported architecture!" What had worked before, is no longer functional as a work around.

This is with torch==2.6.0.dev20241015+rocm6.2 torch==2.6.0.dev20241014+rocm6.2

[ and as default install for "torch" from https://download.pytorch.org/whl/nightly/torch/ at time of writing... ] I have reverted and tried with torch==2.5.0.dev20240912 torchvision==0.20.0.dev20240912+rocm6.2 torch==2.6.0.dev20240913 torchvision==0.20.0.dev20240913+rocm6.2 torch==2.6.0.dev20241011 torchvision==0.20.0.dev20241011+rocm6.2 ... torch==2.6.0.dev20241013 torchvision==0.20.0.dev20241013+rocm6.2 And do not get this error, and things work 'normally'

( Yes, I had to go and download pytorch's files to try each of those... )

Operating System

Ubuntu 24.04.1

CPU

AMD Ryzen 9 5950x

GPU

AMD Radeon RX 7900 XTX

Other

No response

ROCm Version

ROCm 5.7.1

ROCm Component

No response

Steps to Reproduce

Here is a link to the guide that uses nightlies and/or dev versions - https://github.com/nktice/AMD-AI/blob/main/dev.md

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

ROCk module version 6.8.5 is loaded =====================
HSA System Attributes
=====================
Runtime Version: 1.14 Runtime Ext Version: 1.6 System Timestamp Freq.: 1000.000000MHz Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count) Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED DMAbuf Support: YES

==========
HSA Agents
==========

Agent 1

Name: AMD Ryzen 9 5950X 16-Core Processor Uuid: CPU-XX
Marketing Name: AMD Ryzen 9 5950X 16-Core Processor Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3400
BDFID: 0
Internal Node ID: 0
Compute Unit: 32
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Memory Properties:
Features: None Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 65747224(0x3eb3918) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED Size: 65747224(0x3eb3918) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 65747224(0x3eb3918) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:

Agent 2

Name: gfx1100
Uuid: GPU-[redacted]
Marketing Name: Radeon RX 7900 XTX
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 6144(0x1800) KB
L3: 98304(0x18000) KB
Chip ID: 29772(0x744c)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2431
BDFID: 3072
Internal Node ID: 1
Compute Unit: 96
SIMDs per CU: 2
Shader Engines: 6
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension: x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension: x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 342
SDMA engine uCode:: 21
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension: x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension: x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32

Agent 3

Name: gfx1100
Uuid: GPU-[redacted] Marketing Name: Radeon RX 7900 XTX
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 2
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 6144(0x1800) KB
L3: 98304(0x18000) KB
Chip ID: 29772(0x744c)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2431
BDFID: 3840
Internal Node ID: 2
Compute Unit: 96
SIMDs per CU: 2
Shader Engines: 6
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Memory Properties:
Features: KERNEL_DISPATCH Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension: x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension: x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 342
SDMA engine uCode:: 21
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension: x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension: x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
Done

Additional Information

Here's the full output from ComfyUI

!!! Exception during processing !!! Attempting to use hipBLASLt on a unsupported architecture!
Traceback (most recent call last):
  File "/home/n/ComfyUI/execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/home/n/ComfyUI/execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
  File "/home/n/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/home/n/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
  File "/home/n/ComfyUI/nodes.py", line 65, in encode
    output = clip.encode_from_tokens(tokens, return_pooled=True, return_dict=True)
  File "/home/n/ComfyUI/comfy/sd.py", line 125, in encode_from_tokens
    o = self.cond_stage_model.encode_token_weights(tokens)
  File "/home/n/ComfyUI/comfy/sd1_clip.py", line 589, in encode_token_weights
    out = getattr(self, self.clip).encode_token_weights(token_weight_pairs)
  File "/home/n/ComfyUI/comfy/sd1_clip.py", line 41, in encode_token_weights
    o = self.encode(to_encode)
  File "/home/n/ComfyUI/comfy/sd1_clip.py", line 229, in encode
    return self(tokens)
  File "/home/n/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/n/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/n/ComfyUI/comfy/sd1_clip.py", line 201, in forward
    outputs = self.transformer(tokens, attention_mask_model, intermediate_output=self.layer_idx, final_layer_norm_intermediate=self.layer_norm_hidden_state, dtype=torch.float32)
  File "/home/n/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/n/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/n/ComfyUI/comfy/clip_model.py", line 136, in forward
    x = self.text_model(*args, **kwargs)
  File "/home/n/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/n/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/n/ComfyUI/comfy/clip_model.py", line 112, in forward
    x, i = self.encoder(x, mask=mask, intermediate_output=intermediate_output)
  File "/home/n/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/n/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/n/ComfyUI/comfy/clip_model.py", line 69, in forward
    x = l(x, mask, optimized_attention)
  File "/home/n/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/n/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/n/ComfyUI/comfy/clip_model.py", line 50, in forward
    x += self.self_attn(self.layer_norm1(x), mask, optimized_attention)
  File "/home/n/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/n/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/n/ComfyUI/comfy/clip_model.py", line 17, in forward
    q = self.q_proj(x)
  File "/home/n/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/n/ComfyUI/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/n/ComfyUI/comfy/ops.py", line 76, in forward
    return self.forward_comfy_cast_weights(*args, **kwargs)
  File "/home/n/ComfyUI/comfy/ops.py", line 72, in forward_comfy_cast_weights
    return torch.nn.functional.linear(input, weight, bias)
RuntimeError: Attempting to use hipBLASLt on a unsupported architecture!

adamrokickirns commented 1 month ago

Can confirm, same GPU and same issue

bpjoestar commented 1 month ago

I have the same issue, only the last line numbers are different. I installed ROCm in my Ubuntu 24.04 distrobox: amdgpu-install_6.2.60202-1_all.deb Followed the ComfyUI guide with the nighly verion.

OS posix Python Version 3.12.3 (main, Sep 11 2024, 14:17:37) [GCC 13.2.0] Embedded Python false Pytorch Version 2.6.0.dev20241020+rocm6.2 Arguments main.py RAM Total 31.26 GB RAM Free 18.61 GB Devices Name cuda:0 AMD Radeon RX 6950 XT : native Type cuda VRAM Total 15.98 GB VRAM Free 14.24 GB Torch VRAM Total 1.65 GB Torch VRAM Free 124 MB


Traceback (most recent call last):
  File "/home/bp/Homebox/AI/ComfyUI/execution.py", line 323, in execute
    output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/execution.py", line 198, in get_output_data
    return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/execution.py", line 169, in _map_node_over_list
    process_inputs(input_dict, i)
  File "/home/bp/Homebox/AI/ComfyUI/execution.py", line 158, in process_inputs
    results.append(getattr(obj, func)(**inputs))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/nodes.py", line 65, in encode
    output = clip.encode_from_tokens(tokens, return_pooled=True, return_dict=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/comfy/sd.py", line 125, in encode_from_tokens
    o = self.cond_stage_model.encode_token_weights(tokens)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/comfy/sdxl_clip.py", line 60, in encode_token_weights
    g_out, g_pooled = self.clip_g.encode_token_weights(token_weight_pairs_g)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/comfy/sd1_clip.py", line 41, in encode_token_weights
    o = self.encode(to_encode)
        ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/comfy/sd1_clip.py", line 238, in encode
    return self(tokens)
           ^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/comfy/sd1_clip.py", line 210, in forward
    outputs = self.transformer(tokens, attention_mask_model, intermediate_output=self.layer_idx, final_layer_norm_intermediate=self.layer_norm_hidden_state, dtype=torch.float32)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/comfy/clip_model.py", line 136, in forward
    x = self.text_model(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/comfy/clip_model.py", line 112, in forward
    x, i = self.encoder(x, mask=mask, intermediate_output=intermediate_output)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/comfy/clip_model.py", line 69, in forward
    x = l(x, mask, optimized_attention)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/comfy/clip_model.py", line 50, in forward
    x += self.self_attn(self.layer_norm1(x), mask, optimized_attention)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/comfy/clip_model.py", line 17, in forward
    q = self.q_proj(x)
        ^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/comfy/ops.py", line 68, in forward
    return self.forward_comfy_cast_weights(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bp/Homebox/AI/ComfyUI/comfy/ops.py", line 64, in forward_comfy_cast_weights
    return torch.nn.functional.linear(input, weight, bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Attempting to use hipBLASLt on a unsupported architecture!```

ppanchad-amd commented 1 month ago

Hi @nktice. Internal ticket has been created to investigate your issue. Thanks!

tcgu-amd commented 1 month ago

Hi @adamrokickirns, @bpjoestar, I believe this is a known issue is related to https://github.com/pytorch/pytorch/issues/138067. It should be fixed in the latest PR https://github.com/pytorch/pytorch/pull/138267 which was only merged 2 days ago. What happened is that ROCm is supposed to not use hipBlASLt on unsupported hardware, but that behavior was accidentally removed in PR https://github.com/pytorch/pytorch/pull/137604. It should work with 20241021's nightly build.

By the way @nktice seems like you were able to confirm it works in the other thread, so I will be closing this issue. Please feel free to re-open if the issue persists. Thanks!

nktice commented 1 month ago

By the way @nktice seems like you were able to confirm it works in the other thread, so I will be closing this issue. Please feel free to re-open if the issue persists. Thanks!

That nightly did work... but @naromero77amd said that it wasn't right, and issue regressed. It does however look like they're aware of it, and working on it.

tcgu-amd commented 1 month ago

@nktice Thank you for the follow up! I think for the purpose of this particular issue, we can consider the issue is patched for now despite the regression. However, I will link the proper fix here to help keeping track of the progress. https://github.com/pytorch/pytorch/pull/138267

Thanks!

ROCm / hipBLASLt