Closed: Bear-Beer closed this issue 5 months ago.
Did you try?
export PYTORCH_ROCM_ARCH="gfx1031"
export HSA_OVERRIDE_GFX_VERSION=10.3.1
export HIP_VISIBLE_DEVICES=0
export ROCM_PATH=/opt/rocm
If not, try each one separately and in combination.
Also, what CPU and motherboard are you using?
Thank you for your help @briansp2020. I am a newbie with ROCm and PyTorch; I was wondering if there are other approaches to verify that HIP is working?
The environment variables do not work for me; there is no difference in the result.
I updated the other information in the description.
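For reference, a quick HIP sanity check from Python (a minimal sketch; these are standard PyTorch calls, and the output of get_arch_list() depends on the wheel you installed):
import torch
print(torch.version.hip)           # HIP version the wheel was built against; None on non-HIP builds
print(torch.cuda.is_available())   # True if the ROCm runtime can see a GPU
print(torch.cuda.get_arch_list())  # gfx targets this build ships kernels for, e.g. ['gfx1030', ...]
If your device's target (gfx1031 here) is not in get_arch_list(), kernels fail with "invalid device function" unless HSA_OVERRIDE_GFX_VERSION maps the device to a compiled target.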
This may or may not help, but do rocminfo and rocm-smi show any information about the GPU? If you are a beginner, it's better to stick to a supported OS. According to the documentation (here), Ubuntu, RHEL & SLES are supported, so you might want to try one of them unless you have to use CentOS Stream. If you have to use CentOS Stream, you might want to ask for help on the CentOS Stream forums.
Yes, rocminfo and rocm-smi both work for me.
[root@6700xt ~]# rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD EPYC 7453 28-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7453 28-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2750
BDFID: 0
Internal Node ID: 0
Compute Unit: 56
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 263265012(0xfb11af4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 263265012(0xfb11af4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 263265012(0xfb11af4) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: AMD EPYC 7453 28-Core Processor
Uuid: CPU-XX
Marketing Name: AMD EPYC 7453 28-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 1
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2750
BDFID: 0
Internal Node ID: 1
Compute Unit: 56
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 264193024(0xfbf4400) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 264193024(0xfbf4400) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 264193024(0xfbf4400) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 3
*******
Name: gfx1031
Uuid: GPU-XX
Marketing Name: AMD Radeon RX 6700 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 2
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 3072(0xc00) KB
L3: 98304(0x18000) KB
Chip ID: 29663(0x73df)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2855
BDFID: 33536
Internal Node ID: 2
Compute Unit: 40
SIMDs per CU: 2
Shader Engines: 2
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 115
SDMA engine uCode:: 80
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 12566528(0xbfc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS:
Size: 12566528(0xbfc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1031
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
[root@6700xt ~]# rocm-smi
========================= ROCm System Management Interface =========================
=================================== Concise Info ===================================
GPU Temp (DieEdge) AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
0 36.0c 29.0W 2650Mhz 96Mhz 0% auto 186.0W 14% 99%
====================================================================================
=============================== End of ROCm SMI Log ================================
[root@6700xt ~]#
I found that the code below works in my environment:
from transformers import *
pipe = pipeline("translation", model="/home/AiModels/Helsinki-NLP/opus-mt-en-zh/")
pipe("I am Bear, Bear likes Beer")
But the same code fails when a GPU device is specified:
from transformers import *
pipe = pipeline("translation", model="/home/AiModels/Helsinki-NLP/opus-mt-en-zh/", device=0)
pipe("I am Bear, Bear likes Beer")
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/home/SourceCode/vscode/examples/01-pipeline/pipeline.ipynb Cell 4 line 3
1 from transformers import *
2 pipe = pipeline("translation", model="/home/AiModels/Helsinki-NLP/opus-mt-en-zh/", device=0)
----> 3 pipe("I am Bear, Bear likes Beer")
File /usr/local/lib/python3.9/site-packages/transformers/pipelines/text2text_generation.py:371, in TranslationPipeline.__call__(self, *args, **kwargs)
341 def __call__(self, *args, **kwargs):
342 r"""
343 Translate the text(s) given as inputs.
344
(...)
369 token ids of the translation.
370 """
--> 371 return super().__call__(*args, **kwargs)
File /usr/local/lib/python3.9/site-packages/transformers/pipelines/text2text_generation.py:167, in Text2TextGenerationPipeline.__call__(self, *args, **kwargs)
138 def __call__(self, *args, **kwargs):
139 r"""
140 Generate the output text(s) using text(s) given as inputs.
141
(...)
164 ids of the generated text.
165 """
...
RuntimeError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing HIP_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_HIP_DSA` to enable device-side assertions.
I am running into the same "HIP error: invalid device function" while trying to train a model. This ROCm test script says that everything should be working. Also environment variables are set accordingly. Any ideas?
system: ubuntu22.04
python: 3.11.5
rocm: 5.7.1
torch: 2.2.0+rocm5.7-cp11-cp11 version10.25
torchvision: version10.25
I downloaded the .whl from the PyTorch official website and created a conda environment. Because of a dependency error, I installed the torch .whl without dependencies and installed the needed dependencies separately.
torch.cuda.is_available() = True
but when I create a tensor, it crashes.
The error is the same as yours:
RuntimeError: HIP error: invalid device function. HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing HIP_LAUNCH_BLOCKING=1. Compile with TORCH_USE_HIP_DSA to enable device-side assertions.
Same issue on Arch Linux, rocm 5.7.1 and 5.7.0 (from AUR = Ubuntu package), PyTorch 2.0.1, Python 3.11.5, RX 6700S.
I can reproduce with torch.ones(2).to(torch.device(0))
or torch.ones(2).to(torch.device(1))
edit: it works with HSA_OVERRIDE_GFX_VERSION=10.3.0 python
Can you set the env var AMD_LOG_LEVEL=3 and run the test? This will trace the HIP APIs and may provide more information about what causes the failure.
Yes.
[trougnouf@l ~]$ echo "import torch; print(torch.ones(2).cuda())" > tmp/atest.py
[trougnouf@l ~]$ AMD_LOG_LEVEL=3 python tmp/atest.py &> tmp/hsa_default.log
hsa_default.log: https://gist.github.com/trougnouf/d952aacf76f1630a95c4d5427abffbc4
[trougnouf@l ~]$ AMD_LOG_LEVEL=3 HSA_OVERRIDE_GFX_VERSION=10.3.0 python tmp/atest.py &> tmp/hsa_override.log
hsa_override.log: https://gist.github.com/trougnouf/86313c04cecad89b5cdb8ee67f1a7973
It seems your devices are gfx1032 and gfx1035, but PyTorch is not compiled for them. When you use HSA_OVERRIDE_GFX_VERSION=10.3.0 you tell ROCm to report the devices as gfx1030, which is what PyTorch was compiled for.
Setting HSA_OVERRIDE_GFX_VERSION=11.0.0 fixed this issue for me when running on gfx1100. I also have integrated graphics (reported as gfx1036 in rocminfo), which might be the reason this bug triggers.
UPD: it fixed "HIP error: invalid device function" and made some of the PyTorch examples run successfully, but others now hang indefinitely until a hard reboot (killing the process leaves the GPU at 100% load).
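If the iGPU is the trigger, one thing worth trying (a sketch; it assumes the discrete gfx1100 card enumerates as device 0, which you should verify against the agent order in rocminfo) is to hide the integrated GPU from HIP:
export HIP_VISIBLE_DEVICES=0            # expose only the discrete card (assumed to be device 0)
export HSA_OVERRIDE_GFX_VERSION=11.0.0  # report gfx1100 so the prebuilt kernels match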
Did you try?
export PYTORCH_ROCM_ARCH="gfx1031"
export HSA_OVERRIDE_GFX_VERSION=10.3.1
export HIP_VISIBLE_DEVICES=0
export ROCM_PATH=/opt/rocm
If not, try each one separately and in combination.
Thanks, this worked for me. I modified the following:
export PYTORCH_ROCM_ARCH="gfx1031"
export HSA_OVERRIDE_GFX_VERSION=10.3.0
My configuration is: 6750GRE 12G
I solved this problem by setting the environment variables in the virtual env's activate file, since it always excludes the system bashrc.
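A minimal sketch of that approach, assuming a venv at ~/venv (the path is illustrative):
# append the override to the venv's activate script so it is set on every activation
echo 'export HSA_OVERRIDE_GFX_VERSION=10.3.0' >> ~/venv/bin/activate
# re-activate so the current shell picks up the variable
source ~/venv/bin/activate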
Greetings, dear ones! I've been struggling with this error for a week now. Please help. Don't scold me too much, I'm just starting to get acquainted with Linux. When launching Stable Diffusion, the error text is:
Traceback (most recent call last):
File "/home/sd/miniconda3/envs/amd/lib/python3.10/threading.py", line 973, in _bootstrap
self._bootstrap_inner()
File "/home/sd/miniconda3/envs/amd/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/home/sd/miniconda3/envs/amd/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/sd/stable-diffusion-webui/modules/initialize.py", line 147, in load_model
shared.sd_model # noqa: B018
File "/home/sd/stable-diffusion-webui/modules/shared_items.py", line 128, in sd_model
return modules.sd_models.model_data.get_sd_model()
File "/home/sd/stable-diffusion-webui/modules/sd_models.py", line 531, in get_sd_model
load_model()
File "/home/sd/stable-diffusion-webui/modules/sd_models.py", line 658, in load_model
load_model_weights(sd_model, checkpoint_info, state_dict, timer)
File "/home/sd/stable-diffusion-webui/modules/sd_models.py", line 385, in load_model_weights
model.float()
File "/home/sd/miniconda3/envs/amd/lib/python3.10/site-packages/lightning_fabric/utilities/device_dtype_mixin.py", line 88, in float
return super().float()
File "/home/sd/miniconda3/envs/amd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 992, in float
return self._apply(lambda t: t.float() if t.is_floating_point() else t)
File "/home/sd/miniconda3/envs/amd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/home/sd/miniconda3/envs/amd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
File "/home/sd/miniconda3/envs/amd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 810, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/home/sd/miniconda3/envs/amd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 833, in _apply
param_applied = fn(param)
File "/home/sd/miniconda3/envs/amd/lib/python3.10/site-packages/torch/nn/modules/module.py", line 992, in TORCH_USE_HIP_DSA
to enable device-side assertions.
#################################################
Now I'll show you what I have installed:
1. Operating System: x86_64, Ubuntu 22.04.3 LTS (jammy)
2. Kernel: Linux 6.2.0-26-generic #26~22.04.1-Ubuntu
3. "~/.bashrc" settings:
export HIP_LAUNCH_BLOCKING=1
export HIP_VISIBLE_DEVICES=0
export PYTORCH_HIP_ALLOC_CONF=garbage_collection_threshold:0.8,max_split_size_mb:512
4. Launch through the miniconda virtual environment:
HSA_OVERRIDE_GFX_VERSION=10.3.0 python3 launch.py --listen --enable-insecure-extension-access --opt-sdp-attention --no-half-vae
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
Agent 1
Name: AMD A8-5600K APU with Radeon(tm) HD Graphics
Uuid: CPU-XX
Marketing Name: AMD A8-5600K APU with Radeon(tm) HD Graphics
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 16384(0x4000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3600
BDFID: 0
Internal Node ID: 0
Compute Unit: 4
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 32819336(0x1f4c888) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 32819336(0x1f4c888) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 32819336(0x1f4c888) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
Agent 2
Name: gfx1030
Uuid: GPU-3f336d78b9266872
Marketing Name: AMD Radeon RX 6800 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 4096(0x1000) KB
L3: 131072(0x20000) KB
Chip ID: 29631(0x73bf)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2575
BDFID: 768
Internal Node ID: 1
Compute Unit: 72
SIMDs per CU: 2
Shader Engines: 4
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 116
SDMA engine uCode:: 83
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1030
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
Done
@Admyfast, I believe that the ROCm minimum system requirements include PCIe 3.0 or greater. I'm not sure if that's the cause of your problem, but that's one thing that stood out to me when looking at your logs.
You may also want to set ROCR_VISIBLE_DEVICES to hide the integrated graphics from the ROCm stack.
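For example (a sketch; it assumes the discrete card is ROCm device 0, which you can confirm with rocminfo or rocm-smi):
export ROCR_VISIBLE_DEVICES=0   # hide the integrated GPU from the ROCm runtime entirely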
I tried ROCm 5.7 in 2023.10 and gave up because I couldn't solve "HIP error: invalid device function". Recently, I installed Ubuntu 22.04.4 and ROCm 6.0.2. I can run stable diffusion webui successfully on my 6700xt. However, when I executed PyTorch code, I still hit "HIP error: invalid device function".
P.S. I have added the following in the terminal:
export PYTORCH_ROCM_ARCH="gfx1031"
export HSA_OVERRIDE_GFX_VERSION=10.3.1
export HIP_VISIBLE_DEVICES=0
export ROCM_PATH=/opt/rocm
God, who can help me!!! I hope switching to an NVIDIA graphics card won't be my final solution.
AMD hyped up the whole AI base about ROCm being a serious solution and competitor to CUDA; the reality is, it's far from the truth. I go from one issue to another; each step is a minor victory, but then you hit another blockade, which is frustrating. Truth of the matter: ROCm is not production ready.
I solved this problem by setting the environment variables in the virtual env's activate file, since it always excludes the system bashrc.
This solved the issue for me. For anyone using Pycharm, try adding the environment variables to Pycharm. Here's a link explaining how to do this: https://stackoverflow.com/questions/42708389/how-to-set-environment-variables-in-pycharm
Just put
from os import putenv
putenv("HSA_OVERRIDE_GFX_VERSION", "10.3.0")
at the top of your code.
It's dirty, but it works.
The reason might be that your interpreter runtime didn't pick up your system variables. Force-inject HSA_OVERRIDE_GFX_VERSION at runtime and it will work.
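A side note on the same trick: assigning through os.environ is equivalent here and also keeps the Python-side mapping in sync; either way, the assignment must run before torch is imported, because that import loads the HIP runtime:
import os
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"  # must be set before `import torch`
import torch
print(torch.cuda.is_available())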
I've changed my graphics card, but I haven't tried the method you proposed before; I used to add environment variables to .bashrc. Are you also using the 6700xt? If anyone succeeds, please tell me, thank you!
Yes. I bought several RX 6700 XT cards for running deep learning training, and the method works on them.
.bashrc might not be working because not all applications use bash as their shell, and some even ignore global environment variables, just like Jupyter in VS Code.
You can find the application's config to set the variable, or just put the variable in your code. Stupid, but it works.
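For the Jupyter case specifically, one place to set it per-application is the "env" field of the kernel's kernel.json (an illustrative kernelspec; the display name is an assumption):
{
  "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "display_name": "Python 3 (ROCm gfx1030 override)",
  "language": "python",
  "env": {"HSA_OVERRIDE_GFX_VERSION": "10.3.0"}
}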
God, who can help me!!! I hope switching to an NVIDIA graphics card won't be my final solution.
AMD hyped up the whole AI base about ROCm being a serious solution and competitor to CUDA; the reality is, it's far from the truth. I go from one issue to another; each step is a minor victory, but then you hit another blockade, which is frustrating. Truth of the matter: ROCm is not production ready.
I think it's less about ROCm and more about everything being made to work on CUDA, and now there are layers to make it work with ROCm, somehow.
The shim for running CUDA workloads on ROCm is available at vosen/ZLUDA.
Last time I tried, it did not work at all. IIRC, whatever I was trying to do was still checking for NVIDIA card availability and thus refusing to work. This stuff should be more vendor-agnostic and not rely on proprietary commercial solutions.
Whatever I was trying to do was still checking for NVIDIA card availability and thus refusing to work.
What's the application you tried to use with ZLUDA?
RuntimeError: HIP error: invalid device function
I had the same error when doing anything other than creating integers and sending them to the GPU on my 6700 XT.
The solution was just to restart the PC, not sure why (both drivers and ROCm were already installed, only PyTorch was new). A thing worth trying if you get the error.
I still need to define global variables:
export PYTORCH_ROCM_ARCH="gfx1031"
export HSA_OVERRIDE_GFX_VERSION=10.3.1
export HIP_VISIBLE_DEVICES=0
export ROCM_PATH=/opt/rocm
The solution was just to restart the PC, not sure why (both drivers and ROCm were already installed, only PyTorch was new). A thing worth trying if you get the error.
The problem might be because HSA_OVERRIDE_GFX_VERSION=10.3.0 wasn't defined before the HIP runtime was loaded, so the library couldn't recognize that your RX 6700 XT's ISA is the same as gfx1030.
I manually defined the variable before calling any PyTorch stuff. Shouldn't it already be defined before loading the HIP runtime? In any case, I'm adding the manually-defined global variables to my previous answer.
In fact, some interpreters might ignore the system environment variables, so you need to make sure HSA_OVERRIDE_GFX_VERSION=10.3.0 is defined. For example, while using pytorch with ipython, vscode's jupyter, or jupyter notebook:
from os import putenv
putenv("HSA_OVERRIDE_GFX_VERSION", "10.3.0") # The line must be defined before importing torch.
import torch # OK. The HIP Runtime of PyTorch can recognize your ISA.
import torch.nn as nn
Yeah, that's exactly what I did in my Jupyter Lab environment (restarting the kernel, etc.), so I guess it is still unclear why the issue solved itself by restarting the PC after installing PyTorch.
As a note, I also tried running pure Python, adding the variables via export, with the same result.
Thank you all. This is a very helpful thread. I also managed to run my script on my AMD Radeon RX 7700S on EndeavourOS (Arch) by running:
HSA_OVERRIDE_GFX_VERSION=11.0.0 python script.py
My rocminfo and script are attached. Feel free to use my script to test performance difference between CPU and GPU with torch. rocm_cuda_test.zip
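Not the attached script, but a rough sketch of the same CPU-vs-GPU comparison (a matmul timing; the size and iteration count are arbitrary choices):
import time
import torch

def bench(device, n=4096, iters=10):
    # time n x n matrix multiplications on the given device
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device != "cpu":
        torch.cuda.synchronize()  # make sure setup has finished before timing
    start = time.time()
    for _ in range(iters):
        c = a @ b
    if device != "cpu":
        torch.cuda.synchronize()  # wait for the queued kernels to complete
    return (time.time() - start) / iters

print("cpu s/iter:", bench("cpu"))
if torch.cuda.is_available():
    print("gpu s/iter:", bench("cuda"))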
@supersonictw's fix worked for me:
At last! Training with my Ryzen 5 5600X took about 1 hour. My 6700XT brought this down to just under 2 minutes. Great stuff.
@Bear-Beer Please confirm if we can close this ticket. Thanks!
@supersonictw Thank you very much for the solution! It works for my 6700xt.
P.S. I have added the following in the terminal:
export PYTORCH_ROCM_ARCH="gfx1031"
export HSA_OVERRIDE_GFX_VERSION=10.3.1
export HIP_VISIBLE_DEVICES=0
export ROCM_PATH=/opt/rocm
@plane714 @dangarciahe
Oops! I notice you're trying to define HSA_OVERRIDE_GFX_VERSION=10.3.1. But actually, the variable must be HSA_OVERRIDE_GFX_VERSION=10.3.0: the override has to name an ISA your PyTorch build actually ships kernels for, which is gfx1030.
Anyway, hoping you get well, with any computing device.
Did you try?
export PYTORCH_ROCM_ARCH="gfx1031"
export HSA_OVERRIDE_GFX_VERSION=10.3.1
export HIP_VISIBLE_DEVICES=0
export ROCM_PATH=/opt/rocm
If not, try each one separately and in combination.
This bricked my install
Please use HSA_OVERRIDE_GFX_VERSION=10.3.0 instead.
Just put
from os import putenv
putenv("HSA_OVERRIDE_GFX_VERSION", "10.3.0")
at the top of your code. It's dirty, but it works.
The reason might be that your interpreter runtime didn't pick up your system variables. Force-inject HSA_OVERRIDE_GFX_VERSION at runtime and it will work.
I succeeded on Arch Linux using:
export HSA_OVERRIDE_GFX_VERSION=10.3.0
/home/<username>/StabilityMatrix/StabilityMatrix.AppImage
Thank you
Congratulations! You're welcome 🎊
I had to use HSA_OVERRIDE_GFX_VERSION=10.3.0 on my RX 6600 XT (Arch) too, with both the pre-built 2.5.1 + ROCm 6.2 wheel from pip and 2.6.0 built from source (git main). This is despite my GPU being detected as gfx1032. And when building from source, I also had to set PYTORCH_ROCM_ARCH="gfx1030", otherwise I'd get the same error as OP.
Funnily enough, it did work for integer tensors without the above fix. When I tried anything floating-point though, it would crash with an error like here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/11942
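For reference, a minimal sketch of the environment described above for such a source build (it assumes ROCm lives under /opt/rocm; the actual build steps follow the PyTorch ROCm build instructions):
export PYTORCH_ROCM_ARCH="gfx1030"      # compile kernels for gfx1030; the gfx1032 card runs them via the override
export HSA_OVERRIDE_GFX_VERSION=10.3.0  # make the runtime report the device as gfx1030
export ROCM_PATH=/opt/rocm              # assumption: default ROCm install location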
Server: Inspur NF5280A6 + 2 x Milan 7453 + 16 DIMMs x 32GB-3200
GPU: 83:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M/6850M XT] (rev c1)
OS: CentOS Stream 9.2, kernel: 5.14.0-373.el9.x86_64
ROCm: 5.7
PyTorch installation cmd: pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.7
[root@6700xt ~]# dkms status
amdgpu/6.2.4-1652687.el9, 5.14.0-373.el9.x86_64, x86_64: installed (original_module exists)
[root@6700xt ~]#
[attachment: ipython interaction]