I have not encountered this situation before. dmesg
should say something about the weirdness.
On my end, I only tested on Ubuntu 22.04, and installing the AMDGPU driver on Ubuntu is simple:
1. Download and install the amdgpu-install package
2. Run sudo amdgpu-install --usecase=graphics,rocm
3. Add your user to the video and render groups
After restarting, information about the GPU should be visible in rocminfo. At this point, the docker run command here should be able to run the relevant applications.
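For reference, a ROCm-enabled docker run invocation usually has roughly this shape (just a sketch; the exact image and flags in the command referenced above may differ):
docker run -it --device=/dev/kfd --device=/dev/dri --group-add video --group-add render --security-opt seccomp=unconfined rocm/pytorch:latest
The important parts are passing through /dev/kfd and /dev/dri and adding the container user to the video/render groups, mirroring the host setup described above.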
The above is all from my own experience. If you are interested, you can install Ubuntu and follow the steps above to see if it works.
Afaik, the official images/wheels don't have gfx1100 support, and should fail quickly. If the GPU becomes unrecoverable, a reboot will help.
I'm using EndeavourOS (Arch); I might try Ubuntu on a USB drive if I still can't get it to work. Oh well, even Blender hangs when using GPU compute.
I just realized that I have ROCm 5.4.3 installed; 5.5 for Arch is still in development. That means I have to wait a few days until it is released. I hope it'll work then. I thought I had the latest version of the runtime installed, so that's my mistake.
Hmmm. I can find things like https://archlinux.org/packages/extra-staging/x86_64/rocm-ml-sdk/, but I don't know if it can be installed or not. Anyway, wish you good luck!
Nah, installing from a staging repo is not a good idea.
Tried with rocm-ml-sdk on Artix; still bugged.
I somehow got the full backtrace from gdb; it also shows the functions where it got stuck. Probably not much help, but it is interesting for finding the bug. I guess it is a busy loop? I might try looking into the code. It could be this one: https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/6fdf759273a098829dfd642fb730ea410f33b152/src/core/runtime/interrupt_signal.cpp#L139
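For what it's worth, a backtrace like the one below can be captured by attaching gdb to the hung Python process (<pid> is a placeholder for the actual process id):
sudo gdb -p <pid>
(gdb) bt                   # backtrace of the current thread
(gdb) thread apply all bt  # or dump all threads
(gdb) detach
(gdb) quit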
#0 0x00007f731904d5cf in rocr::core::InterruptSignal::WaitRelaxed(hsa_signal_condition_t, long, unsigned long, hsa_wait_state_t) ()
from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libhsa-runtime64.so
#1 0x00007f731904d48a in rocr::core::InterruptSignal::WaitAcquire(hsa_signal_condition_t, long, unsigned long, hsa_wait_state_t) ()
from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libhsa-runtime64.so
#2 0x00007f7319041979 in rocr::HSA::hsa_signal_wait_scacquire(hsa_signal_s, hsa_signal_condition_t, long, unsigned long, hsa_wait_state_t) ()
from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libhsa-runtime64.so
#3 0x00007f731901dc70 in rocr::AMD::BlitKernel::SubmitLinearCopyCommand(void*, void const*, unsigned long) () from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libhsa-runtime64.so
#4 0x00007f7319036525 in rocr::(anonymous namespace)::RegionMemory::Freeze() () from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libhsa-runtime64.so
#5 0x00007f731906eb44 in rocr::amd::hsa::loader::Segment::Freeze() [clone .part.29] () from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libhsa-runtime64.so
#6 0x00007f731906ebbf in rocr::amd::hsa::loader::ExecutableImpl::Freeze(char const*) () from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libhsa-runtime64.so
#7 0x00007f731906e2a8 in rocr::amd::hsa::loader::AmdHsaCodeLoader::FreezeExecutable(rocr::amd::hsa::loader::Executable*, char const*) ()
from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libhsa-runtime64.so
#8 0x00007f7319045de7 in rocr::HSA::hsa_executable_freeze(hsa_executable_s, char const*) () from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libhsa-runtime64.so
#9 0x00007f733ae07ccf in roctracer::hsa_support::(anonymous namespace)::ExecutableFreezeIntercept(hsa_executable_s, char const*) ()
from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libroctracer64.so
#10 0x00007f733ae108fc in roctracer::hsa_support::detail::hsa_executable_freeze_callback(hsa_executable_s, char const*) () from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libroctracer64.so
#11 0x00007f73654c3f9f in roc::LightningProgram::setKernels(void*, unsigned long, int, unsigned long, std::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#12 0x00007f7365481a66 in device::Program::loadLC() () from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#13 0x00007f7365481b1f in device::Program::load() () from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#14 0x00007f73654ac334 in amd::Program::load(std::vector<amd::Device*, std::allocator<amd::Device*> > const&) () from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#15 0x00007f736547effc in amd::Device::BlitProgram::create(amd::Device*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#16 0x00007f73654baff1 in roc::Device::createBlitProgram() () from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#17 0x00007f73654fe260 in roc::KernelBlitManager::createProgram(roc::Device&) () from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#18 0x00007f73654d03fd in roc::VirtualGPU::create() () from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#19 0x00007f73654b6353 in roc::Device::createVirtualDevice(amd::CommandQueue*) () from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#20 0x00007f73654a54d0 in amd::HostQueue::HostQueue(amd::Context&, amd::Device&, unsigned long, unsigned int, amd::CommandQueue::Priority, std::vector<unsigned int, std::allocator<unsigned int> > const&) ()
from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#21 0x00007f736540028e in hip::Stream::Create() () from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#22 0x00007f7365400580 in hip::Stream::asHostQueue(bool) () from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#23 0x00007f736529ae2e in hip::Device::NullStream(bool) () from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#24 0x00007f736537e9cd in hipMemcpyWithStream () from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libamdhip64.so
#25 0x00007f73672d24f8 in at::native::copy_kernel_cuda(at::TensorIterator&, bool) () from /home/mattisb/Programming/AI/deepfloyd-if/.venv/lib/python3.10/site-packages/torch/lib/libtorch_hip.so
Weird. I tried running Stable Diffusion with the latest ROCm 5.6 on Ubuntu and EndeavourOS. Now it doesn't hang anymore, but AUTOMATIC1111 errors with "A tensor with all NaNs was produced in VAE." or "A tensor with all NaNs was produced in Unet."
I tried using "--no-half" and "--precision full", but that didn't help. I debugged it a bit further with ComfyUI and found out that all the models used tend to return values like -e+38, e+38, -inf, inf, nan, and those values propagate through to the other networks. For example, CLIP might return e+38, then the ksampler likely returns inf or nan. If CLIP returns "normal" values, the ksampler returns e+38 or inf, so the VAE produces NaNs. I don't know why that happens on both Arch and Ubuntu 22.04. My GPU is connected to a PCIe 4 slot that is also set to PCIe 4 mode.
When running it on the CPU (e.g. with --cpu in ComfyUI), everything is fine.
Example:
Code: print('Values after CLIP', np.unique(cond, return_counts=True))
Output: Values after CLIP (array([nan], dtype=float32), array([59136]))
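As an aside, a quick way to spot this kind of breakage in intermediate tensors (a sketch, assuming cond is the conditioning tensor from the snippet above and is a torch tensor):
import torch
def report(name, t):
    # flag NaN/Inf values and show the largest magnitude present
    print(name, 'nan:', torch.isnan(t).any().item(), 'inf:', torch.isinf(t).any().item(), 'max abs:', t.abs().max().item())
report('cond after CLIP', cond)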
@MattisBergmann
Weird. Does torch work for simpler code?
Btw, did you try setting HSA_OVERRIDE_GFX_VERSION=11.0.0 and HIP_VISIBLE_DEVICES=0 for your Navi 31 GPU before launching WebUI? This should be set for every application to ensure best compatibility.
The number in HIP_VISIBLE_DEVICES=0 may vary. You can find the order via rocminfo; it starts from 0.
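For example, something along these lines before starting the WebUI (a sketch; webui.sh is the launch script used later in this thread):
export HSA_OVERRIDE_GFX_VERSION=11.0.0
export HIP_VISIBLE_DEVICES=0
./webui.sh --debug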
Weird. Does torch work for simpler code?
Yes, running simple operations like adding, multiplying, etc. on tensors works.
Btw, did you try setting HSA_OVERRIDE_GFX_VERSION=11.0.0 and HIP_VISIBLE_DEVICES=0 for your Navi 31 GPU before launching WebUI? This should be set for every application to ensure best compatibility.
I tried it with and without those options; nothing changed.
I'm on vacation right now, so I hope that when I'm back the drivers will be in stable and PyTorch will have a stable 5.5 build as well.
I am closing this issue. If you still have issues running Stable Diffusion, you can try:
git clone https://github.com/vladmandic/automatic
cd automatic
./webui.sh --debug
which should now provide a smooth out-of-the-box experience.
Did something change to make it work?
Arch Linux should now have modern ROCm packages. I am not sure what doesn't work.
I forgot about my own duplicate issue where I fixed it by switching to Ubuntu, lol. My bad.
Even with ROCm 5.6 it won't work; I just get RuntimeError: HIP error: the operation cannot be performed in the present state
on Arch. On Ubuntu, it fails to install ROCm; I might try to reinstall Ubuntu and try it again. It is just very annoying to work with two bootloaders :/ (I use EndeavourOS with systemd-boot and Ubuntu with GRUB). I guess the problem lies somewhere else, neither in PyTorch nor in ROCm.
@MattisBergmann
Is your current user in both video and render groups?
Can you run rm -r venv sdnext.log && ./webui.sh --debug, and then post sdnext.log here?
Is your current user in both video and render groups?
Yes, but I added that after installing ROCm.
Can you run rm -r venv sdnext.log && ./webui.sh --debug, and then post sdnext.log here?
I ran that command both with and without TORCH_COMMAND="--pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6".
With ROCm 5.4.2, it segfaults.
@MattisBergmann
You don't need to specify TORCH_COMMAND="--pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm5.6". ./webui.sh will now take care of all of it.
Could you unset that and try again?
I git pulled the latest version and it installed ROCm 5.4.2.
Could you unset that and try again?
I already did that. That is the log "sdnext_rocm542.log"
@MattisBergmann
Ouch. I am sorry. Can you post the outputs of rocm_agent_enumerator and rocminfo?
And hipconfig --version as well.
If you are using Arch Linux variants, installing rocm-ml-sdk should have you covered.
@MattisBergmann
Can you run rm -r venv sdnext.log && ./webui.sh --debug again?
I am checking the very beginning of the log, where it says "AMD ROCm toolkit detected"; there should be some info about which devices are detected and which one is used, which depends on the commands in the previous reply.
With rocm-ml-sdk it is still not working; I'm getting the same issue.
$ ./webui.sh --debug
Create and activate python venv
Launching launch.py...
02:11:57-693728 INFO Starting SD.Next
02:11:57-695474 INFO Python 3.10.12 on Linux
02:11:57-697494 INFO Version: 22dc42fd Tue Aug 15 01:07:00 2023 +0800
02:11:57-927786 INFO Latest published version: fce48be440b888ce4ceb27f4d081454d6cc8fd2b 2023-08-14T07:58:42Z
02:11:57-928447 DEBUG Setting environment tuning
02:11:57-928928 DEBUG Torch overrides: cuda=False rocm=False ipex=False diml=False
02:11:57-929432 DEBUG Torch allowed: cuda=True rocm=True ipex=True diml=True
02:11:57-929996 INFO AMD ROCm toolkit detected
02:11:57-945125 DEBUG ROCm agents detected: ['gfx1100', 'gfx1036']
02:11:57-945751 DEBUG ROCm agent used by default: idx=0 gpu=gfx1100 arch=navi3x
02:11:57-971205 DEBUG ROCm version detected: 5.6
What's your output after installing rocm-ml-sdk, and are all of those commands available?
After unsetting TORCH_COMMAND, it shows that. sdnext.log
@MattisBergmann
It doesn't seem to be unset actually, because if TORCH_COMMAND is set, "AMD ROCm toolkit detected" will not be printed in the log, according to https://github.com/vladmandic/automatic/blob/master/installer.py#L309, which is what your log indicates.
Would you mind starting a new terminal window, or double-checking whether it's really unset with the export command, and trying again?
Oh I guess the problem was that I forgot to remove the log so it contained all the previous runs. Here is the new log: sdnext.log
I'll make a fresh install of ubuntu tomorrow and I'll try to get it to run there.
@MattisBergmann
I lose.
One last check: did dmesg say anything about the failure?
No, the newest entries from dmesg are from the last boot
@MattisBergmann
I am curious, do /dev/kfd and /dev/dri exist?
If you are going to try it on a fresh Ubuntu, would you mind doing this:
# install dependencies
sudo apt update && sudo apt install -y git python3-pip python3-venv python3-dev libstdc++-12-dev
# install the amdgpu driver with rocm support
curl -O https://repo.radeon.com/amdgpu-install/5.6/ubuntu/jammy/amdgpu-install_5.6.50600-1_all.deb
sudo dpkg -i amdgpu-install_5.6.50600-1_all.deb
# opencl might cause issues later, so skip it unless you need it
sudo amdgpu-install --usecase=graphics,rocm
# grant current user the access to gpu devices
sudo usermod -aG video $USER
sudo usermod -aG render $USER
# reboot is needed to make both driver and user group take effect
sudo reboot
git clone https://github.com/vladmandic/automatic
cd automatic
./webui.sh --debug
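After the reboot, a quick sanity check could look like this (a sketch reusing commands already mentioned in this thread):
# the GPU agent (gfx1100) should show up here
rocminfo | grep -i gfx
# the current user should now be in both groups
groups | grep -E 'video|render'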
Sources:
Sorry for taking up so much of your time here.
ls -l /dev/kfd /dev/dri
crw-rw-rw- 1 root render 234, 0 14. Aug 20:00 /dev/kfd
/dev/dri:
total 0
drwxr-xr-x 2 root root 80 14. Aug 20:00 by-path
crw-rw----+ 1 root video 226, 1 14. Aug 20:00 card1
crw-rw-rw- 1 root render 226, 128 14. Aug 20:00 renderD128
Sorry for taking up so much of your time here.
No problem, I also want to get it fixed. Thanks for your help!
Even on a fresh Ubuntu install, it is still the same error. However, when installing ROCm with amdgpu-install, it showed some warnings: W: Possible missing firmware /lib/firmware/amdgpu/<file>.bin for module amdgpu
I don't have the exact warnings, but it looked similar to
https://askubuntu.com/questions/1124253/missing-firmware-for-amdgpu
I downloaded the newest firmware from
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git and reinstalled amdgpu, yet it didn't resolve all of those warnings. After rebooting, I still get RuntimeError: HIP error: the operation cannot be performed in the present state
I hope I don't have a faulty card or mainboard. What I could try is putting the GPU in the PCIe 3.0 slot, but that is some work as it doesn't fit very well with the mainboard and case.
@MattisBergmann
Those warnings should be safe to ignore, I guess. But it's weird that RuntimeError: HIP error: the operation cannot be performed in the present state still happens on Ubuntu.
What's the vendor of your RX 7900 XT?
The card is an XFX SPEEDSTER MERC 310 AMD Radeon™ RX 7900 XT (So the vendor is XFX)
@MattisBergmann
The following code snippet is from:
Would you mind trying this and seeing from which line it starts to fail?
$ export HIP_VISIBLE_DEVICES=0
$ export HSA_OVERRIDE_GFX_VERSION=11.0.0
$ source venv/bin/activate
$ python3.10
import torch
device='cuda' # None?
rnd = torch.sum(torch.randn(2, 2)).to(device)
print(rnd)
x = torch.tensor([[1.5,.0,.0,.0]]).to(device).half()
layerNorm = torch.nn.LayerNorm(4, eps=0.00001, elementwise_affine=True, dtype=torch.float16, device=device)
y = layerNorm(x)
print(y)
I made a fresh Arch Linux installation just now, by following the Arch Wiki:
pacman -Syu mesa xf86-video-amdgpu vulkan-radeon libva-mesa-driver mesa-vdpau
pacman -Syu plasma-meta konsole
pacman -Syu rocm-ml-sdk
pacman -Syu git base-devel
useradd -m -G video,render -s /bin/bash USERNAME
passwd USERNAME
Enable systemd-networkd, enable the SDDM service
git clone https://aur.archlinux.org/python310.git
cd python310 && makepkg --skippgpcheck && sudo pacman -U python310-3.10.12-1-x86_64.pkg.tar.zst
export ROCM_HOME=/opt/rocm
export PATH="$ROCM_HOME/bin:$PATH"
git clone https://github.com/are-we-gfx1100-yet/automatic
Change python3 to python3.10 in webui.sh
./webui.sh
On my end it works just fine.
I can't believe it's a hardware issue, but I don't have other ideas now.
I have found out that I can enable logging with AMD_LOG_LEVEL=<log level>.
Setting it to 1 (error) reveals the error:
:1:rocvirtual.cpp :2902: 4030434843 us: 14659: [tid:0x7f3dd48fe6c0] Pcie atomics not enabled, hostcall not supported
:1:rocvirtual.cpp :3235: 4030434846 us: 14659: [tid:0x7f3dd48fe6c0] AQL dispatch failed!
I have actually never heard of PCIe atomics before, but it seems that ROCm requires them. And I can't find much information about which mainboards and CPUs actually support them. I guess my mainboard/CPU doesn't. Mainboard: ASUS PRIME B560 PLUS; CPU: Intel i5-11400f.
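For reference, the log level can be set per run, e.g. (the script name here is just a placeholder):
# print HIP runtime errors for a single run; higher levels are more verbose
AMD_LOG_LEVEL=1 python my_script.py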
I have actually never heard of PCIe atomics before, but it seems that ROCm requires them.
This is true. Glad you found them. Your hardware doesn't look outdated.
According to this, PCIe Atomics were introduced in PCI-E 3.0.
Maybe some BIOS tweaking?
See also:
@MattisBergmann @evshiron I have encountered the same issue (though I'm not trying to run Stable Diffusion or anything in this example, just basic PyTorch). I'll share my setup:
When running with AMD_LOG_LEVEL=3 and HIP_VISIBLE_DEVICES=0, I get the following as part of the verbose stack trace/debug output:
rocvirtual.cpp :2902: 1256580953 us: 19667: [tid:0x7f1b8eb82740] Pcie atomics not enabled, hostcall not supported
:1:rocvirtual.cpp :3235: 1256580956 us: 19667: [tid:0x7f1b8eb82740] AQL dispatch failed!
:3:hip_module.cpp :663 : 1256580959 us: 19667: [tid:0x7f1b8eb82740] hipLaunchKernel: Returned hipErrorIllegalState :
:3:hip_error.cpp :27 : 1256580963 us: 19667: [tid:0x7f1b8eb82740] hipGetLastError ( )
:3:hip_error.cpp :27 : 1256580966 us: 19667: [tid:0x7f1b8eb82740] hipGetLastError ( )
:3:hip_device_runtime.cpp :561 : 1256583779 us: 19667: [tid:0x7f1b8eb82740] hipSetDevice ( 0 )
:3:hip_device_runtime.cpp :565 : 1256583783 us: 19667: [tid:0x7f1b8eb82740] hipSetDevice: Returned hipSuccess :
However, setting HIP_VISIBLE_DEVICES=1 (the first PCIe x16 card) works fine, so maybe the motherboard is only enabling atomics for one of the two PCIe x16 slots?
Seems odd. I am out of my depth here. I contacted AMD support last week but haven't heard back from them since it got escalated to a supervisor. They were helpful throughout our conversation - the impression I got was that they don't really get many questions about rocm so it was something they needed to escalate in order to get any answers on.
I'll see if I can explore my options for PCIe atomics as well, this is the first thread I've come across that mentions this since I started trying to work with these 7900 XTX cards. If I find anything, I'll follow up.
If anyone is more knowledgeable than I am on the feasibility of this kind of consumer-grade dual GPU setup, I would love to hear suggestions.
P.S. @evshiron thanks for your blog posts about Are We GFX1100 Yet? - they have been very helpful in debugging some issues with pytorch.
@codinglife9531
Thanks for reaching out! I am glad it has been helpful.
According to these links:
It seems that only the PCI-E lanes from the CPU support PCIe Atomics, which might be why your configuration is not working.
High-end consumer motherboards like the ROG STRIX X670E-E GAMING WIFI (see "Expansion Slots"; not a recommendation) allow splitting and running in x8/x8 mode when two slots are used. As both slots come from the CPU, I guess both of them support PCIe Atomics.
I am not sure if we can split from x16 to x8/x8 via an external PCI-E splitter while preserving support for PCIe Atomics.
@evshiron Thanks for your response. I'll dig into it further.
Still learning here, but in your experience/opinion, do you think that splitting into x8/x8 (edit: not via an external splitter btw) will impact something like LLM inferencing substantially? I'd assume that x16 would be preferable.
@codinglife9531
My RX 7900 XTX used to work on a B450M motherboard, which is PCI-E 3.0 x16. Now it's running on PCI-E 4.0 x16. After the upgrade, I noticed that the inference performance of GPTQ doubled, so I believe that bandwidth does have a significant impact in LLM scenarios, but I didn't observe a significant difference in Stable Diffusion scenarios.
In my country, there are online platforms that sell outdated used server motherboards and CPUs. They are much cheaper compared to brand new ones, making them perfect for tinkering. Server configurations usually support a much larger number of PCI-E slots, so perhaps you can consider exploring in that direction.
I already have it in the slot that comes from the CPU; it is in the PCIe 4.0 slot. It might have to do with the NVMe SSD, but the data sheet states that my CPU has 20 lanes available, of which 16 go to the GPU and 4 to the SSD.
@MattisBergmann
Yes. The situation you are currently facing is quite weird, but I have exhausted all my ideas now.
Some other info:
lspci -vvv info for my RX 7900 XTX:
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
EmergencyPowerReduction Form Factor Dev Specific, EmergencyPowerReductionInit-
FRS-
AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
AtomicOpsCtl: ReqEn+
Well, for me it shows AtomicOpsCtl: ReqEn-. I don't know exactly, but I would interpret it as not supported, although it is connected to an x16 slot directly from the CPU.
LnkSta: Speed 16GT/s, Width x16
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
EmergencyPowerReduction Form Factor Dev Specific, EmergencyPowerReductionInit-
FRS-
AtomicOpsCap: 32bit+ 64bit+ 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
AtomicOpsCtl: ReqEn-
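For anyone checking the same thing, the relevant lines can be filtered out of lspci like this (a sketch; 1002 is AMD's PCI vendor ID, and -vvv needs root to show the full capabilities):
sudo lspci -vvv -d 1002: | grep -iE 'VGA|Display|AtomicOps'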
@MattisBergmann
Maybe the setpci command here will help you:
But it looks magical and I don't know if rebooting will revert the change.
I first tried using Stable Diffusion with pytorch/rocm5.4.2. That didn't work (it hangs indefinitely when copying data to VRAM) since my RX 7900 XT is not officially supported by ROCm 5.4. Then I tried compiling PyTorch with ROCm 5.5 myself in a Docker container; two hours later, I got the same problem. Then I tried the prebuilt wheels from this repo (with automatic and deepfloyd) and the Docker containers (a1111 and automatic), still with no success. Even a simple script like this hangs:
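A minimal sketch of that kind of script, assuming nothing more than a plain tensor copy to the GPU (which matches where the backtrace above gets stuck):
import torch
print(torch.cuda.is_available())        # True on a working ROCm build
x = torch.randn(1024, 1024).to('cuda')  # hangs here while copying to VRAM
print(x.sum())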
Output:
dmesg doesn't show anything, and rocminfo and rocm-smi aren't helpful either. radeontop doesn't show a difference from normal usage. With htop I can see that torch maxes out a single core; it is probably stuck in an infinite loop. GDB backtrace:
Is this a common bug? I think it might actually be a broken amdgpu/ROCm installation.