Gracana opened this issue 11 months ago.
After a hot start with everything loaded, it's finishing the prompt in 80s. This is fantastic!
Thanks for documenting this. I am seeing the same GPU lockup on ROCm 5.7.1 on an RX 6900 when running the 14-frame SVD example workflow from the documentation:
https://comfyanonymous.github.io/ComfyUI_examples/video/
Are you able to run this with default settings? Do you have any kernel parameters set? Which reduced SVD settings avoided the lockup?
My dmesg:
[ 662.787190] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=45156, emitted seq=45157
[ 662.787695] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 4770 thread gnome-shel:cs0 pid 4814
[ 662.788165] amdgpu 0000:0d:00.0: amdgpu: GPU reset begin!
[ 662.788192] amdgpu: Failed to suspend process 0x800c
[ 662.803165] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 663.173290] amdgpu 0000:0d:00.0: amdgpu: MODE1 reset
kernel: 6.6.8-arch1-1
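To tell whether a given run ended in one of these resets without scrolling through the whole log, one could filter the kernel log for the "GPU reset begin" line shown above. This is a minimal sketch, assuming the message wording matches this kernel's amdgpu driver (other kernel versions may phrase it differently); the function name `count_gpu_resets` is hypothetical:

```shell
# Hypothetical helper: count amdgpu GPU resets in kernel log text read from
# stdin. The pattern matches the "GPU reset begin!" line in the dmesg excerpt
# above; other kernel versions may word the message differently.
count_gpu_resets() {
    # grep -c prints the number of matching lines. It exits non-zero when the
    # count is 0, so "|| true" keeps that case from being treated as an error.
    grep -c 'amdgpu: GPU reset begin' || true
}

# Usage (reading the kernel ring buffer usually needs root):
#   sudo dmesg | count_gpu_resets
```

Running it before and after a workflow attempt shows whether the attempt triggered a new reset.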
After reducing some settings, I can sometimes complete the 14-frame workflow. But that alone is not enough; I also need to set power_dpm_force_performance_level to high:
# Writing this sysfs file requires root; card1 may be a different cardN on other systems
echo high | sudo tee /sys/class/drm/card1/device/power_dpm_force_performance_level
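A level forced through that sysfs file stays in effect until it is changed back (it does not survive a reboot), so one might prefer to force it only while the workflow runs. This is a minimal sketch, assuming the same card1 path as above (the card index may differ on other systems) and root privileges; the names `with_perf_level` and `LEVEL_FILE` are hypothetical:

```shell
# Hypothetical helper: force a DPM performance level only for the duration of
# one command, then restore whatever level was set before.
# LEVEL_FILE defaults to the card1 path used above; the card index is an
# assumption and may differ on other systems. Writing it requires root.
LEVEL_FILE=${LEVEL_FILE:-/sys/class/drm/card1/device/power_dpm_force_performance_level}

with_perf_level() {
    level=$1
    shift
    # Remember the current level (e.g. "auto") so it can be restored.
    previous=$(cat "$LEVEL_FILE") || return 1
    echo "$level" > "$LEVEL_FILE" || return 1
    # Run the wrapped command and keep its exit status.
    "$@"
    status=$?
    echo "$previous" > "$LEVEL_FILE"
    return "$status"
}

# Usage (as root), e.g. wrapping a ComfyUI launch:
#   with_perf_level high python main.py
```

The restore step is skipped if the shell itself is interrupted; covering that case would need a `trap` on EXIT.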
Even then, interacting with the desktop while the workflow runs can trigger the GPU lockup. Or maybe it's just luck: sometimes it locks up with these exact settings anyway.
Also, I am only getting ~70s/it.
So this looks similar to what you experienced before upgrading to ROCm 5.7, except that I am already on that version.
Did you change anything else that might have affected this behaviour?
VRAM usage maxes out at ~12.5 GiB (of 15.984 GiB), and utilization stays at 100% while the workflow runs.
I don't think I did anything else to make it work. You definitely seem to have the same symptoms I had originally, but ROCm 5.7 solved it for me.
The issue only occurs when a desktop session is running. It doesn't seem to matter whether it's Wayland or X11.
With no desktop running, I was able to complete the workflow, though with undesirable performance: the full-resolution, unmodified 14-frame workflow finished in 2974.10 seconds (about 49.6 minutes), at about 140s/it. Memory usage was 10.779 / 15.984 GiB.
Setting the AMDGPU power profile to COMPUTE didn't seem to have any impact on the issue. https://wiki.archlinux.org/title/AMDGPU#Power_profiles
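The Arch wiki page linked above describes switching profiles by setting power_dpm_force_performance_level to manual and then writing a profile's index number to pp_power_profile_mode. Because the index of COMPUTE can differ between GPU generations, it may help to look it up from the table that file prints. This is a minimal sketch, assuming profile rows start with the index followed by the profile name (the exact layout varies by card) and the same card1 path as above; the function name `profile_index` is hypothetical:

```shell
# Hypothetical helper: print the index of a named power profile from the table
# that pp_power_profile_mode prints. Assumes profile rows start with the index
# followed by the name, e.g. " 5 COMPUTE :" or " 5 COMPUTE*:" ("*" marks the
# active profile); the exact layout varies between GPU generations.
profile_index() {
    name=$1
    file=${2:-/sys/class/drm/card1/device/pp_power_profile_mode}
    awk -v name="$name" '
        # Only rows whose first field is a bare number are profile rows.
        $1 ~ /^[0-9]+$/ {
            n = $2
            sub(/[*:]+$/, "", n)    # strip a trailing "*" and/or ":"
            if (n == name) { print $1; exit }
        }' "$file"
}

# Usage (as root), following the Arch wiki steps:
#   dev=/sys/class/drm/card1/device
#   echo manual > "$dev/power_dpm_force_performance_level"
#   idx=$(profile_index COMPUTE "$dev/pp_power_profile_mode")
#   echo "$idx" > "$dev/pp_power_profile_mode"
```

The index is read into a variable first so that the file is not read and written by the same pipeline.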
I'm currently setting up ROCm 6.0 to see if that helps.
It took some time to set up, as packages weren't available yet, but I was able to test this on ROCm 6.0.0.
The issue is resolved: performance improved to 7-10s/it, and I'm no longer running into the lockup.
I'm having trouble running the stable video diffusion examples on my machine.
OS: Arch Linux
CPU: AMD Ryzen 9 7950X
RAM: 64GB
GPU: AMD Radeon RX 7900 XTX
VRAM: 24GB
Software: ComfyUI 329c57199302f6b9ccfebb86c96e937c386da92f, ROCm 5.6... Wait. See follow-up at the end.
When I tried running the 14 frame example, it was very slow and my GPU eventually locked up.
dmesg shows this:
That was after I added the iommu=soft kernel parameter. Before that, I would see IO_PAGE_FAULT in the logs, among other things. I'm not sure the details are particularly interesting.
The previous GPU crash (before setting iommu=soft) started with this:
I'm also getting ~60s/it, which seems terribly slow. With regular SD 1.5 I get 10-15it/s for smaller (512x512) images, and it works great generating image after image.
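When experimenting with kernel parameters such as iommu=soft, it helps to confirm that the running kernel actually booted with them, since bootloader changes only take effect after the next boot. This is a minimal sketch that checks /proc/cmdline for an exact, space-separated parameter; the function name `has_kernel_param` is hypothetical:

```shell
# Hypothetical helper: succeed if the running kernel was booted with the given
# parameter, matched as a whole word (so "iommu" does not match "iommu=soft").
# The optional second argument points the check at another file.
has_kernel_param() {
    # Put one parameter per line, then look for an exact whole-line match.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Usage:
#   has_kernel_param iommu=soft && echo "iommu=soft is active"
```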
If I run SVD with reduced settings, I can get through the process and produce a video, but it's still very slow.
I tried all the different cross-attention methods, tried forcing fp16 and fp32, tried highvram and disable-smart-memory. Nothing changed the speed appreciably.
Any idea what might be going on here?
[Update] Ok, when I went to write down what versions of software I was running, I noticed I had ROCm 5.6. I installed 5.7, and now I get 3-4s/it in KSampler, and the whole prompt finished in 181s. I think this is solved, but I'll submit the issue anyway, if only for the record.
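Since the fix here came down to which ROCm release was installed, a quick version check could save some head-scratching. This is a minimal sketch, assuming the installed version string can be read from /opt/rocm/.info/version (an assumption; distribution packaging may place it elsewhere) and using `sort -V`, a GNU coreutils extension, for the comparison. The function name `rocm_at_least` is hypothetical:

```shell
# Hypothetical helper: succeed if the installed ROCm version is at least the
# given minimum. Assumes the version string lives in /opt/rocm/.info/version,
# which may not hold for every distribution's packaging.
rocm_at_least() {
    minimum=$1
    version_file=${2:-/opt/rocm/.info/version}
    # Keep only the leading numeric part, dropping any build suffix after it.
    installed=$(sed 's/[^0-9.].*//' "$version_file") || return 1
    [ -n "$installed" ] || return 1
    # sort -V orders version strings; if the minimum sorts first (or the two
    # are equal), the installed version is new enough.
    lowest=$(printf '%s\n%s\n' "$minimum" "$installed" | sort -V | head -n 1)
    [ "$lowest" = "$minimum" ]
}

# Usage:
#   rocm_at_least 5.7 || echo "ROCm older than 5.7 is installed"
```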