Closed ConfusedMerlin closed 1 year ago
Thank you for submitting the issue.
Based on the logs from `dmesg` (well, syslog), it seems that the AMDGPU driver entered an error loop. Perhaps `sudo rocm-smi --gpureset -d 0` can recover from it, but usually I just reboot.
I am using Kubuntu 22.04, with a lock screen timeout set to 30 minutes, but I haven't encountered this issue. I'm not sure what the cause is. It could possibly be a driver problem.
All I can do is turn it off, hard, because no input works any more. I think I may be able to reproduce it by force by simply hitting Win+L to lock the screen, leaving it for a few minutes, then unlocking it. But I must admit that I am not keen on freezing my system on purpose.
You should be able to SSH into it while the GPU freezes.
In the early days my RX 7900 XTX froze a lot too, and I would connect to it from another device. But it's quite stable now.
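A rough sketch of what that could look like from a second machine, assuming SSH is already enabled on the affected box and the user is allowed to read the kernel journal. The helper name `remote_log_cmd` and the address `user@sd-box` are made up for illustration.

```shell
# Build the command to run on the frozen machine; it prints the last N
# kernel messages of the current boot (N defaults to 200).
remote_log_cmd() {
    printf 'journalctl -k -b --no-pager | tail -n %s' "${1:-200}"
}

# From a second device (user@sd-box is a hypothetical address):
#   ssh user@sd-box "$(remote_log_cmd 300)" > freeze-kernel.log
```

Saving the output on the second device means the log survives the hard reset that usually follows.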
I would need another device here to do so, which isn't the case currently.
Speaking of freezes, it just did it again, about one minute after I sent that last comment. No locking, absolutely no error log anywhere, just a frozen screen and nothing to do but hard reset again. There is always something, somehow...
EDIT: tried gpureset, just to see what to expect... the screen went black, then came back on, but the X session didn't want to start any more. sigh
If there is an error with the AMDGPU driver, relevant information should be included in `dmesg`.
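To narrow the kernel log down to GPU-related lines, a small filter like the one below may help. It is only a sketch: the function name `filter_gpu_lines` is made up, and the pattern simply keeps lines mentioning `amdgpu` or the `[drm` prefix, which is an assumption about where the interesting messages show up.

```shell
# Read log lines on stdin and keep only those that mention the amdgpu
# driver or carry the kernel's [drm prefix (case-insensitive).
filter_gpu_lines() {
    grep -iE 'amdgpu|\[drm'
}

# Possible usage:
#   sudo dmesg | filter_gpu_lines
#   filter_gpu_lines < /var/log/syslog
```

If nothing comes out after a freeze, that would at least be consistent with the "no error log anywhere" observation above.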
In my case, the AMDGPU driver usually works fine, but it easily enters this error state when the VRAM is exhausted (such as when running two WebUIs simultaneously for generating) and it is difficult to recover without a reboot.
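To check whether VRAM is actually close to full before a freeze, the amdgpu driver exposes usage counters in sysfs. The snippet below is a minimal sketch; `vram_percent` is a made-up helper, and `card0` is an assumption that may need to be `card1` on systems with an integrated GPU.

```shell
# Print VRAM usage as a whole-number percentage.
# $1 = used bytes, $2 = total bytes
vram_percent() {
    echo $(( $1 * 100 / $2 ))
}

# Possible usage (card index is an assumption):
#   dev=/sys/class/drm/card0/device
#   vram_percent "$(cat "$dev/mem_info_vram_used")" "$(cat "$dev/mem_info_vram_total")"
```

`rocm-smi --showmeminfo vram` reports the same numbers if you prefer the ROCm tool.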
Here are some troubleshooting tips that might help you:
If it is an image generation problem, I suggest using Tiled VAE and setting the Decoder Tile Size to a smaller value (such as 64 or 96). This can save a lot of VRAM and reduce the load on the GPU.
If it freezes during idle times, I can only attribute it to a driver issue and cannot provide effective assistance.
You may try `export HSA_OVERRIDE_GFX_VERSION=11.0.2` and see if there is any difference.
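If you would rather not export the variable for the whole shell session, it can be set for a single launch only. This is a sketch; `with_gfx_override` is a made-up wrapper, and the `./webui.sh` launcher name in the usage line is an assumption about your install.

```shell
# Run a command with HSA_OVERRIDE_GFX_VERSION set only for that process.
# $1 = version string, remaining arguments = command to run
with_gfx_override() {
    gfx_version="$1"
    shift
    env HSA_OVERRIDE_GFX_VERSION="$gfx_version" "$@"
}

# Possible usage (launcher name is an assumption):
#   with_gfx_override 11.0.2 ./webui.sh
```

That makes it easy to compare runs with and without the override without touching your shell profile.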
So, I can neither reproduce it nor predict it... for now, I must close this, as I cannot provide any useful information. I mean, it actually froze once when the SD WebUI was idle in a background browser tab and I was looking through Civitai for LoRAs about Egyptian buildings. sigh
Cannot reproduce. It also does not seem to be caused by the WebUI itself; the freeze just seems more likely to occur while the WebUI is running.
Issue Description
So, discovered by accident. This is a new Ubuntu installation, and I didn't turn off the automatic "turn off screen after x min" energy option.
So it happened at least twice, in slightly different ways.
I am certain that at least one other freeze was caused by the same general chain of events, but I didn't think too much of it, as I expected some strange stuff to happen.
When that happens, the GPU fans keep running, but I cannot move the mouse or use the keyboard in any way. Well, if something does happen, I cannot see it. I can't even switch to another tty. Turning the monitor off and back on does not help either.
Version Platform Description
Python 3.10.12 on Linux (ubuntu 22.04.1, 6.2.0-26-generic, rocm 5.6, amd rx 7600)
Version: 05fc2094 Sat Aug 5 01:55:50 2023 +0800
Latest published version: 88fff06c9e5ac775c7945362a6212c36a36096f5 2023-08-13T10:58:02Z
AMD ROCm toolkit detected
Relevant log output
syslog from that time... seems more useful