clbr / radeontop

GNU General Public License v3.0

System freeze since kernel 5.2 #87

Open Frechdachs opened 4 years ago

Frechdachs commented 4 years ago

System: Manjaro Linux, Ryzen 5 3500U with integrated Vega 8

When I start radeontop without any GPU utilization, it shows wrong values (e.g. 90+% utilization) and then freezes my whole system after a few seconds if I don't press q quickly enough to close radeontop. If this happens, I can only shut my system down with a hard reset. This is an example image of what radeontop displays before it freezes my system: 2019-08-11-161022_724x364_scrot

However, my system does not seem to freeze if I actually have GPU utilization. For example, if I open mpv and let it play a video in the background, radeontop works as expected: 2019-08-11-160835_724x364_scrot

I have tested and can reproduce this on kernel 5.2.4 and 5.3-RC2. Kernel 5.1 works without problems.

BTW, those images were taken when testing the PR with clock frequency support, but I also tested git master and the latest release with the same result.

clbr commented 4 years ago

I doubt it's a bug in radeontop if you get a full system hang, I think the kernel is more likely, especially as you say an earlier version works fine. Bisecting the kernel between those versions should help pinpoint the cause.
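
For readers who have not bisected before: the idea is a binary search over an ordered series of kernel builds, where the first one is known to work and the last one is known to freeze. In practice `git bisect` on the kernel source tree does this bookkeeping for you. The sketch below only illustrates the search logic. The build labels and the `is_bad` check are hypothetical stand-ins for "boot this build, start radeontop, see whether the system freezes".

```python
from typing import Callable, Sequence


def first_bad(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first build for which is_bad() is true.

    Assumes builds[0] is known good, builds[-1] is known bad, and that
    every build after the first bad one is also bad.
    """
    lo, hi = 0, len(builds) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return builds[hi]


# Hypothetical build labels between the reported good and bad kernels.
builds = ["v5.1", "build-a", "build-b", "build-c", "build-d", "v5.2.4"]
# Hypothetical test outcome: everything from build-c onward freezes.
freezing = {"build-c", "build-d", "v5.2.4"}

print(first_bad(builds, lambda b: b in freezing))  # prints build-c
```

Each step halves the remaining range, so even several thousand commits between two kernel releases need only a dozen or so test boots.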

trek00 commented 4 years ago

Can you try to compile without amdgpu support to check if it freeze in this way too? It needs to do: make clean; make amdgpu=0 and then run radeontop as root. It could help to better diagnose the issue, that is probably a kernel bug, as clbr said.

Frechdachs commented 4 years ago

Sorry for the late reply.

Same behaviour when compiling with amdgpu=0, wrong usage and a system freeze after some time: 2019-08-26-174540_724x364_scrot

I'll try to bisect the kernel when I'm back home in two or three weeks.

ghost commented 4 years ago

Can you provide a clinfo report?

jstkdng commented 4 years ago

can confirm, my system freezes as soon as radeontop is started, then it restarts. I get this in dmesg before my pc restarts:

WARNING: CPU: 1 PID: 634 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:854 dcn10_verify_allow_pstate_change_high.cold+0xc/0x224 [amdgpu]
Modules linked in: snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device media fuse ccm nls_iso8859_1 nls_cp437 vfat fat arc4 edac_mce_amd kvm_amd ccp rng_core ath9k kvm ath9k_common ath9k_hw ath irqbypass >
CPU: 1 PID: 634 Comm: Xorg Tainted: G        W         5.2.13-1-ck-zen #1
Hardware name: Micro-Star International Co., Ltd. MS-7B84/A320M PRO-M2 (MS-7B84), BIOS 1.7N 07/01/2019
RIP: 0010:dcn10_verify_allow_pstate_change_high.cold+0xc/0x224 [amdgpu]
Code: 83 c8 ff e9 fc 7e f9 ff 48 c7 c7 a8 f0 77 c0 e8 7f a6 9f e0 0f 0b 83 c8 ff e9 e6 7e f9 ff 48 c7 c7 a8 f0 77 c0 e8 69 a6 9f e0 <0f> 0b 80 bb 93 01 00 00 00 75 05 e9 b0 a4 f9 ff 48 8b 83 80 02 00
RSP: 0018:ffffbd0702fd7770 EFLAGS: 00010246
RAX: 0000000000000024 RBX: ffffa1d20da07000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000086 RDI: 00000000ffffffff
RBP: ffffa1d20da07000 R08: 0000000000000001 R09: 0000000000000ba4
R10: 0000000000000001 R11: 0000000000000000 R12: ffffa1d180ea81b8
R13: 0000000000000001 R14: ffffa1d180ea81b8 R15: ffffa1d180707800
FS:  00007f4295a84dc0(0000) GS:ffffa1d210440000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055b7c2b1ca08 CR3: 000000038b0de000 CR4: 00000000003406e0
Call Trace:
 dcn10_pipe_control_lock.part.0+0x69/0x70 [amdgpu]
 dc_stream_set_cursor_attributes+0xef/0x170 [amdgpu]
 handle_cursor_update.isra.0+0x1f6/0x350 [amdgpu]
 amdgpu_dm_commit_cursors.isra.0+0x59/0x70 [amdgpu]
 amdgpu_dm_atomic_commit_tail+0x11dc/0x1950 [amdgpu]
 ? commit_tail+0x3c/0x70 [drm_kms_helper]
 commit_tail+0x3c/0x70 [drm_kms_helper]
 drm_atomic_helper_commit+0x10c/0x120 [drm_kms_helper]
 drm_atomic_helper_update_plane+0xec/0x110 [drm_kms_helper]
 drm_mode_cursor_universal+0x12c/0x240 [drm]
 ? _raw_spin_lock+0x13/0x30
 drm_mode_cursor_common+0xde/0x230 [drm]
 ? import_iovec+0x5d/0xc0
 ? drm_mode_cursor_ioctl+0x70/0x70 [drm]
 drm_ioctl_kernel+0xb8/0x100 [drm]
 drm_ioctl+0x221/0x3b0 [drm]
 ? drm_mode_cursor_ioctl+0x70/0x70 [drm]
 amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
 do_vfs_ioctl+0x42c/0x6b0
 ksys_ioctl+0x5e/0x90
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x5f/0x1d0
 ? prepare_exit_to_usermode+0x85/0xb0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x7f4296eba21b
Code: 0f 1e fa 48 8b 05 75 8c 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 45 8c 0c 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe73146bd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffe73146c20 RCX: 00007f4296eba21b
RDX: 00007ffe73146c20 RSI: 00000000c02464bb RDI: 000000000000000a
RBP: 00000000c02464bb R08: 0000000000000001 R09: 0000000000003fff
R10: 000000000000007f R11: 0000000000000246 R12: 000000000000000b
R13: 000000000000000a R14: 000000000000000c R15: 000055d3978d9bc0
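
When attaching a trace like the one above to a kernel bug report, it can help to summarize where the warning fired and which functions led there. The following is a minimal sketch, assuming the `WARNING:` line and call-trace frames have the layout shown in this dmesg excerpt; `summarize_trace` and the two regular expressions are illustrative helpers, not part of radeontop or of any kernel tooling.

```python
import re

# "WARNING: ... at <file>:<line> <function>+0x..."
WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+):(\d+) ([\w.]+)\+0x")
# " [? ]<function>+0x<offset>/0x<size> [<module>]"
FRAME_RE = re.compile(r"^\s+(\? )?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")


def summarize_trace(text):
    """Return the warning location and the list of reliable call-trace frames."""
    warning = None
    frames = []
    for line in text.splitlines():
        m = WARNING_RE.search(line)
        if m:
            warning = (m.group(1), int(m.group(2)), m.group(3))
            continue
        m = FRAME_RE.match(line)
        # Frames prefixed with "?" are ones the unwinder could not confirm.
        if m and not m.group(1):
            frames.append((m.group(2), m.group(3)))  # (function, module or None)
    return warning, frames


# A few lines copied from the trace above.
sample = """\
WARNING: CPU: 1 PID: 634 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn10/dcn10_hw_sequencer.c:854 dcn10_verify_allow_pstate_change_high.cold+0xc/0x224 [amdgpu]
Call Trace:
 dcn10_pipe_control_lock.part.0+0x69/0x70 [amdgpu]
 ? commit_tail+0x3c/0x70 [drm_kms_helper]
 commit_tail+0x3c/0x70 [drm_kms_helper]
 do_vfs_ioctl+0x42c/0x6b0
"""

warning, frames = summarize_trace(sample)
print(warning)
for func, module in frames:
    print(func, module or "(core kernel)")
```

In this excerpt the warning comes from the amdgpu display code (`dcn10_hw_sequencer.c`) during a cursor update issued by Xorg, which points at the driver rather than at radeontop itself.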

vinibali commented 4 years ago

hello there, i experience more or less the same. my system freezes just at the time i open up radeontop. with:

best regards

clst commented 4 years ago

The same happens on my Acer Nitro with Ryzen 5 2500U Raven + Polaris RX 560

Radeontop works fine on 4.19 but not on 5.2 or 5.3

grische commented 4 years ago

@jstkdng @vinibali @clst has anyone of you reported the bug against upstream/kernel? Could you link a bug report?

userofryzen commented 4 years ago

It is happening for me too, with a Ryzen 3550H + RX 560. I don't know how to bisect the problem in order to report it to the kernel, but surely one of the previous users has done so, no? I have no other kernels to test, because pre-5.0 kernels don't work here.

jstkdng commented 4 years ago

same here, picasso is not supported in old kernels. Raven Ridge did work though.

luyatshimbalanga commented 4 years ago

The issue also affects HP Envy x360 Ryzen 2500u.

bezirg commented 4 years ago

I got a system freeze with a 3400G on kernel 5.3. After the hard reboot, radeontop, radeontop --help, etc. always exit with exit code 0.

grische commented 4 years ago

@luyatshimbalanga @bezirg did you report the bug upstream against the kernel as clbr suggested?

luyatshimbalanga commented 4 years ago

@grische Filed

userofryzen commented 4 years ago

https://bugzilla.kernel.org/show_bug.cgi?id=205497

jstkdng commented 4 years ago

Just updated to kernel 5.4. It no longer freezes for me.

luyatshimbalanga commented 4 years ago

I can confirm kernel 5.4 fixed the freeze issue. Screenshot from 2019-11-29 17-07-35

clst commented 4 years ago

It is indeed perfect now. Thanks a lot to AMD's Alex Deucher.

grische commented 4 years ago

@clst I am not sure Alex Deucher is reading this thread here.

@clbr @Frechdachs I would expect this means we can close this bug and mark it as "Not our bug".

clbr commented 4 years ago

I'd like to leave it open for a bit, so that users with those kernels coming to report it will see it. Putting a warning in the README etc would not have the same reach.

clst commented 4 years ago

> @clst I am not sure Alex Deucher is reading this thread here.

Pretty sure he won't but I didn't want to spam the kernel bugzilla with "Thanks, me too" replies he then has to sift through instead of making amdgpu even more awesome. :)

userofryzen commented 4 years ago

I am on 5.4.6-2-MANJARO and still have this problem. I have just tested it twice and got two consecutive freezes. Picasso APU here.

trek00 commented 4 years ago

On Fri, 10 Jan 2020 17:43:32 -0800 userofryzen wrote:

> I am on 5.4.6-2-MANJARO and still have this problem. I have just tested it twice and got two consecutive freezes. Picasso APU here.

are you running radeontop with the --mem option?

if it freezes with --mem, this is expected as the kernel driver can't know when to power on the chip

ciao!

userofryzen commented 4 years ago

> On Fri, 10 Jan 2020 17:43:32 -0800 userofryzen wrote:
>
> > I am on 5.4.6-2-MANJARO and still have this problem. I have just tested it twice and got two consecutive freezes. Picasso APU here.
>
> are you running radeontop with the --mem option?
>
> if it freezes with --mem, this is expected as the kernel driver can't know when to power on the chip
>
> ciao!

No. It happened with the simplest invocation: radeontop as root or with sudo, plus the bus number, because I have two GPUs. When the integrated GPU is not being used, the behaviour is the same as before. I have only run the tool with --mem when I tried the patched version with VCN and UVD support (which didn't work for me).

trek00 commented 4 years ago

please open a new bug report to the Linux kernel, this is likely a bug in the driver or firmware, because radeontop without the --mem option simply calls the driver to get the data

ciao!

AO-LocLab commented 3 years ago

Ubuntu 20.10 Groovy dev version. Kernel 5.8.0-18-generic. AMD Ryzen 3 3200G with AMD Radeon Vega 8 integrated graphics (Raven, DRM 3.38.0, 5.8.0-18-generic, LLVM 10.0.1) The whole system hangs when I try to 'sudo radeontop'.

grische commented 3 years ago

@AO-LocLab as with the previous users, I recommend that you open a bug report against the upstream kernel. This is unlikely to be related to the program.

AO-LocLab commented 3 years ago

Is there a chance that a bug report against the upstream kernel related to radeontop will ever get picked up by their maintainers?

grische commented 3 years ago

> Is there a chance that a bug report against the upstream kernel related to radeontop will ever get picked up by their maintainers?

You don't know if you haven't tried ;-)

The previous one (see this thread further up) has been picked up in a matter of hours and fixed very quickly.

AO-LocLab commented 3 years ago

Very well, I’ll do it. Could someone help me with gathering relevant information to create the kernel bug report? I wouldn’t know where to start :-)

yaohao0814 commented 3 years ago

Ubuntu 20.04 LTS, kernel 5.4.0-48-generic, AMD Ryzen 3 3200G. The system froze; even the light on the mouse went off. I could only push the reset button.

trek00 commented 3 years ago

to better understand where the issue lies, radeontop must use the DRM path, because the direct memory access path is not controlled by the kernel and it could lead to freezing

so, update radeontop to the latest git version, then run it as a normal user, not via sudo, or run it with the --path argument, to be sure radeontop is using the DRM path

also, you should update the kernel to the latest version or at least to the 5.4 version and you should update the motherboard and gpu firmware too

if it still freezes, opening a kernel bug is probably the best way to get it fixed

ciao!

B83C commented 1 year ago

Happening here on 6.1.0-rc4 (Gentoo) with a 6850U in a T14s Gen 3. Here, the system freezes before the TUI shows up. dmesg output: dmesg.txt

danielzgtg commented 1 year ago

Does adding amdgpu.runpm=0 to GRUB fix it for you?
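
Adding a parameter "to GRUB" usually means appending it to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, regenerating the GRUB configuration and rebooting. Before testing radeontop again, it is worth confirming that the running kernel actually received the parameter. Below is a minimal sketch, assuming a Linux system where `/proc/cmdline` is readable; `has_kernel_param` and `running_cmdline` are illustrative helper names, and the example command line is made up apart from the parameter itself.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, name: str, value: str) -> bool:
    """True if `name=value` appears as a whole token on the kernel command line."""
    return f"{name}={value}" in cmdline.split()


def running_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with (Linux only)."""
    return Path(path).read_text()


# Hypothetical command line; only amdgpu.runpm=0 comes from this thread.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda2 ro quiet amdgpu.runpm=0"
print(has_kernel_param(example, "amdgpu.runpm", "0"))
```

On a real system, `has_kernel_param(running_cmdline(), "amdgpu.runpm", "0")` performs the same check against the booted kernel.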

B83C commented 1 year ago

Unfortunately, no.