M-Bab / linux-kernel-amdgpu-binaries

Kernel binaries (amd64) of amd-staging with DAL and latest security patches

[drm:amdgpu_job_timedout [amdgpu]] *ERROR* #48

Closed CRTX closed 6 years ago

CRTX commented 6 years ago

According to https://bugs.freedesktop.org/show_bug.cgi?id=104289, this bug has been fixed.

However, I'm still getting it.

Do these binaries not include the fix referenced in that link?

I have an HP Envy 15z x360 with a Ryzen 5 2500U with Vega graphics.

This is my crash log from /var/log/kern.log:

Mar  7 12:38:17 dnbenvy kernel: [    1.869449] [drm] Initialized amdgpu 3.25.0 20150101 for 0000:04:00.0 on minor 0
Mar  7 13:15:00 dnbenvy kernel: [    1.166118] [drm] amdgpu kernel modesetting enabled.
Mar  7 13:15:00 dnbenvy kernel: [    1.171737] fb: switching to amdgpudrmfb from EFI VGA
Mar  7 13:15:00 dnbenvy kernel: [    1.171921] amdgpu 0000:04:00.0: enabling device (0006 -> 0007)
Mar  7 13:15:00 dnbenvy kernel: [    1.172203] [drm] add ip block number 4 <amdgpu_powerplay>
Mar  7 13:15:00 dnbenvy kernel: [    1.172569] amdgpu 0000:04:00.0: VRAM: 256M 0x000000F400000000 - 0x000000F40FFFFFFF (256M used)
Mar  7 13:15:00 dnbenvy kernel: [    1.172570] amdgpu 0000:04:00.0: GTT: 1024M 0x000000F500000000 - 0x000000F53FFFFFFF
Mar  7 13:15:00 dnbenvy kernel: [    1.172716] [drm] amdgpu: 256M of VRAM memory ready
Mar  7 13:15:00 dnbenvy kernel: [    1.172718] [drm] amdgpu: 3072M of GTT memory ready.
Mar  7 13:15:00 dnbenvy kernel: [    1.813886] [drm:construct [amdgpu]] *ERROR* construct: Invalid Connector ObjectID from Adapter Service for connector index:2! type 0 expected 3
Mar  7 13:15:00 dnbenvy kernel: [    1.869093] fbcon: amdgpudrmfb (fb0) is primary device
Mar  7 13:15:00 dnbenvy kernel: [    1.869252] amdgpu 0000:04:00.0: fb0: amdgpudrmfb frame buffer device
Mar  7 13:15:00 dnbenvy kernel: [    1.892170] amdgpu 0000:04:00.0: ring 0(gfx) uses VM inv eng 4 on hub 0
Mar  7 13:15:00 dnbenvy kernel: [    1.892172] amdgpu 0000:04:00.0: ring 1(comp_1.0.0) uses VM inv eng 5 on hub 0
Mar  7 13:15:00 dnbenvy kernel: [    1.892174] amdgpu 0000:04:00.0: ring 2(comp_1.1.0) uses VM inv eng 6 on hub 0
Mar  7 13:15:00 dnbenvy kernel: [    1.892175] amdgpu 0000:04:00.0: ring 3(comp_1.2.0) uses VM inv eng 7 on hub 0
Mar  7 13:15:00 dnbenvy kernel: [    1.892176] amdgpu 0000:04:00.0: ring 4(comp_1.3.0) uses VM inv eng 8 on hub 0
Mar  7 13:15:00 dnbenvy kernel: [    1.892177] amdgpu 0000:04:00.0: ring 5(comp_1.0.1) uses VM inv eng 9 on hub 0
Mar  7 13:15:00 dnbenvy kernel: [    1.892179] amdgpu 0000:04:00.0: ring 6(comp_1.1.1) uses VM inv eng 10 on hub 0
Mar  7 13:15:00 dnbenvy kernel: [    1.892181] amdgpu 0000:04:00.0: ring 7(comp_1.2.1) uses VM inv eng 11 on hub 0
Mar  7 13:15:00 dnbenvy kernel: [    1.892182] amdgpu 0000:04:00.0: ring 8(comp_1.3.1) uses VM inv eng 12 on hub 0
Mar  7 13:15:00 dnbenvy kernel: [    1.892183] amdgpu 0000:04:00.0: ring 9(kiq_2.1.0) uses VM inv eng 13 on hub 0
Mar  7 13:15:00 dnbenvy kernel: [    1.892185] amdgpu 0000:04:00.0: ring 10(sdma0) uses VM inv eng 4 on hub 1
Mar  7 13:15:00 dnbenvy kernel: [    1.892187] amdgpu 0000:04:00.0: ring 11(vcn_dec) uses VM inv eng 5 on hub 1
Mar  7 13:15:00 dnbenvy kernel: [    1.892188] amdgpu 0000:04:00.0: ring 12(vcn_enc0) uses VM inv eng 6 on hub 1
Mar  7 13:15:00 dnbenvy kernel: [    1.892203] amdgpu 0000:04:00.0: ring 13(vcn_enc1) uses VM inv eng 7 on hub 1
Mar  7 13:15:00 dnbenvy kernel: [    1.897036] [drm] Initialized amdgpu 3.25.0 20150101 for 0000:04:00.0 on minor 0
Mar  7 13:37:17 dnbenvy kernel: [ 1548.395699] amdgpu 0000:04:00.0: [gfxhub] VMC page fault (src_id:0 ring:158 vmid:7 pasid:32793)
Mar  7 13:37:17 dnbenvy kernel: [ 1548.395708] amdgpu 0000:04:00.0:   at page 0x00000001b9000000 from 27
Mar  7 13:37:17 dnbenvy kernel: [ 1548.395711] amdgpu 0000:04:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0070153C
Mar  7 13:37:28 dnbenvy kernel: [ 1558.611111] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=51894, last emitted seq=51896

M-Bab commented 6 years ago

Jumped to kernel 4.16-rc4. You can retry and see if it helps; there were a lot of fixes in amd-staging-drm-next.

CRTX commented 6 years ago

Darn it, still getting it. It happens even quicker if I run a 3D application (Rocket League).

I only got two lines when it crashed though.

$ uname -r
4.16.0-rc4+

:(

Mar  7 23:36:18 dnbenvy kernel: [  421.548968] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=42083, last emitted seq=42084
Mar  7 23:36:18 dnbenvy kernel: [  421.548977] [drm] No hardware hang detected. Did some blocks stall?

Also, relatively minor, but I want to report that the picture is darker (not the brightness), as if the gamma had been turned down by quite a bit.

P.S. That issue aside, at least as long as I'm not running 3D applications the OS doesn't crash randomly anymore. I can finally use my shiny new blazing fast laptop for work! I think. At least so far... Fingers crossed!

P.P.S. Full-screening videos also makes it crash sometimes. Here's to hoping for a fix soon ;( But still no random crashes yet after 8 straight hours of usage. Yay!

CRTX commented 6 years ago

Would you mind pushing RC5 now that it's out? It also seems to include some wifi fixes that have resolved some wifi issues.

CRTX commented 6 years ago

Sigh... still freezes on 3D applications. How unfortunate.

Mar 23 11:10:06 dnbenvy kernel: [  180.923577] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=19210, last emitted seq=19212
Mar 23 11:10:06 dnbenvy kernel: [  180.923587] [drm] No hardware hang detected. Did some blocks stall?

The good news is that RC6 fixes the brightness again! Thanks for promptly updating the debs with the latest RCs.

Do you know what the proper channel is to report this bug?

M-Bab commented 6 years ago

Sorry to hear that. You can keep trying, but be aware that this kernel is bleeding edge and there is quite a chance that it causes issues. If you want to report the bug upstream, it is best to use the freedesktop Bugzilla (https://bugs.freedesktop.org/) or the AMD developers mailing list (amd-gfx@lists.freedesktop.org). I recommend linking to this issue so that they have all the information.

CRTX commented 6 years ago

This seems to be fixed in 4.17-rc1. I did run into a random freeze, but it wasn't due to this error, since I couldn't find anything related to it in the kernel logs.
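For anyone who wants to check their own kernel log for this error, here is a minimal sketch in Python. It assumes the message format shown in the logs in this thread; the function name `find_ring_timeouts` and the `gap` field (simply the difference between the two sequence numbers) are made up for illustration and are not part of any amdgpu tooling.

```python
import re

# Matches the amdgpu_job_timedout message format quoted in this thread.
TIMEOUT_RE = re.compile(
    r"\*ERROR\* ring (?P<ring>\S+) timeout, "
    r"last signaled seq=(?P<signaled>\d+), last emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(log_text):
    """Return one dict per ring timeout message found in log_text."""
    hits = []
    for match in TIMEOUT_RE.finditer(log_text):
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        hits.append({
            "ring": match.group("ring"),
            "signaled": signaled,
            "emitted": emitted,
            # Hypothetical helper field: emitted minus signaled.
            "gap": emitted - signaled,
        })
    return hits
```

To use it, read the log into a string first (for example the contents of /var/log/kern.log, as in the first post) and pass that string to `find_ring_timeouts`. An empty list means no message of this form was found, which is not the same as proving the freeze had a different cause.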

madmax2012 commented 6 years ago

I have just had the same issue again, running 4.17-rc5, so I would not consider it closed.

mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32769)
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: at page 0x0000000102a02000 from 27
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32769)
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: at page 0x0000000102a01000 from 27
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32769)
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: at page 0x0000000102a9a000 from 27
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32769)
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: at page 0x0000000102a06000 from 27
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32769)
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: at page 0x0000000102a98000 from 27
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32769)
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: at page 0x0000000102a2e000 from 27
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32769)
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: at page 0x0000000102a04000 from 27
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32769)
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: at page 0x0000000102aa1000 from 27
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32769)
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: at page 0x0000000102a0c000 from 27
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32769)
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: at page 0x0000000102a2c000 from 27
mei 28 16:47:43 Annibuntu kernel: amdgpu 0000:09:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
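A burst of page faults like the one above is easier to read when summarized. The following is a rough Python sketch that counts the faults, lists the distinct page addresses, and tallies the VM_L2_PROTECTION_FAULT_STATUS values. It only pattern-matches the text format seen in this thread; `summarize_page_faults` is a hypothetical helper name, and it does not attempt to decode the status register bits.

```python
import re
from collections import Counter

# Patterns follow the log lines quoted in this thread.
PAGE_RE = re.compile(r"at page (0x[0-9a-fA-F]+) from (\d+)")
STATUS_RE = re.compile(r"VM_L2_PROTECTION_FAULT_STATUS:(0x[0-9a-fA-F]+)")


def summarize_page_faults(log_text):
    """Summarize VMC page fault entries found in log_text."""
    pages = [int(m.group(1), 16) for m in PAGE_RE.finditer(log_text)]
    statuses = Counter(m.group(1) for m in STATUS_RE.finditer(log_text))
    return {
        "faults": len(pages),                # number of "at page" entries
        "unique_pages": sorted(set(pages)),  # distinct addresses as integers
        "statuses": dict(statuses),          # raw status value -> occurrences
    }
```

Because it scans the whole string with regular expressions, it works whether the log entries are on separate lines or run together as pasted.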

saisankargochhayat commented 6 years ago

Same here:

Jun 15 17:37:37 sai-pc kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, last signaled seq=139689, last emitted seq=139692

marvind commented 6 years ago

Since upgrading to the latest firmware (https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/amdgpu?id=1db3eec2ec895548f5349233dbf06013c9c19286), I have not seen this issue anymore on the 2500U. You might want to try it. Also see: https://bugs.freedesktop.org/show_bug.cgi?id=105251#c10

M-Bab commented 6 years ago

I don't get it - which one is actually newer now/working better? Mine or the one from the link?

marvind commented 6 years ago

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/amdgpu is the newest firmware I could find. It is newer than your firmware-radeon-ucode_1.90_all.deb, not sure about firmware-radeon-ucode_2.00_all.deb.

madmax2012 commented 6 years ago

I have installed Radeon Software 18.20, which includes the firmware linked above, as suggested by Alex's commit message (and confirmed by checksums). Before, glxgears made the PC crash while running at a maximum of 60 fps. Afterwards, glxgears no longer crashes and runs at 4000 fps. I will test the current release and compare next week. https://support.amd.com/en-us/kb-articles/Pages/Radeon-Software-for-Linux-Release-Notes.aspx
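For reference, a checksum comparison like the one mentioned above could be sketched in Python as follows. This is an assumption about how one might do it, not the exact procedure used here: `compare_firmware_dirs` is a hypothetical helper, and the directory paths and the `*.bin` pattern are placeholders you would adapt to your system.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_firmware_dirs(installed_dir, reference_dir, pattern="*.bin"):
    """Map each file name in reference_dir to 'match', 'differs' or 'missing'."""
    result = {}
    for ref in sorted(Path(reference_dir).glob(pattern)):
        installed = Path(installed_dir) / ref.name
        if not installed.is_file():
            result[ref.name] = "missing"
        elif sha256_of(installed) == sha256_of(ref):
            result[ref.name] = "match"
        else:
            result[ref.name] = "differs"
    return result
```

A typical call might pass the installed firmware directory (often /lib/firmware/amdgpu, but that is an assumption about your distro) and a local checkout of the amdgpu directory from linux-firmware. Any entry reported as "differs" or "missing" means the installed blob is not the one from the reference tree.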

splace commented 6 years ago

I've been getting this since a clean install of Solus 3 about 4 months ago, through many updates of the kernel, Xorg, and Mesa; it is still the same.

Other distros with older kernels etc. are unaffected.

Xorg freezes randomly (every 2-8 hours, maybe). It always seems to happen under high load of some kind, and is not specific to the app running.

[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=161664, last emitted seq=161666
[drm] No hardware hang detected. Did some blocks stall?

Kernel: 4.17.6-82.current
GPU: Radeon R7 250

M-Bab commented 6 years ago

Okay, my firmware package has also been upgraded to use these latest binaries.