M-Bab / linux-kernel-amdgpu-binaries

Kernel binaries (amd64) of amd-staging with DAL and latest security patches

screen output wrong in 4.19 #74

Closed stuaxo closed 5 years ago

stuaxo commented 5 years ago

I installed 4.19 from 4.18.8, but when I boot the screen looks like this, blank with a single glitchy line of pixels:

https://photos.app.goo.gl/UAF3EFjsCbjt1pTy7

(excuse my reflection)

stuaxo commented 5 years ago

Booting in recovery mode works.

Everything was pretty stable for me up to a few releases ago, but I'm back to things freezing if I run any demanding graphics.

stuaxo commented 5 years ago

Here's some logging:

[ 1667.269364] [drm:generic_reg_wait [amdgpu]] *ERROR* REG_WAIT timeout 1us * 10 tries - optc1_lock line:628
[ 1667.269515] WARNING: CPU: 1 PID: 4323 at drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.c:254 generic_reg_wait+0xe2/0x160 [amdgpu]
[ 1667.269517] Modules linked in: veth nf_conntrack_netlink xt_nat nfnetlink xfrm_user xfrm_algo xt_addrtype overlay aufs vmw_vsock_vmci_transport vsock vmw_vmci ccm xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_
ipv4 xt_multiport xt_tcpudp hidp devlink ebtable_filter ebtables bpfilter cmac bnep binfmt_misc nls_iso8859_1 edac_mce_amd kvm_amd ccp kvm snd_hda_codec_realtek snd_hda_codec_generic irqbypass hid_sensor_magn_3d snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul hid_sensor_gyro_3d hid_sensor_incl_3d hid_sensor_rotation 
ghash_clmulni_intel hid_sensor_accel_3d snd_hda_intel pcbc hid_sensor_trigger snd_hda_codec industrialio_triggered_buffer kfifo_buf uvcvideo snd_hda_core hid_sensor_iio_common
[ 1667.269565]  industrialio videobuf2_vmalloc snd_hwdep videobuf2_memops videobuf2_v4l2 videobuf2_common snd_pcm btusb aesni_intel videodev btrtl aes_x86_64 btbcm media crypto_simd btintel cryptd glue_helper bluetooth snd_seq_midi snd_seq_midi_event ecdh_generic arc4 snd_rawmidi snd_seq r8822be(CE) input_leds joydev 
serio_raw hp_wmi snd_seq_device sparse_keymap wmi_bmof mac80211 snd_timer k10temp snd rtsx_pci_ms soundcore cfg80211 memstick hp_accel lis3lv02d input_polldev hp_wireless mac_hid sch_fq_codel iptable_filter ip6table_filter ip6_tables br_netfilter bridge stp llc arp_tables parport_pc ppdev lp parport ip_tables x_tables
 autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_sensor_custom hid_sensor_hub hid_generic usbhid raid6_pq libcrc32c
[ 1667.269631]  raid1 raid0 multipath linear amdkfd amd_iommu_v2 amdgpu chash i2c_algo_bit gpu_sched ttm rtsx_pci_sdmmc drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm nvme psmouse ahci libahci i2c_piix4 rtsx_pci nvme_core wmi i2c_hid video hid i2c_scmi
[ 1667.269662] CPU: 1 PID: 4323 Comm: gnome-shell Tainted: G        WC  E     4.18.8 #1
[ 1667.269663] Hardware name: HP HP ENVY x360 Convertible 15-bq1xx/83C6, BIOS F.17 03/29/2018
[ 1667.269696] RIP: 0010:generic_reg_wait+0xe2/0x160 [amdgpu]
[ 1667.269697] Code: ab 44 8b 45 20 48 8b 4d 18 44 89 e6 8b 55 10 48 c7 c7 b8 ac 51 c0 44 89 4d d4 e8 a9 e7 de ff 41 83 7d 18 01 44 8b 4d d4 74 02 <0f> 0b 48 83 c4 18 44 89 c8 5b 41 5c 41 5d 41 5e 41 5f 5d c3 41 0f 
[ 1667.269725] RSP: 0018:ffffb6f70925f8a0 EFLAGS: 00010297
[ 1667.269727] RAX: 0000000000000000 RBX: 000000000000000b RCX: 0000000000000000
[ 1667.269728] RDX: 0000000000000000 RSI: ffff9bf97ea564b8 RDI: ffff9bf97ea564b8
[ 1667.269729] RBP: ffffb6f70925f8e0 R08: 0000000000000550 R09: 0000000000000001
[ 1667.269729] R10: 0000000000000002 R11: ffffffffb6181f4d R12: 0000000000000001
[ 1667.269730] R13: ffff9bf96f3b3900 R14: 0000000000000100 R15: 0000000000000001
[ 1667.269732] FS:  00007efcea216ac0(0000) GS:ffff9bf97ea40000(0000) knlGS:0000000000000000
[ 1667.269733] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1667.269734] CR2: 0000259aa1d79a38 CR3: 00000007c65f2000 CR4: 00000000003406e0
[ 1667.269735] Call Trace:
[ 1667.269795]  optc1_lock+0xa2/0xb0 [amdgpu]
[ 1667.269831]  dcn10_pipe_control_lock.part.25+0x31/0x70 [amdgpu]
[ 1667.269876]  dcn10_apply_ctx_for_surface+0xcc/0x3f0 [amdgpu]
[ 1667.269906]  dc_commit_state+0x289/0x570 [amdgpu]
[ 1667.269919]  ? drm_calc_timestamping_constants+0xff/0x160 [drm]
[ 1667.269962]  amdgpu_dm_atomic_commit_tail+0x299/0xdf0 [amdgpu]
[ 1667.269988]  ? amdgpu_bo_pin_restricted+0x67/0x280 [amdgpu]
[ 1667.269996]  ? _cond_resched+0x19/0x40
[ 1667.269998]  ? wait_for_completion_timeout+0x38/0x140
[ 1667.270004]  ? _cond_resched+0x19/0x40
[ 1667.270007]  ? wait_for_completion_interruptible+0x35/0x180
[ 1667.270042]  ? dm_plane_helper_prepare_fb+0x102/0x310 [amdgpu]
[ 1667.270050]  commit_tail+0x42/0x70 [drm_kms_helper]
[ 1667.270055]  drm_atomic_helper_commit+0x10c/0x120 [drm_kms_helper]
[ 1667.270088]  amdgpu_dm_atomic_commit+0x87/0xa0 [amdgpu]
[ 1667.270097]  drm_atomic_commit+0x4a/0x50 [drm]
[ 1667.270106]  drm_atomic_connector_commit_dpms+0xef/0x100 [drm]
[ 1667.270115]  drm_mode_obj_set_property_ioctl+0x176/0x280 [drm]
[ 1667.270124]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[ 1667.270132]  drm_ioctl_kernel+0xa4/0xf0 [drm]
[ 1667.270140]  drm_ioctl+0x37b/0x440 [drm]
[ 1667.270147]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[ 1667.270168]  amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
[ 1667.270174]  do_vfs_ioctl+0xa8/0x630
[ 1667.270178]  ? vfs_read+0x115/0x130
[ 1667.270179]  ? vfs_read+0x115/0x130
[ 1667.270182]  ksys_ioctl+0x75/0x80
[ 1667.270185]  __x64_sys_ioctl+0x1a/0x20
[ 1667.270189]  do_syscall_64+0x5a/0x120
[ 1667.270193]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1667.270196] RIP: 0033:0x7efce72f15d7
[ 1667.270196] Code: b3 66 90 48 8b 05 b1 48 2d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 81 48 2d 00 f7 d8 64 89 01 48 
[ 1667.270227] RSP: 002b:00007ffdc83ced58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1667.270228] RAX: ffffffffffffffda RBX: 0000561d5bb91180 RCX: 00007efce72f15d7
[ 1667.270229] RDX: 00007ffdc83ced90 RSI: 00000000c01864ba RDI: 000000000000000b
[ 1667.270230] RBP: 00007ffdc83ced90 R08: 0000000000000000 R09: 0000000000000002
[ 1667.270231] R10: 0000561d5bb8dbd8 R11: 0000000000000246 R12: 00000000c01864ba
[ 1667.270231] R13: 000000000000000b R14: 00007ffdc83cef90 R15: 00007efce9631d50
[ 1667.270234] ---[ end trace ec720aa8ab71aa32 ]---
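
When quoting a trace like this in a bug report, the chain of verified frames is usually the part readers look at first. Below is a minimal sketch that pulls those frames out of dmesg text. It assumes the line layout shown in the log above, and it skips frames prefixed with `?`, which the kernel prints for addresses it found on the stack but could not verify as part of the call chain. The function name `reliable_frames` is hypothetical.

```python
import re

# Matches lines such as "[ 1667.269795]  optc1_lock+0xa2/0xb0 [amdgpu]".
# The optional "? " prefix marks a frame the unwinder could not verify.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<guess>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?\s*$"
)


def reliable_frames(log_text):
    """Return (function, module) pairs for verified stack frames only."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match and not match.group("guess"):
            frames.append((match.group("func"), match.group("module")))
    return frames


if __name__ == "__main__":
    sample = "\n".join([
        "[ 1667.269795]  optc1_lock+0xa2/0xb0 [amdgpu]",
        "[ 1667.269919]  ? drm_calc_timestamping_constants+0xff/0x160 [drm]",
        "[ 1667.270174]  do_vfs_ioctl+0xa8/0x630",
    ])
    for func, module in reliable_frames(sample):
        print(func, module or "(core kernel)")
```
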
CRTX commented 5 years ago

can't help if we don't know anything about your system

stuaxo commented 5 years ago

Doh,

sorry it's an HP x360 15", bq101na

Ryzen 2500u 32gb ram

Ubuntu 64 bit

CRTX commented 5 years ago

This happened to me when the kernel was corrupted. These are the steps I did to fix it.

Good luck. But if you must know, the random crashing is still not fixed, even in 4.19. My laptop almost always crashes under high I/O load in 4.19. There are a huge number of fixes in 4.20, so you can either compile it yourself or wait for the mainline .deb from http://kernel.ubuntu.com/~kernel-ppa/mainline/ whenever it's updated again (later this week or next Monday).

stuaxo commented 5 years ago

I saw 4.19.1 is available in this repo and tried re-installing:

sudo dpkg -i linux-headers-4.19.1_18.11.06.amdgpu.ubuntu_amd64.deb linux-image-4.19.1_18.11.06.amdgpu.ubuntu_amd64.deb firmware-radeon-ucode_2.20_all.deb

However, 4.19.0 is still the newest version showing in GRUB on boot.

/etc/default/grub has this commandline:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash idle=nomwait amdgpu.audio=0 amdgpu.dc=1"
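
To confirm that options such as `amdgpu.dc=1` actually reached the running kernel, the command line can be read back from `/proc/cmdline`. A minimal sketch, assuming simple space-separated `key=value` tokens with no quoted values containing spaces; the helper name `parse_cmdline` is hypothetical.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


if __name__ == "__main__":
    # The string from /etc/default/grub quoted above; on a live system,
    # read open("/proc/cmdline").read() instead.
    params = parse_cmdline("quiet splash idle=nomwait amdgpu.audio=0 amdgpu.dc=1")
    print(params.get("amdgpu.dc"))
    print("quiet" in params)
```

Note that edits to `/etc/default/grub` only take effect after the GRUB configuration is regenerated (on Ubuntu, `sudo update-grub`) and the machine is rebooted.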

I'll definitely be trying 4.20 when it arrives. I don't seem to have crashes on high I/O (I have used this as a dev laptop for a few months now), but graphics pretty reliably crash things.

Anbox is my go-to for testing crashiness: if my config isn't rock solid, Anbox will crash it almost instantly, as will games under Steam (e.g. Thumper, the Windows version under Wine).

chrismonks commented 5 years ago

Same laptop as stuaxo: HP HP ENVY x360 Convertible 15-bq1xx/83C6, BIOS F.17

I'm using 4.19.1 on Ubuntu 18.10 and have the same issue on boot: weird screen output, exactly the same as the picture above. Any ideas?

stuaxo commented 5 years ago

Not sure how I did it, but I tried booting with HDMI plugged in and it said something like:

The initrd is too big
Press any key to reset

The reason I'm not sure is that the next few times I rebooted, it showed the same output as before. I did notice that pressing a key reboots, so I think it is trying to display this message.
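
One quick way to check whether the initrd really is unusually large is to compare the images in `/boot` for a kernel that boots and one that does not. A minimal sketch, assuming Ubuntu's `/boot/initrd.img-<version>` naming; the function name `initrd_sizes` is hypothetical.

```python
from pathlib import Path


def initrd_sizes(boot_dir="/boot"):
    """Return {filename: size in MiB} for every initrd.img-* file in boot_dir."""
    sizes = {}
    for path in sorted(Path(boot_dir).glob("initrd.img-*")):
        sizes[path.name] = path.stat().st_size / (1024 * 1024)
    return sizes


if __name__ == "__main__":
    for name, mib in initrd_sizes().items():
        print(f"{name}: {mib:.1f} MiB")
```

If the failing kernel's image is several times larger than the working one, a commonly suggested workaround (not tested in this thread) is to set `MODULES=dep` in `/etc/initramfs-tools/initramfs.conf` and regenerate the image with `sudo update-initramfs -u -k <version>`.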

M-Bab commented 5 years ago

Can you try again with 4.19.2 or the 4.20 pre-release kernel?

stuaxo commented 5 years ago

Both versions seem to stop booting almost immediately, showing the glitchy line, just like 4.19 and 4.19.1.

I can probably take some time tomorrow to try it plugged in via HDMI and see if it gives me the "too big" message.

stuaxo commented 5 years ago

On 4.20rc I tried amdgpu.ip_block_mask=0xff, which made no difference.

Then I tried modprobe.blacklist=amdgpu and can boot (though obviously in low res).

Running modprobe amdgpu doesn't seem to load the module. It also temporarily displayed the same glitch at the bottom of the screen as seen in the screenshots above.

Here's the log from dmesg: https://gist.github.com/stuaxo/2200b2407161ddf2f99e9e9c4c327aa0

The last line:

[ 41.497812] amdgpu: probe of 0000:04:00.0 failed with error -22

is where I ran modprobe amdgpu.
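
The `-22` in that probe failure is a negated Linux errno value. A minimal sketch for decoding such codes with Python's standard `errno` module; the helper name `describe_kernel_error` is hypothetical.

```python
import errno
import os


def describe_kernel_error(code):
    """Map a negative kernel return code (e.g. -22) to (errno name, message)."""
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


if __name__ == "__main__":
    name, message = describe_kernel_error(-22)
    print(name, "-", message)  # on Linux: EINVAL - Invalid argument
```

`EINVAL` only says that the driver rejected something during probe as invalid; the earlier amdgpu lines in the dmesg output are where the specific cause would show up.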

stuaxo commented 5 years ago

A quick update on the latest versions: everything seems the same.

The message from the rc version says:

AMD-Vi: Unable to write to IOMMU perf counter.
tpm_crb MSFT0101:00: can't request region for resource [mem 0xcd579000-0xcd579fff]

stuaxo commented 5 years ago

Update: I'm pretty sure the tpm message doesn't stop booting, as I managed to boot the Ubuntu 18.10 Live version.

CRTX commented 5 years ago

There may be a regression in 4.19.1+. I'm using 4.19.0-041900-generic from the mainline website and I'm able to boot just fine. The only other ones I've tried are 4.20-rc1 through 4.20-rc6. I'm guessing 4.19.1 up to 4.20-rc6 have this bug with the glitchy line and not being able to boot on 2500U processors.

stuaxo commented 5 years ago

What's the best way to find out exactly where the bug is? I guess I could install onto a USB stick and try some kernels.

And then, where and to whom should the bug be reported?
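
Trying kernels one by one goes faster as a bisection: with one known-good and one known-bad release, each test boot halves the remaining range. A minimal sketch of that search follows. The version list and the `is_good` results are purely illustrative assumptions, not test results from this thread, and `first_bad` is a hypothetical helper.

```python
def first_bad(versions, is_good):
    """Find the first failing version by bisection.

    versions must be ordered oldest to newest, with versions[0] known good
    and versions[-1] known bad. is_good(version) reports one test boot.
    """
    lo, hi = 0, len(versions) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            lo = mid   # regression was introduced after mid
        else:
            hi = mid   # regression is at mid or earlier
    return versions[hi]


if __name__ == "__main__":
    # Hypothetical data: pretend only the first two releases boot.
    releases = ["4.18.8", "4.19", "4.19.1", "4.19.2", "4.20-rc1"]
    boots = {"4.18.8", "4.19"}
    print(first_bad(releases, lambda v: v in boots))
```

Once two adjacent releases are identified, `git bisect` in a kernel source tree applies the same idea at the level of individual commits (`git bisect start`, then `git bisect good <tag>` and `git bisect bad <tag>`).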

chrismonks commented 5 years ago

I ended up using 4.19.4 from the Ubuntu PPA http://kernel.ubuntu.com/~kernel-ppa/mainline/?C=M;O=D . It's working really well. I tried their 4.20-rc3 and got the same weird screen output, so it's not just this kernel; I think it's mainline that has the issues. There are a lot of amdgpu changes in the 4.20.xx kernels.

Also, a recent BIOS update to F.19 on this laptop sorted out the suspend issues I've been having. The touchscreen is also working on 4.19.4+, which is not a result of this patch https://bugzilla.kernel.org/attachment.cgi?id=279539 that was accepted to mainline.

stuaxo commented 5 years ago

How is stability?

Thumper (the Windows version through Steam, running in DirectX 9 mode) and Anbox are my go-tos for making it crash.

chrismonks commented 5 years ago

Anbox is working fine for me in Ubuntu. I haven't tried Thumper in Windows, although stability in Windows was very poor until I force-installed the Adrenalin-Edition-18.12.1-Nov29 drivers, selecting "Vega RX", using this guide: https://www.reddit.com/r/Amd/comments/9zizs0/manual_installing_up_to_date_vega_driver_on_2500u/

stuaxo commented 5 years ago

I meant the Windows version of Thumper, but in Linux, using Steam's built-in Wine support. https://www.engadget.com/2018/08/22/steam-play-linux-windows-games-compatible/

I don't really boot into Windows very often. When I do, it seems OK, but getting the drivers there was a bit of a pain (I was able to play the demo of Forza 4 on a low-res setting).

stuaxo commented 5 years ago

OK, 4.19.4 from the PPA is the most stable version I've tried in a long time (it doesn't seem to crash with Thumper so far).

I haven't been able to get dkms working, since it says it's incompatible. The touchscreen still doesn't work. Did you do anything to help with that?

I'd like to try 4.19.4 from this repo, but I can't see a commit to grab it from here.

stuaxo commented 5 years ago

Upgraded to the F.19 BIOS and everything seems to be working!

I'm pretty shocked that even the touchscreen works now.

chrismonks commented 5 years ago

When you say everything, which kernel do you mean: the one from this GitHub repo or the one from the Ubuntu PPA?

stuaxo commented 5 years ago

I've been using 4.19.4 from the Ubuntu PPA for a few days, and today I tried 4.19.6 from this GitHub repo and 4.20.0rc5.

I think I noticed the small screen glitch I screenshotted in this ticket go by very fast, but boot continues.

One of the kernels (I think 4.19.6) seems to have a tiny issue where the screen brightness is set so low that the screen appears off when I come back to the laptop, but I just turn up the brightness.

Time will tell whether the laptop comes back after being left for a long time (it hibernates, I guess). This has always been a bit iffy (I've just come to expect there's a 50% chance it is locked up and REISUB is needed).

I'm currently installing 4.20rc6 from here. I should find out if there is any instability just by regular usage.

chrismonks commented 5 years ago

Great news. It seems the F.19 BIOS really fixed a lot of things. Sleep and hibernate are working perfectly for me now as well. BTW, 4.19.5+ kernels have the touchscreen patch already applied, as it was committed upstream, which is very nice indeed. Looking forward to testing this kernel now along with the AMDGPU improvements. Win-win. Thanks for the help.