TheRemote / Ubuntu-Server-raspi4-unofficial

Ubuntu Server 18.04.4 Raspberry Pi 4 Image + Build Script (unofficial)
https://jamesachambers.com/raspberry-pi-4-ubuntu-server-desktop-18-04-3-image-unofficial/
Apache License 2.0
536 stars 93 forks source link

Stuck/kernel paniced boots at random with likely v3d conflicts, i.e. "IRQ request failed on -22" [v28] #121

Closed pseudofunizer closed 2 years ago

pseudofunizer commented 4 years ago

Stuck/kernel paniced boots at random with likely v3d conflicts, i.e. "IRQ request failed on -22" [v28]

Hi James!

The Ubuntu Server 64bit has been working excellently for months up until now - good times. However, apparently some of the recent cutting-edge driver updates for the RPi4B's internal chips have caused things to go haywire very often during the boot-up process.

Strangely enough, and on somewhat of a bright side, the issue seems to resolve itself by unplugging the power and rebooting the Pi - usually after a 1-3 reboot attempts, I'm back into the OS and it's running again without any problems at all and everything is working as if nothing had happened.

The kernel panics/stuck boots occur during the boot sequence when Ubuntu is loading up and the Xubuntu desktop hasn't started up yet. I've tried to replicate this problem on two up-to-date RPi4B units and it persists in both of them.

The issue has started to occur quite recently. During both cold boots and soft reboots when the actual Ubuntu's side is loading up, I get stuck with (usually) either a kernel paging request error or an IRQ request failing at -22.

I was wondering whether that could be related to the task prioritization in this particular Ubuntu Server fork that's been done back in the day to get the audio working without crackling?

The problem seems to revolve around the v3d modules and drivers for the onboard chips. It's gotten worse with recent driver updates, so... Here's a snippet of the dmesg error messages I've been getting (note also the nags about the tainted kernel -- could it be one of the culprits that since there's already 5.4.y kernel tree being pushed out there, and from what I've read, for Ubuntu 18.04.4 LTS as well, that there's probably "a bit of a mismatch" now..?)

[    7.159573] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    7.170271] Unable to handle kernel paging request at virtual address 40d2867fe0d51d64
[    7.172367] Mem abort info:
[    7.174130]   ESR = 0x96000004
[    7.176347]   Exception class = DABT (current EL), IL = 32 bits
[    7.178441]   SET = 0, FnV = 0
[    7.180526]   EA = 0, S1PTW = 0
[    7.183649] Data abort info:
[    7.185405]   ISV = 0, ISS = 0x00000004
[    7.187091]   CM = 0, WnR = 0
[    7.188536] [40d2867fe0d51d64] address between user and kernel address ranges

...

[   7.435813] [drm] Setting vblank_disable_immediate to false because get_vblank_timestamp == NULL
[    7.448853] fbcon_init: detected unhandled fb_set_par error, error code -16
[    7.448867] Console: switching to colour frame buffer device 240x75
[    7.455864] random: crng init done
[    7.455871] random: 7 urandom warning(s) missed due to ratelimiting
[    7.465258] vc4-drm soc:gpu: fb0: DRM emulated frame buffer device
[    7.468205] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43455-sdio for chip BCM4345/6
[    7.480779] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM4345/6 wl0: Mar  1 2015 07:29:38 version 7.45.18 (r538002) FWID 01-6a2c8ad4
[    8.581743] audit: type=1400 audit(1517155101.687:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/sbin/dhclient" pid=528 comm="apparmor_parser"
[    8.584947] audit: type=1400 audit(1517155101.691:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-client.action" pid=528 comm="apparmor_parser"
[    8.584979] audit: type=1400 audit(1517155101.691:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-helper" pid=528 comm="apparmor_parser"
[    8.585004] audit: type=1400 audit(1517155101.691:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=528 comm="apparmor_parser"
[    8.587696] audit: type=1400 audit(1517155101.695:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default" pid=527 comm="apparmor_parser"
[    8.594667] audit: type=1400 audit(1517155101.695:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default-cgns" pid=527 comm="apparmor_parser"
[    8.596717] audit: type=1400 audit(1517155101.695:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default-with-mounting" pid=527 comm="apparmor_parser"
[    8.596728] audit: type=1400 audit(1517155101.695:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lxc-container-default-with-nesting" pid=527 comm="apparmor_parser"
[    8.597238] audit: type=1400 audit(1517155101.703:10): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/lightdm/lightdm-guest-session" pid=526 comm="apparmor_parser"
[    8.604171] audit: type=1400 audit(1517155101.707:11): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/lightdm/lightdm-guest-session//chromium" pid=526 comm="apparmor_parser"
[    8.639656] Unable to handle kernel paging request at virtual address 0000000000040006
[    8.643237] Mem abort info:
[    8.643244]   ESR = 0x96000005
[    8.643248]   Exception class = DABT (current EL), IL = 32 bits
[    8.643251]   SET = 0, FnV = 0
[    8.643253]   EA = 0, S1PTW = 0
[    8.643255] Data abort info:
[    8.643258]   ISV = 0, ISS = 0x00000005
[    8.643260]   CM = 0, WnR = 0
[    8.643266] user pgtable: 4k pages, 39-bit VAs, pgdp = 000000007c878458
[    8.643269] [0000000000040006] pgd=0000000000000000, pud=0000000000000000
[    8.643277] Internal error: Oops: 96000005 [#2] PREEMPT SMP
[    8.643281] Modules linked in: vc4 drm_kms_helper cec snd_soc_core snd_bcm2835(C) snd_compress snd_pcm_dmaengine snd_pcm evdev brcmfmac sg brcmutil snd_timer cfg80211 v3d gpu_sched rfkill snd bcm2835_codec(C) drm v4l2_mem2mem syscopyarea bcm2835_v4l2(C) sysfillrect sysimgblt bcm2835_mmal_vchiq(C) fb_sys_fops v4l2_common videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops drm_panel_orientation_quirks videobuf2_v4l2 raspberrypi_hwmon videobuf2_common hwmon videodev media vc_sm_cma(C) rpivid_mem uio_pdrv_genirq uio sch_fq_codel cuse fuse i2c_dev ip_tables x_tables ipv6 crc_ccitt
[    8.643340] Process apparmor_parser (pid: 542, stack limit = 0x00000000f61fd2cb)
[    8.643345] CPU: 1 PID: 542 Comm: apparmor_parser Tainted: G      D  C        4.19.105-v8-28 #28
[    8.643347] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[    8.643349] pstate: 60000005 (nZCv daif -PAN -UAO)

...

[   17.140820] Unable to handle kernel paging request at virtual address 0000000000040006
[   17.143159] Mem abort info:
[   17.145237]   ESR = 0x96000005
[   17.164806]   Exception class = DABT (current EL), IL = 32 bits
[   17.167622]   SET = 0, FnV = 0
[   17.169921]   EA = 0, S1PTW = 0
[   17.175169] Data abort info:
[   17.187479]   ISV = 0, ISS = 0x00000005
[   17.191719]   CM = 0, WnR = 0
[   17.200108] user pgtable: 4k pages, 39-bit VAs, pgdp = 0000000055c1f198
[   17.202961] [0000000000040006] pgd=0000000000000000, pud=0000000000000000
[   17.209837] Internal error: Oops: 96000005 [#6] PREEMPT SMP
[   17.211864] Modules linked in: bridge stp llc iptable_filter overlay bluetooth ecdh_generic uinput vc4 drm_kms_helper cec snd_soc_core snd_bcm2835(C) snd_compress snd_pcm_dmaengine snd_pcm evdev brcmfmac sg brcmutil snd_timer cfg80211 v3d gpu_sched rfkill snd bcm2835_codec(C) drm v4l2_mem2mem syscopyarea bcm2835_v4l2(C) sysfillrect sysimgblt bcm2835_mmal_vchiq(C) fb_sys_fops v4l2_common videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops drm_panel_orientation_quirks videobuf2_v4l2 raspberrypi_hwmon videobuf2_common hwmon videodev media vc_sm_cma(C) rpivid_mem uio_pdrv_genirq uio sch_fq_codel cuse fuse i2c_dev ip_tables x_tables ipv6 crc_ccitt
[   17.218149] Process sshd (pid: 1154, stack limit = 0x0000000008be0190)
[   17.220332] CPU: 3 PID: 1154 Comm: sshd Tainted: G      D  C        4.19.105-v8-28 #28
[   17.222501] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[   17.225032] pstate: 60000005 (nZCv daif -PAN -UAO)
[   17.227525] pc : __kmalloc_track_caller+0xd8/0x2e0
[   17.229966] lr : __kmalloc_track_caller+0x5c/0x2e0
[   17.232343] sp : ffffff800e33ba00
[   17.234964] x29: ffffff800e33ba00 x28: 0000000000000d83 
[   17.237306] x27: ffffffc9d3ba9d00 x26: ffffffc9f7003380 
[   17.239771] x25: ffffffc9dc72ca00 x24: ffffffc9d3ba9d00 
[   17.242170] x23: ffffff9ea8cb29e0 x22: ffffffc9f7003380 
[   17.244396] x21: 0000000000001140 x20: 00000000004112c0 
[   17.246668] x19: 0000000000040006 x18: 0000000000000000 
[   17.249007] x17: 0000000000000000 x16: 0000000000000000 
[   17.251371] x15: 0000000000000000 x14: 0001000800000004 
[   17.253686] x13: 0080180200000482 x12: 5eb7e3f400020014 
[   17.255874] x11: 0000005000000692 x10: 0000068e00015180 
[   17.258512] x9 : 0001518000060014 x8 : 000006adffffffff 
[   17.260851] x7 : ffffffff00060014 x6 : 0000007ff2b552a4 
[   17.263031] x5 : ffffffc9f59fd800 x4 : ffffffc9fbfb9fe0 
[   17.265558] x3 : 00000000000385c3 x2 : ffffff9ea9238760 
[   17.265561] x1 : 0000000000000000 x0 : 0000000000000001 
[   17.265564] Call trace:
[   17.265575]  __kmalloc_track_caller+0xd8/0x2e0
[   17.265585]  __kmalloc_reserve.isra.0+0x48/0x98
[   17.276555]  __alloc_skb+0x80/0x190
[   17.276560]  netlink_dump+0x78/0x2b8
[   17.276562]  netlink_recvmsg+0x208/0x3c8
[   17.276564]  sock_recvmsg+0x58/0x68
[   17.276567]  ___sys_recvmsg+0xcc/0x268
[   17.276569]  __sys_recvmsg+0x78/0xc8
[   17.276571]  __arm64_sys_recvmsg+0x2c/0x38
[   17.276582]  el0_svc_common+0x88/0x1a8
[   17.295154]  el0_svc_handler+0x38/0x78
[   17.295159]  el0_svc+0x8/0xc
[   17.295163] Code: f8616853 52800020 b4000ef3 b9402341 (f8616a79) 
[   17.295171] ---[ end trace 69d29e452652f02e ]---
[  221.855632] systemd-journald[150]: File /var/log/journal/27ac7db47f7547d38555f6486ce5f5e6/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.
[  223.469288] Unable to handle kernel paging request at virtual address 0000000000040006
[  223.469302] Mem abort info:
[  223.469307]   ESR = 0x96000005
[  223.469313]   Exception class = DABT (current EL), IL = 32 bits
[  223.469318]   SET = 0, FnV = 0
[  223.469322]   EA = 0, S1PTW = 0
[  223.469326] Data abort info:
[  223.469331]   ISV = 0, ISS = 0x00000005
[  223.469335]   CM = 0, WnR = 0
[  223.469342] user pgtable: 4k pages, 39-bit VAs, pgdp = 000000007ea3ca9c
[  223.469347] [0000000000040006] pgd=0000000000000000, pud=0000000000000000
[  223.469356] Internal error: Oops: 96000005 [#7] PREEMPT SMP
[  223.469361] Modules linked in: snd_hrtimer snd_seq snd_seq_device bridge stp llc iptable_filter overlay bluetooth ecdh_generic uinput vc4 drm_kms_helper cec snd_soc_core snd_bcm2835(C) snd_compress snd_pcm_dmaengine snd_pcm evdev brcmfmac sg brcmutil snd_timer cfg80211 v3d gpu_sched rfkill snd bcm2835_codec(C) drm v4l2_mem2mem syscopyarea bcm2835_v4l2(C) sysfillrect sysimgblt bcm2835_mmal_vchiq(C) fb_sys_fops v4l2_common videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops drm_panel_orientation_quirks videobuf2_v4l2 raspberrypi_hwmon videobuf2_common hwmon videodev media vc_sm_cma(C) rpivid_mem uio_pdrv_genirq uio sch_fq_codel cuse fuse i2c_dev ip_tables x_tables ipv6 crc_ccitt
[  223.469443] Process (t-daemon) (pid: 1841, stack limit = 0x0000000041f5d51e)
[  223.469451] CPU: 3 PID: 1841 Comm: (t-daemon) Tainted: G      D  C        4.19.105-v8-28 #28
[  223.469455] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[  223.469462] pstate: 60000005 (nZCv daif -PAN -UAO)
[  223.469479] pc : __kmalloc_track_caller+0xd8/0x2e0
[  223.469485] lr : __kmalloc_track_caller+0x5c/0x2e0
[  223.469489] sp : ffffff800f7c3bb0
[  223.469493] x29: ffffff800f7c3bb0 x28: 0000000000000d83 
[  223.469498] x27: ffffffc9d3121d00 x26: ffffffc9f7003380 
[  223.469503] x25: ffffff9ea94786c8 x24: ffffffc9d3121d00 
[  223.469507] x23: ffffff9ea8d834fc x22: ffffffc9f7003380 
[  223.469511] x21: 00000000000016c0 x20: 00000000006000c0 
[  223.469515] x19: 0000000000040006 x18: ffffffc9fbfb3e08 
[  223.469519] x17: 0000000000000000 x16: 0000000000000000 
[  223.469524] x15: ffffffc9bfbd84d8 x14: 0000000000000000 
[  223.469528] x13: 0000000000000000 x12: 0000000000000000 
[  223.469532] x11: 0000000000000000 x10: 00000000ffffffff 
[  223.469536] x9 : 0000000000000000 x8 : ffffffc9bfa80900 
[  223.469540] x7 : 0000000000000000 x6 : ffffff9ea9478000 
[  223.469544] x5 : 0000000000000000 x4 : ffffff9ea851eb4c 
[  223.469548] x3 : ffffff9ea953d000 x2 : ffffff9ea9238760 
[  223.469553] x1 : 0000000000000000 x0 : 0000000000000001 
[  223.469558] Call trace:
[  223.469564]  __kmalloc_track_caller+0xd8/0x2e0
[  223.469571]  kmemdup+0x38/0x60
[  223.469580]  ipv4_sysctl_init_net+0x44/0x150
[  223.469588]  ops_init+0x80/0x128
[  223.469592]  setup_net+0xd0/0x1c8
[  223.469596]  copy_net_ns+0xc8/0x1b8
[  223.469603]  create_new_namespaces+0x108/0x1f0
[  223.469609]  unshare_nsproxy_namespaces+0x68/0xa8
[  223.469618]  ksys_unshare+0x178/0x2d8
[  223.469622]  __arm64_sys_unshare+0x20/0x30
[  223.469628]  el0_svc_common+0x88/0x1a8
[  223.469632]  el0_svc_handler+0x38/0x78
[  223.469638]  el0_svc+0x8/0xc
[  223.469643] Code: f8616853 52800020 b4000ef3 b9402341 (f8616a79) 
[  223.469651] ---[ end trace 69d29e452652f02f ]---

As a side question (which is likely related to this issue), any idea on when you'll be able to crank things up to the latest 5.4.y kernel branch or if you're even planning to do it? If you're not interested, can you give any tips on how to try the latest official 5.4.y kernel headers on the 64bit Ubuntu Server? Thanks so much for your hard work, hope you are doing alright & all the best.

t206 commented 4 years ago

I'm also experiencing this failure to boot at board power up on the Raspberry Pi 4B. It seems like it started a couple of weeks ago and doesn't affect rebooting an already powered-on Pi - it only fails at power cycling. On average, I have to power cycle 3-4 times but once yesterday I power cycled 7 times before it came up. Thinking that this might be an issue with 18.04 (kernel 4.19.105 v8), I upgraded the Pi to Ubuntu 19.10 which I'm running quite successfully now. Unfortunately, this didn't resolve the power-cycle lockups. They're no worse under 19.10 but no better. Finally, I also tried upgrading 19.10 to 20.04 using the 4.19.105 v8 kernel but was not successful. Numerous packages failed to install so I finally gave up on this path.

I then started over with the newest Unbuntu 20.04 server distro and added the Mate desktop. That did indeed come up with the 5.4.x kernel but overall, the Pi4B is unstable. It'll run fine for an hour or so and then both LEDs go out and the system becomes unresponsive and requires a power cycle to start up again. What did get fixed however is the power-cycle lock ups - those are completely gone using Ubuntu server 20.04/5.4.x kernel. The boot/firmware directory under 20.04 is different than what we have under 18.04 and 19.10.

I don't see anything on my monitor when the kernel locks up in 18.04/19.10 but I'd be happy to troubleshoot the power-cycle issue if someone knows how to get the monitor to display startup messages.

TheRemote commented 2 years ago

This should be resolved in the latest Raspberry Pi firmware