MythTV / mythtv

The official MythTV repository
https://www.mythtv.org
GNU General Public License v2.0
705 stars 345 forks source link

System freezes occasionally when unattended #741

Closed Jpilk closed 1 year ago

Jpilk commented 1 year ago

What steps will reproduce the bug?

Run MythTV as normal. It may freeze the system in a few days time.

How often does it reproduce? Is there a required condition?

What is the expected behaviour?

A system that doesn't freeze.

What do you see instead?

Additional information

The freezes started after a system update by a bundle of kde-related packages installed in late February. The most recent freeze was this morning. I have posted on kde-fedora and mythtv-users. All disks are reported clean and DB maintenence has been done regularly. I have experimented with journalctl command lines and attach a result showing events around the reboots. It looks to me as if rtkit-daemon is involved, but the version I have dates from Feb 2022. [journalctl_Boo_can_lin_acr2.txt](https://github.com/MythTV/mythtv/files/11082950/journalctl_Boo_can_lin_acr2.txt)
Jpilk commented 1 year ago

That was Master in Fedora 36. Another box running fixes/32 in SL7.9 has not had the problem; but today I found it in a similar unresponsive state: no mouse, no CapsLock light, pingable but no ssh, disk light on.

That box had been running the kernel-lt 5.4 series from elrepo until last week, when an update failed to recognise the intel-Q33 chipset and I followed the suggestion to run the kernel-ml, currently 6.2.8 instead. The canary has moved in.

In Fedora 36 the log above shows no canary until 6.1.13 on 27 Feb.

Jpilk commented 1 year ago

The Fedora system froze again this morning.

Mar 29 02:06:22 HPFed rtkit-daemon[889]: The canary thread is apparently starving. Taking action.

It seems to happen usually around that time. This post is typical with MythTV and Firefox started from command lines.

The daemon seems to keep creating new threads for mythfrontend. Perhaps they are overloading something, or filling memory?

 Generate and filter the log:

[john@HPFed ~]$ sudo SYSTEMD_COLORS=false journalctl --since -1d | grep -v tda1004x | grep -E "(-- Boo|rtkit-dae|nux ver)" [sudo] password for john:

Mar 28 13:44:51 HPFed rtkit-daemon[887]: Successfully made thread 15134 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 13:44:51 HPFed rtkit-daemon[887]: Successfully made thread 15144 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 13:45:03 HPFed rtkit-daemon[887]: Successfully made thread 15187 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 13:45:03 HPFed rtkit-daemon[887]: Successfully made thread 15195 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 13:45:19 HPFed rtkit-daemon[887]: Successfully made thread 15235 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 13:45:19 HPFed rtkit-daemon[887]: Successfully made thread 15243 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 13:45:42 HPFed rtkit-daemon[887]: Successfully made thread 15298 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 13:45:42 HPFed rtkit-daemon[887]: Successfully made thread 15306 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 13:51:43 HPFed rtkit-daemon[887]: Successfully made thread 16044 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 13:51:43 HPFed rtkit-daemon[887]: Successfully made thread 16054 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 13:58:01 HPFed rtkit-daemon[887]: Successfully made thread 16706 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 13:58:01 HPFed rtkit-daemon[887]: Successfully made thread 16716 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 13:58:14 HPFed rtkit-daemon[887]: Successfully made thread 16726 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 13:58:14 HPFed rtkit-daemon[887]: Successfully made thread 16734 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 13:58:25 HPFed rtkit-daemon[887]: Successfully made thread 16749 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 13:58:25 HPFed rtkit-daemon[887]: Successfully made thread 16757 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 14:00:28 HPFed rtkit-daemon[887]: Successfully made thread 16825 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 14:00:28 HPFed rtkit-daemon[887]: Successfully made thread 16833 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 14:06:02 HPFed rtkit-daemon[887]: Successfully made thread 17556 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 14:06:02 HPFed rtkit-daemon[887]: Successfully made thread 17566 of process 2827 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 14:34:35 HPFed audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=rtkit-daemon comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Mar 28 14:34:34 HPFed systemd[1]: Stopping rtkit-daemon.service - RealtimeKit Scheduling Policy Service... Mar 28 14:34:35 HPFed systemd[1]: rtkit-daemon.service: Deactivated successfully. Mar 28 14:34:35 HPFed systemd[1]: Stopped rtkit-daemon.service - RealtimeKit Scheduling Policy Service. Mar 28 14:34:35 HPFed systemd[1]: rtkit-daemon.service: Consumed 1.151s CPU time.

This was a manual shutdown and reboot

-- Boot 51daaf664fae4c88921ad588dc02f84e -- Mar 28 14:35:31 HPFed kernel: Linux version 6.2.8-100.fc36.x86_64 (mockbuild@bkernel01.iad2.fedoraproject.org) (gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4), GNU ld version 2.37-37.fc36) #1 SMP PREEMPT_DYNAMIC Wed Mar 22 19:14:19 UTC 2023 Mar 28 14:36:06 HPFed systemd[1]: Starting rtkit-daemon.service - RealtimeKit Scheduling Policy Service... Mar 28 14:36:20 HPFed rtkit-daemon[889]: Successfully called chroot. Mar 28 14:36:20 HPFed rtkit-daemon[889]: Successfully dropped privileges. Mar 28 14:36:20 HPFed rtkit-daemon[889]: Successfully limited resources. Mar 28 14:36:20 HPFed rtkit-daemon[889]: Running. Mar 28 14:36:20 HPFed rtkit-daemon[889]: Watchdog thread running. Mar 28 14:36:20 HPFed rtkit-daemon[889]: Canary thread running. Mar 28 14:36:20 HPFed systemd[1]: Started rtkit-daemon.service - RealtimeKit Scheduling Policy Service. Mar 28 14:36:20 HPFed audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=rtkit-daemon comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Mar 28 14:38:52 HPFed rtkit-daemon[889]: Successfully made thread 1843 of process 1843 (/usr/bin/pipewire) owned by '1000' high priority at nice level -11. Mar 28 14:38:53 HPFed rtkit-daemon[889]: Successfully made thread 1851 of process 1843 (/usr/bin/pipewire) owned by '1000' RT at priority 20. Mar 28 14:38:53 HPFed rtkit-daemon[889]: Successfully made thread 1844 of process 1844 (/usr/bin/wireplumber) owned by '1000' high priority at nice level -11. Mar 28 14:38:53 HPFed rtkit-daemon[889]: Successfully made thread 1866 of process 1844 (/usr/bin/wireplumber) owned by '1000' RT at priority 20. Mar 28 14:39:17 HPFed rtkit-daemon[889]: Successfully made thread 2014 of process 2014 (/usr/bin/pipewire-pulse) owned by '1000' high priority at nice level -11. Mar 28 14:39:17 HPFed rtkit-daemon[889]: Successfully made thread 2017 of process 2014 (/usr/bin/pipewire-pulse) owned by '1000' RT at priority 20. Mar 28 14:42:34 HPFed rtkit-daemon[889]: Successfully made thread 2651 of process 2550 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 14:42:35 HPFed rtkit-daemon[889]: Successfully made thread 2663 of process 2550 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 16:53:14 HPFed rtkit-daemon[889]: Successfully made thread 3281 of process 2550 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 16:53:21 HPFed rtkit-daemon[889]: Successfully made thread 3305 of process 2550 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 28 21:30:23 HPFed rtkit-daemon[889]: Successfully made thread 4259 of process 4087 (/usr/lib64/firefox/firefox) owned by '1000' RT at priority 10. Mar 29 02:06:22 HPFed rtkit-daemon[889]: The canary thread is apparently starving. Taking action.

 And this is a reboot after freeze.
 Here's the command line:

Mar 29 12:58:00 HPFed kernel: Command line: BOOT_IMAGE=(hd0,msdos1)/vmlinuz-6.2.8-100.fc36.x86_64 root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/swap rd.lvm.lv=fedora/root rhgb fsck-mode=force

-- Boot 8760a04c5ebd4d49b12817e9ee304eb0 -- Mar 29 12:58:00 HPFed kernel: Linux version 6.2.8-100.fc36.x86_64 (mockbuild@bkernel01.iad2.fedoraproject.org) (gcc (GCC) 12.2.1 20221121 (Red Hat 12.2.1-4), GNU ld version 2.37-37.fc36) #1 SMP PREEMPT_DYNAMIC Wed Mar 22 19:14:19 UTC 2023 Mar 29 12:59:09 HPFed systemd[1]: Starting rtkit-daemon.service - RealtimeKit Scheduling Policy Service... Mar 29 12:59:21 HPFed rtkit-daemon[904]: Successfully called chroot. Mar 29 12:59:21 HPFed rtkit-daemon[904]: Successfully dropped privileges. Mar 29 12:59:21 HPFed rtkit-daemon[904]: Successfully limited resources. Mar 29 12:59:21 HPFed rtkit-daemon[904]: Running. Mar 29 12:59:21 HPFed rtkit-daemon[904]: Watchdog thread running. Mar 29 12:59:21 HPFed rtkit-daemon[904]: Canary thread running. Mar 29 12:59:21 HPFed audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=rtkit-daemon comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Mar 29 12:59:21 HPFed systemd[1]: Started rtkit-daemon.service - RealtimeKit Scheduling Policy Service. Mar 29 13:02:04 HPFed rtkit-daemon[904]: Successfully made thread 1873 of process 1873 (/usr/bin/pipewire) owned by '1000' high priority at nice level -11. Mar 29 13:02:04 HPFed rtkit-daemon[904]: Successfully made thread 1874 of process 1874 (/usr/bin/wireplumber) owned by '1000' high priority at nice level -11. Mar 29 13:02:04 HPFed rtkit-daemon[904]: Successfully made thread 1881 of process 1874 (/usr/bin/wireplumber) owned by '1000' RT at priority 20. Mar 29 13:02:04 HPFed rtkit-daemon[904]: Successfully made thread 1885 of process 1873 (/usr/bin/pipewire) owned by '1000' RT at priority 20. Mar 29 13:02:24 HPFed rtkit-daemon[904]: Successfully made thread 2041 of process 2041 (/usr/bin/pipewire-pulse) owned by '1000' high priority at nice level -11. Mar 29 13:02:25 HPFed rtkit-daemon[904]: Successfully made thread 2045 of process 2041 (/usr/bin/pipewire-pulse) owned by '1000' RT at priority 20. Mar 29 13:14:49 HPFed rtkit-daemon[904]: Successfully made thread 3243 of process 3040 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 29 13:14:50 HPFed rtkit-daemon[904]: Successfully made thread 3255 of process 3040 (/usr/bin/mythfrontend) owned by '1000' RT at priority 20. Mar 29 13:21:56 HPFed rtkit-daemon[904]: Successfully made thread 3640 of process 3457 (/usr/lib64/firefox/firefox) owned by '1000' RT at priority 10. [john@HPFed ~]$

Jpilk commented 1 year ago

???

/etc/anacrontab the jobs will be started during the following hours only START_HOURS_RANGE=3-22

man anacron Anacron uses up to two file descriptors for each active job. It may run out of descriptors if there are lots of ac‐ tive jobs. See echo $(($(ulimit -n) / 2)) for information how many concurent jobs anacron may run.

[john@HPFed ~]$ echo $(($(ulimit -n) / 2)) 512

It appears that the boldface comments earlier are triggered by the hashes I intended as commenting-out...

Jpilk commented 1 year ago

Two threads are created by the frontend every time playback starts. Thread identifiers increase at that, and increase by more if playback is stopped and restarted

Jpilk commented 1 year ago

This happened again just after 0201 BST last night and stopped a recording in progress. Later this morning around 180 kde-related F36 packages were installed or updated. It's difficult to know whether progress is being made, but otherwise it's working well.

Jpilk commented 1 year ago

It's too early to say that this has been fixed, but I suspect that the original problem was not with the rtkit-daemon.

The timing seemed linked to anacron, and through that to /etc/cron.daily, which pointed to google-chrome. A bugzilla reference dealing with suspend/resume suggested that xdg-desktop-portal-gnome might be involved (in F38), and I had xdg-desktop-portal-gtk installed, alongside the -kde and base versions. I removed the -gtk package with no apparent bad consequences a few days ago and now have an update of the nvidia-470xx driver. Still in progress...

Jpilk commented 1 year ago

The last canary event and freeze was 9 days ago. I think this can be closed.

[john@HPFed ~]$ rpm -qa | grep xdg-desktop xdg-desktop-portal-1.12.6-1.fc36.x86_64 xdg-desktop-portal-kde-5.27.3-1.fc36.x86_64 [john@HPFed ~]$