fHDHR / fHDHR_plugin_origin_ceton

Do What The F*ck You Want To Public License

watchdog, soft lockup #41

Closed arrmo closed 2 years ago

arrmo commented 2 years ago

OK, not saying this is due to the plugin or Ceton, but as you had mentioned this before ... I have seen this a couple of times in the last week or two => today was the worst though, as all 4 tuners were running at the time 😆. Basically, the machine hung, with this message (pulled from the log after reboot):

Apr  6 19:27:46 linuxServer kernel: [43543.484697] watchdog: BUG: soft lockup - CPU#1 stuck for 27s! [ffmpeg:1228405]
Apr  6 19:28:14 linuxServer kernel: [43571.484646] watchdog: BUG: soft lockup - CPU#1 stuck for 53s! [ffmpeg:1228405]
Apr  6 19:28:46 linuxServer kernel: [43603.484587] watchdog: BUG: soft lockup - CPU#1 stuck for 82s! [ffmpeg:1228405]
Apr  6 19:29:14 linuxServer kernel: [43631.484535] watchdog: BUG: soft lockup - CPU#1 stuck for 108s! [ffmpeg:1228405]
Apr  6 19:29:42 linuxServer kernel: [43659.484485] watchdog: BUG: soft lockup - CPU#1 stuck for 135s! [ffmpeg:1228405]
Apr  6 19:30:10 linuxServer kernel: [43687.484434] watchdog: BUG: soft lockup - CPU#1 stuck for 161s! [ffmpeg:1228405]
Apr  6 19:30:38 linuxServer kernel: [43715.484381] watchdog: BUG: soft lockup - CPU#1 stuck for 187s! [ffmpeg:1228405]
Apr  6 19:31:06 linuxServer kernel: [43743.484329] watchdog: BUG: soft lockup - CPU#1 stuck for 213s! [ffmpeg:1228405]
Apr  6 19:31:46 linuxServer kernel: [43783.484255] watchdog: BUG: soft lockup - CPU#1 stuck for 250s! [ffmpeg:1228405]
Apr  6 19:32:14 linuxServer kernel: [43811.484203] watchdog: BUG: soft lockup - CPU#1 stuck for 276s! [ffmpeg:1228405]
Apr  6 19:32:42 linuxServer kernel: [43839.484152] watchdog: BUG: soft lockup - CPU#1 stuck for 302s! [ffmpeg:1228405]
Apr  6 19:33:10 linuxServer kernel: [43867.484100] watchdog: BUG: soft lockup - CPU#1 stuck for 328s! [ffmpeg:1228405]
Apr  6 19:33:38 linuxServer kernel: [43895.484048] watchdog: BUG: soft lockup - CPU#1 stuck for 354s! [ffmpeg:1228405]
Apr  6 19:34:06 linuxServer kernel: [43923.483997] watchdog: BUG: soft lockup - CPU#1 stuck for 380s! [ffmpeg:1228405]
Apr  6 19:34:46 linuxServer kernel: [43963.483923] watchdog: BUG: soft lockup - CPU#1 stuck for 418s! [ffmpeg:1228405]
Apr  6 19:35:14 linuxServer kernel: [43991.483872] watchdog: BUG: soft lockup - CPU#1 stuck for 444s! [ffmpeg:1228405]
Apr  6 19:35:42 linuxServer kernel: [44019.483820] watchdog: BUG: soft lockup - CPU#1 stuck for 470s! [ffmpeg:1228405]
Apr  6 19:36:10 linuxServer kernel: [44047.483765] watchdog: BUG: soft lockup - CPU#1 stuck for 496s! [ffmpeg:1228405]
Apr  6 19:36:38 linuxServer kernel: [44075.483708] watchdog: BUG: soft lockup - CPU#1 stuck for 522s! [ffmpeg:1228405]
Apr  6 19:37:06 linuxServer kernel: [44103.483650] watchdog: BUG: soft lockup - CPU#1 stuck for 548s! [ffmpeg:1228405]
Apr  6 19:37:46 linuxServer kernel: [44143.483568] watchdog: BUG: soft lockup - CPU#1 stuck for 585s! [ffmpeg:1228405]
Apr  6 19:38:14 linuxServer kernel: [44171.483510] watchdog: BUG: soft lockup - CPU#1 stuck for 611s! [ffmpeg:1228405]

But is this really Ceton related, or rather ffmpeg? Not sure how to debug further, but I'm open to trying.

Thanks!

DanAustinGH commented 2 years ago

I have not noticed that since the first time that fully locked up my system. If it is happening, I suspect the newer kernel is handling it better. Will check my logs. I will say that overall my system is running better with the PCIe. The ethernet device should not be overloading my home network, but it does tend to 'glitch' way more.

Part of me thinks that using cat to feed ffmpeg might read from the device handles too aggressively. I might play with my idea of sleeping for between .1 and .8 seconds between device reads to allow other threads to run...
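
The sleep-between-reads idea could be sketched roughly as follows. This is only an illustration, not the plugin's actual code: `throttled_chunks`, the chunk size, and the injectable `sleep` parameter are all assumptions made for the example, and the 0.1 to 0.8 second window is simply the range floated above.

```python
import random
import time


def throttled_chunks(handle, chunk_size=188 * 1024,
                     min_sleep=0.1, max_sleep=0.8, sleep=time.sleep):
    """Yield chunks from a readable binary handle, pausing between reads.

    `handle` is any object with a blocking read(n) method (hypothetically,
    the opened tuner device). `sleep` is injectable so the pause can be
    replaced in tests.
    """
    while True:
        chunk = handle.read(chunk_size)
        if not chunk:
            # Empty read: end of stream, stop iterating.
            break
        yield chunk
        # Pause for a random interval so other threads get a chance to run.
        sleep(random.uniform(min_sleep, max_sleep))
```

Each chunk would then be written to ffmpeg's stdin instead of letting cat pump the device flat out. With a live tuner the chunk size would need to be large enough that the reader still keeps up with the stream bitrate; that trade-off is untested here.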

arrmo commented 2 years ago

FYI, this has not happened since. Not sure if it was just a transient thing or not - for a lot of reasons, I had to jump to Ubuntu beta ... so may just be that. Watching though! 😆

DanAustinGH commented 2 years ago

I have most of the bits to reproduce it, I think. I am not sure if it will affect Plex, but Emby does show this. Tune a channel on a Roku and, instead of backing out, turn the Roku (or Roku-integrated TV) off. This will likely orphan a tuner that is still streaming to nowhere. Use the fHDHR UI to close the Ceton tuner and you now have a soft lockup. Once in this state, the problem ffmpeg process cannot be stopped and a reboot is the only way out....

arrmo commented 2 years ago

Good finding! Seems like an fHDHR / ffmpeg thing though, not the kernel driver for the PCIe tuner ... agreed?

Thanks!

DanAustinGH commented 2 years ago

I would guess the kernel driver could/should be able to avoid a soft lockup, but that is not something I would care to tackle.
I really doubt the issue is ffmpeg (but maybe how we call it). I don't use the Roku TV, but a guest does, so the next time I catch an orphaned tuner, I will try restarting Emby, which used to free them on the ethernet Ceton.

arrmo commented 2 years ago

I have a Roku TV, can try this here. So the issue is that ffmpeg is not halted (exited), right? I admit, not sure how this soft locks ... why does it matter if it's streaming to never never land or not? Sorry, just trying to understand.

Thanks!

DanAustinGH commented 2 years ago

Emby has a history of having orphaned tuners when the client was a Roku and the person watching did not 'properly' stop the stream. The devs have been working on fixing this, but it is not 100% yet.

The channel that was on the orphaned session was not one I watch, but I could see that a guest had tuned it a couple of days ago. I first tried stopping it from the Tuner tab. It appeared to close but came back seconds later. I then tried to close it from the Ceton tab and it hung.

Not conclusive, but in the past I have seen an orphaned tuner be respawned by Emby. Interestingly, Emby does not know it is in use according to the admin interface. This happened often enough that I discovered I could restart Emby and it would properly clear the orphan. I see my guest is watching a game on the Roku TV tonight. I'll check in the AM to see if I have an orphan and if restarting Emby will clear it. I'll also be going over our close code to see why it has an issue when the close comes from our tab, but not when it comes from the tuners tab...

DanAustinGH commented 2 years ago

And I misunderstood part of your question. I don't think it matters if the stream is going somewhere or not. I have not tried stopping actively used tuners, but I do on occasion need to shut down the orphaned ones...

arrmo commented 2 years ago

No worries! What confuses me ... why is this a soft lockup then? Terminated or not => why all of a sudden is the CPU hung? That's what confuses me 😆

Thanks!

arrmo commented 2 years ago

> Emby has a history of having orphaned tuners when the client was a Roku and the person watching did not 'properly' stop the stream. The devs have been working on fixing this, but it is not 100% yet.

This may also explain an odd item I'm seeing (trying to debug). Once in a while, I see from the (Ceton) API that a tuner is noted as "External", not matching to the expected channel (from fHDHR). Let me see if this is the trigger as well.

DanAustinGH commented 2 years ago

So the orphan tuner is how I usually find myself working on this. I have not had a chance to test it, but I suspect that trying to kill an ffmpeg process that does have an attached client would result in the same. The real issue is between ffmpeg and the device driver/device handle. I'll make time to test failure modes when I do not have a guest using it and no recordings are scheduled.

I should also note that the ffmpeg process is not killed and is unkillable after trying to stop it in the UI. A reboot sometimes works (limited tests so far), but often it needs a hard reset of the system power...
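
One way to see what an unkillable process is doing, at least on Linux, is to read its state from `/proc/<pid>/stat`. The sketch below is only a diagnostic aid under that assumption; `parse_stat_state` and `process_state` are hypothetical helper names, not part of fHDHR. A state of `D` (uninterruptible sleep) generally means signals are not acted on until the kernel call returns, while a task spinning inside a driver may show as `R` instead.

```python
from pathlib import Path

# Common single-letter process states reported in /proc/<pid>/stat.
STATE_NAMES = {
    "R": "running",
    "S": "sleeping (interruptible)",
    "D": "uninterruptible sleep",
    "Z": "zombie",
    "T": "stopped",
}


def parse_stat_state(stat_text):
    """Return the state letter from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ")", so split on the LAST ")" rather than on whitespace.
    """
    after_comm = stat_text.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid):
    """Read and parse the state of a live process (Linux only)."""
    return parse_stat_state(Path(f"/proc/{pid}/stat").read_text())
```

Calling `process_state` with the PID of the stuck ffmpeg process and looking the result up in `STATE_NAMES` would show whether the process is blocked inside the kernel rather than in ffmpeg's own code.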

arrmo commented 2 years ago

OK, tried this here - Roku streaming through / from Plex (TV, so through the PCIe Ceton). Just powered off the Roku ... and yes, for a period of time (perhaps 5 min, didn't watch it too closely 😉) ffmpeg is running, and the tuner is still occupied. But it releases gracefully, and the tuner becomes available again => also, ffmpeg exits.

You see the same there?

Thanks!

arrmo commented 2 years ago

BTW, with VLC, if I just exit - the tuner is immediately released ... so perhaps this is just Plex trying to reconnect for a period of time, then giving up? Issue seems to be Emby related then, no?

DanAustinGH commented 2 years ago

It could be. I've had issues with orphaned tuners before I switched to the PCIe, so that is not new. I saw a couple of articles suggesting we can pass 'Q' to the Python process controlling ffmpeg, which should cause ffmpeg to exit on its own instead of being killed, but that'll take some research to validate. I think that stopping tuners in fHDHR is likely rare, but I want to document that I have seen issues arise with this plugin.
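
A rough sketch of that "ask it to quit first" approach, assuming ffmpeg was started with `subprocess.Popen(..., stdin=subprocess.PIPE)`. ffmpeg's console banner reads "Press [q] to stop", so the sketch sends a lowercase `q` by default; whether that is honoured when ffmpeg is fed this way is exactly the part that still needs validating. `stop_gracefully` and its return labels are hypothetical names for this example.

```python
import subprocess


def stop_gracefully(proc, quit_bytes=b"q", timeout=5.0):
    """Ask a child process to quit via stdin, escalating only if ignored.

    Returns one of "already-exited", "quit", "terminated", "killed",
    or "unkillable" (still alive after SIGKILL, e.g. stuck in the kernel).
    """
    if proc.poll() is not None:
        return "already-exited"
    if proc.stdin is not None:
        try:
            # Send the quit request, then close stdin so the child sees EOF.
            proc.stdin.write(quit_bytes)
            proc.stdin.flush()
            proc.stdin.close()
        except OSError:
            # The pipe may already be broken; fall through to signals.
            pass
    try:
        proc.wait(timeout=timeout)
        return "quit"
    except subprocess.TimeoutExpired:
        pass
    proc.terminate()  # polite signal (SIGTERM on POSIX)
    try:
        proc.wait(timeout=timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        pass
    proc.kill()  # last resort (SIGKILL on POSIX)
    try:
        proc.wait(timeout=timeout)
        return "killed"
    except subprocess.TimeoutExpired:
        return "unkillable"
```

Note that this only changes how the stop is requested. If the process is wedged inside the device driver, as in the soft lockup described in this issue, the function would most likely end up returning "unkillable" rather than fixing anything.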

DanAustinGH commented 2 years ago

Any reason to not close this after merging the locking changes?

arrmo commented 2 years ago

Nope, agreed. Thanks!