MythTV / mythtv

The official MythTV repository
https://www.mythtv.org
GNU General Public License v2.0

Random segfault Nvidia Shield Tube (32-bit) during playback #368

Closed: cbovy closed this issue 1 year ago

cbovy commented 3 years ago

What steps will reproduce the bug?

It is hard to reproduce. It happens randomly while watching a recording or Live TV.

How often does it reproduce? Is there a required condition?

Not known at the moment.

What is the expected behaviour?

The expected behaviour is to continue showing the recording.

What do you see instead?

A full crash of the MythFrontend app on the Shield.

Additional information

The Myth Backend version is: 2:31.0+fixes.202106102123.0680b37c68~ubuntu20.10.1

Full backtrace at: https://pastebin.com/NRaYswn9

bennettpeter commented 3 years ago

All I can see is that it is failing in socket code. Are you doing anything unusual network-wise? Are you using a network remote to control playback? Is it possible there is a network failure somewhere?

This is in code that is not specific to Android, so it is possible that the same failure could occur in a Linux frontend.

It looks like you are using the 32-bit build. The original shield is able to run the 64-bit build. They should both work, but you could try the 64-bit build to see if that helps.

cbovy commented 3 years ago

I'm running the 32-bit MythTV build because this is the 2019 'tube' variant, which I believe is 32-bit.

The Shield is connected via Ethernet through a switch to the backend (1 Gbps, full duplex). No, there is no network remote control. Other apps (like Netflix) are not having issues.

I only have Nvidia Shields as frontends. I could try a frontend on my Ubuntu laptop and see if I can reproduce it.

Is there any way to debug the socket communication on the frontend and backend side, and compare? Any ideas to prevent a segfault? Happy to try something out. Build environment is ready to rebuild.

ddp526 commented 3 years ago

Hi, I know 'me too' posts are not welcome, but I did want to add that I have very similar issues with a Sony Android TV (the 32-bit build is the only compatible build). I have random, infrequent crashes, and they have been happening for a long time (over a year, on every build I have tried), but I have only recently become motivated enough to look into what might be going on. I think these are the pertinent log lines (via adb logcat), from 2 separate recent errors:

09-05 22:24:52.261 27431 27524 E mfe     : signalhandling.cpp:291:handleSignal  Received Segmentation fault: Code 1, PID 732264568, UID 0, Value 0x00000000
09-01 23:47:24.489  9585  9618 E mfe     : signalhandling.cpp:291:handleSignal  Received Segmentation fault: Code 1, PID 1072179320, UID 0, Value 0x00000000

Sorry, I don't have full traces, etc., and I can't be sure this segfault is in the same code path. FWIW:

mythfrontend: mythfrontend-20210522-arm-v32-Pre-2763-g034eb86a3f (old, I know; I can try an update if you think it would help)
mythbackend: 2:32.0~master.202109032033.8899ca5fd6~ubuntu20.04.1

Connected over Ethernet, with no remote control via the network.

The crashes happen mid-playback, with no interaction via the remote or otherwise at the time of the crash. The logs on the frontend show a checkWifi state (Wi-Fi is disabled for me) as the last item before each crash, but I'm not convinced this is related: these entries appear in my logs every 5 seconds, so I think it is just the last item that happened to be logged.

HTH

cbovy commented 2 years ago

This issue is still teasing me. I've done some further investigation and added a recent backtrace. I'm using the 32-bit version of MythFrontend, as the Nvidia Shield 2019 Tube only supports 32-bit applications. The segfault happens with the version from @bennettpeter and also with my own compiled version. The attached backtrace was made with my own compiled version.

Shield running on: 9.0.0, but also on 9.0.1 (the issue was also present before)
Version Shield: mythfrontend-20220220-arm-v32-Pre-3554-gb2a21798d6
Version Backend: 2:32.0+fixes.202202180054.0d9d21abaa~ubuntu21.10.1
The Shield is normally connected over Ethernet, on a 1000 Mbps link.

The segfault only happens when the Shield is connected over the Ethernet link. When connected via wireless, no segfault has happened so far. (The Ethernet connection, including the same cables, has been tested with another frontend on a notebook running 2:32.0+fixes.202202212348.dfc8d074d8~ubuntu21.10.1, and no errors occurred. I think I can rule out the cabling.)

Some questions I have open:

Full backtrace: gdb.txt

ddp526 commented 2 years ago

Will try. It may take some time, as the only Shield TV that can do wireless is used infrequently. But it crashed last night, which prompted me to switch it to wireless. If it lasts a week or two like this, then it may well confirm your theory. BTW, I get these crashes on my Shield (Tube version), but also on a Sony TV.

ddp526 commented 2 years ago

My crashes continued on wireless (I have had 2 since the last post), so it seems there are either 2 issues, or it's not the wired network code.

HTH

cbovy commented 2 years ago

Thanks @ddp526. I can also confirm that I have had crashes when using wireless only, although less often. So the 32-bit build in combination with Qt could be the issue. Any idea? I'm open to any suggestion to test.

Regards, Charles

cbovy commented 2 years ago

I've recompiled with Qt 5.15.3, but unfortunately the same crashes still happen.

cbovy commented 2 years ago

I've been testing with a Shield Tube (32-bit) and a Shield Pro (64-bit). The Shield Pro runs perfectly fine, without any issues. The Shield Tube has the crashes. I'll try compiling with the latest SDK and NDK to see if that makes a difference. The latest Qt (5.15.3) does not fix the issue.

bennettpeter commented 2 years ago

Look at your backtrace for something like this near the start:

Thread 21 "MythSocketThrea" received signal SIGSEGV, Segmentation fault.

Search the trace for the string "Thread 21" (substitute the actual number if it is not 21).

Thread 21 (Thread 25542.25682):

#0 0x91f9e374 in MythSocket::qt_static_metacall (_o=0xa5855850, _c=QMetaObject::InvokeMetaMethod, _id=12, _a=0xa6f37a98) at moc/moc_mythsocket.cpp:159

    _t = 0xa5855850

Search the Android build directories where you built the same version that crashed. Look for moc_mythsocket.cpp. It should be in android/build/mythtv/libs/libmythbase/moc (for 32-bit). See what line 159 of moc_mythsocket.cpp contains (substitute the actual line number from above). On one build of mine it is case 12: _t->ReadReal, and on another it is case 13: _t->ResetReal. This will tell us which function it is trying to call when it fails. We can then add some logging to try to determine why it fails at that point.

cbovy commented 2 years ago

Thanks @bennettpeter for looking into this. It is indeed _id=12, which calls the following function according to moc_mythsocket.cpp:

case 12: _t->ReadReal((*reinterpret_cast< char*(*)>(_a[1])),(*reinterpret_cast< int(*)>(_a[2])),(*reinterpret_cast< std::chrono::milliseconds(*)>(_a[3])),(*reinterpret_cast< int*(*)>(_a[4]))); break;

Happy to recompile and test again.

bennettpeter commented 2 years ago

It is using Qt slots to call MythSocket::ReadReal, however it is failing in the caller before it can actually do the call. The parameters all seem to have values, but something must be corrupted. Possibly the MythSocket has been destroyed between QMetaObject::invokeMethod in MythSocket::Read and _t->ReadReal in MythSocket::qt_static_metacall.
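To make the mechanism concrete, here is a minimal sketch in plain C++ (no Qt) of how a moc-style dispatcher unpacks its argument array. FakeSocket, static_metacall and invoke_read are hypothetical stand-ins, not MythTV or Qt code; only the parameter shape of ReadReal and the slot id 12 are taken from the moc line quoted in this thread. Each entry _a[i] is a pointer to the storage of argument i, so a damaged entry would fault in the dispatcher, on the dereference, before ReadReal itself runs.

```cpp
#include <cassert>
#include <chrono>
#include <cstring>

// Hypothetical stand-in for MythSocket. Only the parameter list of ReadReal
// mirrors the moc line quoted in the thread; the body is made up.
class FakeSocket {
public:
    void ReadReal(char *data, int size, std::chrono::milliseconds timeout, int *ret) {
        assert(data != nullptr && ret != nullptr);
        (void)timeout;  // unused in this sketch
        std::memset(data, 'x', static_cast<std::size_t>(size));
        *ret = size;
    }

    // Same shape as a moc-generated qt_static_metacall: a[0] is the return
    // slot, a[1..] each point at the storage of one argument.
    static void static_metacall(void *o, int id, void **a) {
        auto *t = static_cast<FakeSocket *>(o);
        switch (id) {
        case 12:
            // Every argument is fetched by dereferencing a[i] here, in the
            // caller. A bad a[1] would crash on this line, not in ReadReal.
            t->ReadReal(*reinterpret_cast<char **>(a[1]),
                        *reinterpret_cast<int *>(a[2]),
                        *reinterpret_cast<std::chrono::milliseconds *>(a[3]),
                        *reinterpret_cast<int **>(a[4]));
            break;
        default:
            break;
        }
    }
};

// Hypothetical helper: packs the arguments into a pointer array and
// dispatches by id, roughly what an invoke-by-name call ends up doing.
int invoke_read(FakeSocket &sock, char *buf, int size) {
    std::chrono::milliseconds timeout{30};
    int ret = -1;
    int *retp = &ret;
    void *a[] = { nullptr, &buf, &size, &timeout, &retp };
    FakeSocket::static_metacall(&sock, 12, a);
    return ret;
}
```

Calling invoke_read(sock, buf, 3) on a four-byte buffer fills the first three bytes and returns 3. The point of the sketch is only the indirection: the array holds addresses of the arguments, not the arguments themselves.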

I suggest turning on logging for sockets and network, to see if the socket is destroyed just before the crash, or to see if anything else useful can be found from the log.

To turn on logging: Start mythfrontend on the Shield. Go into frontend settings and turn on network remote control (Setup > General > Remote Control > Enable Network Remote Control).

Shut down and restart mythfrontend.

On Linux use telnet:
telnet 6546
set verbose socket,network
exit

After the crash, access the log using this from Linux:
adb logcat |& tee tmp/android.log

This will show the last few minutes of log. You need to get to it soon, because it only stores a few minutes' worth of log. Alternatively, you can start capturing the log before the crash and keep capturing until after the crash, but that may slow things down or give you a lot of unnecessary data, especially if you cannot tell when the crash will happen.

cbovy commented 2 years ago

I've run the session. Please find the details below and attached.

Thread 22 "MythSocketThrea" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 19916.20062]
0x91a3b374 in MythSocket::qt_static_metacall (_o=0x95ce5cf0, _c=QMetaObject::InvokeMetaMethod, _id=12, _a=0xa607da98) at moc/moc_mythsocket.cpp:159
159 case 12: _t->ReadReal((*reinterpret_cast< char*(*)>(_a[1])),(*reinterpret_cast< int(*)>(_a[2])),(*reinterpret_cast< std::chrono::milliseconds(*)>(_a[3])),(*reinterpret_cast< int*(*)>(_a[4]))); break;
(gdb) print _o
$1 = (QObject *) 0x95ce5cf0
(gdb) print _c
$2 = QMetaObject::InvokeMetaMethod
(gdb) print _id
$3 = 12
(gdb) print _a
$4 = (void **) 0xa607da98
(gdb) print _a[1]
$5 = (void *) 0x2607e0d4
(gdb) print _a[2]
$6 = (void *) 0xa607e0d0
(gdb) print _a[3]
$7 = (void *) 0xa607e0d8
(gdb) print _a[4]
$8 = (void *) 0xa607e0a8

android4.log

Please let me know if additional logs are required (I did a snippet of the logs, but more available).

cbovy commented 2 years ago

Not sure if it helps, but _a[1] seems inaccessible during the segfault. During regular playback, I can do the same, and then _a[1] gives output.

Thread 65 "MythSocketThrea" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 30400.31606]
0x91a05374 in MythSocket::qt_static_metacall (_o=0x956eea50, _c=QMetaObject::InvokeMetaMethod, _id=12, _a=0xa5bbca98) at moc/moc_mythsocket.cpp:159
159 case 12: _t->ReadReal((*reinterpret_cast< char*(*)>(_a[1])),(*reinterpret_cast< int(*)>(_a[2])),(*reinterpret_cast< std::chrono::milliseconds(*)>(_a[3])),(*reinterpret_cast< int*(*)>(_a[4]))); break;
(gdb) x _a[1]
0x25bbd0d4: Cannot access memory at address 0x25bbd0d4
(gdb) x _a[2]
0xa5bbd0d0: 0 '\000'
(gdb) x _a[3]
0xa5bbd0d8: 30 '\036'
(gdb) x _a[4]
0xa5bbd0a8: -52 '\314'
(gdb) x _a
0xa5bbca98: 0 '\000'
(gdb) p _a
$1 = (void **) 0xa5bbca98

bennettpeter commented 2 years ago

The log you supplied seems to be from Live TV. It will be easier to debug if this happens on a recording playback. Does it happen on recordings, or only on live TV? If it happens on recordings please get a log from a failure while playing back a recording that is complete (i.e. one that is finished recording before you watch). Or does it only happen when playing a recording that is still in progress?

cbovy commented 2 years ago

Apologies, I'll make a backtrace and log from a finished recording, and update the ticket. The issue occurs on both Recordings and Live TV, but does not happen on Videos.

cbovy commented 2 years ago

I reproduced the crash again, but now it crashes in ReadStringListReal, still in the socket code. Log below as well.

Thread 23 "MythSocketThrea" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 23549.23944]
0x91a45294 in MythSocket::qt_static_metacall (_o=0xa7e11f50, _c=QMetaObject::InvokeMetaMethod, _id=7, _a=0xacc12aa0) at moc/moc_mythsocket.cpp:154
154 case 7: _t->ReadStringListReal((reinterpret_cast< QStringList()>(_a[1])),(reinterpret_cast< std::chrono::milliseconds()>(_a[2])),(reinterpret_cast< bool()>(_a[3]))); break;
(gdb) x _a[1]
0x2cc130c8: Cannot access memory at address 0x2cc130c8
(gdb) x _a[2]
0xacc130d8: 0x00001b58
(gdb) x _a[3]
0xacc130b4: 0xacc130d7

android4.log

bennettpeter commented 2 years ago

Looking at your debug output, it looks like the first digit of _a[1] has been overwritten. It happens in both cases, ReadReal and ReadStringListReal. In one case _a[1] begins with 0x2cc1 while all the other addresses begin with 0xacc1. The other cases have the same problem. It seems the first digit of the address in _a[1] is being changed from 0xa to 0x2 in each case.

This is happening between the call to QMetaObject::invokeMethod in mythsocket.cpp and the call from Qt to MythSocket::qt_static_metacall in moc_mythsocket.cpp. It seems to be something going wrong in Qt slot processing.
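The 0xa to 0x2 observation can be checked with a little arithmetic. The sketch below XORs each observed _a[1] value from the three gdb sessions in this thread with the value it would have if only the leading digit were restored to 0xa. Those "intact" values are an assumption based on the neighbouring addresses, and the function names are made up for illustration. Under that assumption, each pair differs in exactly one bit, bit 31, which is cleared in the observed value. It does not say what clears the bit.

```cpp
#include <cassert>
#include <cstdint>

// Bits that differ between the observed pointer and the assumed intact one.
constexpr std::uint32_t differing_bits(std::uint32_t observed,
                                       std::uint32_t assumed_intact) {
    return observed ^ assumed_intact;
}

// True when exactly one bit is set.
constexpr bool is_single_bit(std::uint32_t diff) {
    return diff != 0 && (diff & (diff - 1)) == 0;
}

// Index (0 = least significant) of the only set bit in diff.
int flipped_bit_index(std::uint32_t diff) {
    assert(is_single_bit(diff));
    int index = 0;
    while ((diff >> index) != 1u) {
        ++index;
    }
    return index;
}

// Observed _a[1] values copied from the gdb output in this thread. The
// second column is an ASSUMPTION: the same value with the leading hex
// digit set back to 0xa, like the other addresses in each session.
struct Sample {
    std::uint32_t observed;
    std::uint32_t assumed_intact;
};

constexpr Sample kSamples[] = {
    {0x2607e0d4u, 0xa607e0d4u},  // ReadReal, first session
    {0x25bbd0d4u, 0xa5bbd0d4u},  // ReadReal, second session
    {0x2cc130c8u, 0xacc130c8u},  // ReadStringListReal
};

// True if every sample differs from its assumed intact value only in
// bit 31, and that bit is clear in the observed value.
bool all_samples_clear_bit31() {
    for (const Sample &s : kSamples) {
        const std::uint32_t diff = differing_bits(s.observed, s.assumed_intact);
        if (!is_single_bit(diff) || flipped_bit_index(diff) != 31) {
            return false;
        }
        if ((s.observed & 0x80000000u) != 0) {
            return false;
        }
    }
    return true;
}
```

Since 0xa is binary 1010 and 0x2 is binary 0010, the two digits differ only in their top bit, which is why the XOR of each pair is 0x80000000.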