MythTV / mythtv

The official MythTV repository
https://www.mythtv.org
GNU General Public License v2.0
705 stars 345 forks source link

mythbackend aborts when EIT scan enabled #589

Closed eyaleb closed 2 years ago

eyaleb commented 2 years ago

Fedora 36 Linux e7.eyal.emu.id.au 5.18.5-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jun 16 14:51:11 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

However, this problem started on f34 after this mythtv update:

2022-05-29T08:46:33+1000 DEBUG Upgraded: mythtv-backend-32.0-1.30.20220510git26079f815a.fc34.x86_64 2022-05-29T08:46:33+1000 DEBUG Upgraded: mythtv-base-themes-32.0-1.30.20220510git26079f815a.fc34.x86_64 2022-05-29T08:46:33+1000 DEBUG Upgraded: mythtv-common-32.0-1.30.20220510git26079f815a.fc34.x86_64 2022-05-29T08:46:33+1000 DEBUG Upgraded: mythtv-docs-32.0-1.30.20220510git26079f815a.fc34.noarch 2022-05-29T08:46:33+1000 DEBUG Upgraded: mythtv-frontend-32.0-1.30.20220510git26079f815a.fc34.x86_64 2022-05-29T08:46:33+1000 DEBUG Upgraded: mythtv-libs-32.0-1.30.20220510git26079f815a.fc34.x86_64 2022-05-29T08:46:33+1000 DEBUG Upgraded: mythtv-setup-32.0-1.30.20220510git26079f815a.fc34.x86_64

MythTV Version : v32.0-v32.0-36-g7077a824d2 MythTV Branch : fixes/32 Network Protocol : 91 Library API : 32.20200101-1 QT Version : 5.15.3 Options compiled in: linux debug use_hidesyms using_alsa using_jack using_oss using_pulse using_pulseoutput using_backend using_bindings_perl using_bindings_python using_bindings_php using_dvb using_firewire using_frontend using_hdhomerun using_satip using_vbox using_ceton using_joystick_menu using_libcec using_libcrypto using_libdns_sd using_libfftw3 using_libxml2 using_lirc using_mheg using_opengl using_egl using_qtwebkit using_qtscript using_qtdbus using_taglib using_v4l2 using_v4l2prime using_x11 using_system_libbluray using_system_libudfread using_systemd_notify using_systemd_journal using_drm using_bindings_perl using_bindings_python using_bindings_php using_freetype2 using_mythtranscode using_opengl using_egl using_drm using_vaapi using_nvdec using_vdpau using_ffmpeg_threads using_mheg using_libass using_libxml2 using_libmp3lame

dnf says: mythtv-backend.x86_64 32.0-1.36.20220605git7077a824d2.fc36 @rpmfusion-free-updates

mythbackend

What steps will reproduce the bug?

Enable EIT scan for a capture card and wait.

How often does it reproduce? Is there a required condition?

It varies. I had a few cases when there were 25 aborts in 1 hour. Last night there were fewer aborts. Just now, when there is no active recording, I see another abort. Looking at the recording schedule there were no recordings active when the aborts happened tonight. See below for a backtrace of the last one.

What is the expected behaviour?

mythbackend should work without aborting.

What do you see instead?

mythbackend aborts (then systemd restarts the service).

Additional information

This issue was originally tracked on the mythtv-users list http://lists.mythtv.org/pipermail/mythtv-users/2022-May/409711.html http://lists.mythtv.org/pipermail/mythtv-users/2022-June/409736.html Then as an issue on rpmfusion https://bugzilla.rpmfusion.org/show_bug.cgi?id=6327

A gdb backtrace seems to always show the same place:

Reading symbols from /bin/mythbackend... Reading symbols from /usr/lib/debug/usr/bin/mythbackend-32.0-1.36.20220605git7077a824d2.fc36.x86_64.debug... [New LWP 3630528] ... many threads... This GDB supports auto-downloading debuginfo from the following URLs: https://debuginfod.fedoraproject.org/ Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal] Debuginfod has been disabled. To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit. [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". Core was generated by `/usr/bin/mythbackend --logpath /var/log/mythtv -v channel,eit --loglevel=debug'. Program terminated with signal SIGABRT, Aborted.

0 __pthread_kill_implementation (threadid=, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44

44 return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0; [Current thread is 1 (Thread 0x7fddabfff640 (LWP 3630528))]

... many (52) traces ... Thread 1 (Thread 0x7fddabfff640 (LWP 3630528)):

0 __pthread_kill_implementation (threadid=, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44

1 0x00007fde7f48eca3 in __pthread_kill_internal (signo=6, threadid=) at pthread_kill.c:78

2 0x00007fde7f43e9c6 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26

3 0x00007fde7f4287f4 in __GI_abort () at abort.c:79

4 0x00007fde7f8d7e40 in std::__glibcxx_assert_fail(char const, int, char const, char const*) (file=file@entry=0x7fde82208013 "/usr/include/c++/12/array", line=line@entry=217, function=function@entry=0x7fde822517d8 "constexpr const std::array<_Tp, _Nm>::value_type& std::array<_Tp, _Nm>::operator const [with _Tp = unsigned char; long unsigned int _Nm = 184; const_reference = const unsigned char&; size"..., condition=condition@entry=0x7fde82208000 "__n < this->size()") at ../../../../../libstdc++-v3/src/c++11/debug.cc:60

5 0x00007fde81c0e04a in std::array<unsigned char, 184ul>::operator[](unsigned long) const (this=0x7fdd0c02dca8, __n=) at /usr/include/c++/12/array:217

6 std::array<unsigned char, 184ul>::operator[](unsigned long) const (__n=, this=0x7fdd0c02dca8) at /usr/include/c++/12/array:214

7 TSPacket::StartOfFieldPointer() const (this=0x7fdd0c02dca4) at mpeg/tspacket.h:218

8 MPEGStreamData::AssemblePSIP(TSPacket const*, bool&) (this=0x7fde340157a8, tspacket=0x7fdd0c02dca4, moreTablePackets=@0x7fddabffe7bf: true) at mpeg/mpegstreamdata.cpp:312

9 0x00007fde81c0e580 in MPEGStreamData::HandleTSTables(TSPacket const*) (this=0x7fde340157a8, tspacket=0x7fdd0c02dca4) at mpeg/mpegstreamdata.cpp:874

10 0x00007fde81c18630 in MPEGStreamData::ProcessTSPacket(TSPacket const&) (this=0x7fde340157a8, tspacket=...) at mpeg/mpegstreamdata.cpp:1064

11 0x00007fde81c0f2b6 in MPEGStreamData::ProcessData(unsigned char const*, int) (this=0x7fde340157a8, buffer=0x7fdd0c02c8d0 "G\306\260\020g", len=8460) at mpeg/mpegstreamdata.cpp:988

12 0x00007fde821bc792 in DVBStreamHandler::RunTS() (this=0x7fde3401fe10) at recorders/dvbstreamhandler.cpp:266

13 0x00007fde821c1b4e in DVBStreamHandler::run() (this=0x7fde3401fe10) at recorders/dvbstreamhandler.cpp:119

14 0x00007fde7fce8447 in QThreadPrivate::start(void*) (arg=0x7fde34009c00) at thread/qthread_unix.cpp:331

15 0x00007fde7f48ce1d in start_thread (arg=) at pthread_create.c:442

16 0x00007fde7f5125e0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

kmdewaal commented 2 years ago

It looks to me that the crash happens because of array boundary checking on std::array [] accesses. Can somebody please explain how these RPMFusion packages are built? Could it be that there has recently been a change in how these packages are built?

kmdewaal commented 2 years ago

Looks like the mythtv packages for RPMFusion are compiled with the -D_GLIBCXX_DEBUG option. This option adds array bound checking to the std::array operator []. This might be considered useful but at the same time optimization is enabled which then removes interesting information from the assert output such as the array index value that causes the assert. I think this option should not be used when compiling packages for the general public. In mythtv the C-style arrays have been replaced by C++-style std::array in the understanding that the operator [] would behave in the same way. Introducing array boundary checking now does gives problems for code that was written 20 years ago when programming conventions were different.

eyaleb commented 2 years ago

Last night I built https://github.com/MythTV/mythtv/tree/fixes/32 and then started running this mythbackend. So far (15 hours later) there were about 30 recordings and no aborts. For this test I scheduled a recording of all programs on two channels (different mpx). This suggests that the above comment may be the underlying cause of the aborts.

I could build again with that debug enabled to see if the aborts come back. What ./configure option will enable this build option? Or maybe you already confirmed this.

knight-of-ni commented 2 years ago

Here is where we set the compiler flags in the mythtv specfile: https://github.com/rpmfusion/mythtv/blob/master/mythtv.spec#L939

This is the optflags macro expanded on Fedora 36:

$ rpm --eval %{optflags}
-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection

For reference, here is the entire build log: https://koji.rpmfusion.org/kojifiles/packages/mythtv/32.0/1.36.20220605git7077a824d2.fc36/data/logs/x86_64/build.log

@kmdewaal I don't see -D_GLIBCXX_DEBUG set anywhere in the build process. I do see -D_GLIBCXX_ASSERTIONS, but I don't know the relation between the two.

eyaleb commented 2 years ago

Not sure what all the options are but I compared the options on the first file compiled, between the rpmfusion log and mine (mythtv git default). These are the options that are different (the leading -|+ are from the diff).

Here is a gcc compile:

--- rpmfusion       2022-06-20 23:16:04.021401636 +1000
+++ mythtv          2022-06-20 23:16:11.412347210 +1000
+-DNDEBUG
--DX264_API_IMPORTS
--Dusing_libcec
--O2
+-O3
+-Wdisabled-optimization
--Wp,-D_FORTIFY_SOURCE=2
--Wp,-D_GLIBCXX_ASSERTIONS
--fasynchronous-unwind-tables
--fcf-protection
--fexceptions
--fno-devirtualize
--fstack-clash-protection
--fstack-protector-strong
--grecord-gcc-switches
--m64
--mtune=generic
--specs=/usr/lib/rpm/redhat/redhat-annobin-cc1
--specs=/usr/lib/rpm/redhat/redhat-hardened-cc1

The optimization level is different... but mostly mythtv uses -DNDEBUG -Wdisabled-optimization while rpmfusion uses a whole bunch of options.

And here is a g++ compile:

--- rpmfusion      2022-06-20 23:35:59.057865442 +1000
+++ mythtv         2022-06-20 23:36:03.723832946 +1000
+-DNDEBUG
--Dusing_libcec
--Werror=format-security
--Wp,-D_FORTIFY_SOURCE=2
--Wp,-D_GLIBCXX_ASSERTIONS
--fasynchronous-unwind-tables
--fcf-protection
--fexceptions
--fno-devirtualize
--fstack-clash-protection
--fstack-protector-strong
--grecord-gcc-switches
--m64
--mtune=generic
--specs=/usr/lib/rpm/redhat/redhat-annobin-cc1
--specs=/usr/lib/rpm/redhat/redhat-hardened-cc1

Clearly rather different builds. Which option leads to the abort is beyond me (last time I dug into gcc options it was at version 1).

kmdewaal commented 2 years ago

@knight-of-ni, thanks for the info!

@kmdewaal I don't see -D_GLIBCXX_DEBUG set anywhere in the build process. I do see -D_GLIBCXX_ASSERTIONS, but I don't know the relation between the two.

I found the -D_GLIBCXX_DEBUG in de #include <array> file that is part of the compiler package. But on my test program setting the -D_GLIBCXX_ASSERTIONS does also trigger the abort on std::array operator [] so that looks to be the culprit. It is great that these options do exist as it is a great help for debugging, but they should not be used for a release build.

I plan to figure out where the -D_GLIBCXX_ASSERTIONS comes from and then remove that for the release build. I think that I do want to add it for the debug build as it is useful for development.

About the bug itself, it happens in code that is effectively about 20 years old and it is in the heart of the MythTV recording system. The worst that can happen is that an invalid PSIP packet is created which is then subsequently discarded after a checksum check. It could be that the broadcaster has a wrong configuration setting in his transmission equipment. It could also be that the stream is correct according to the DVB specification but that MythTV does not parse that correct. In order to figure that out I do need a stream that does show the problem. @eyaleb it would be great if you can create a recording of the full transport stream of one GB or so (that is about 15 minutes) so I can reproduce the problem and analyze the stream.

ulmus-scott commented 2 years ago

Part of the problem might be that it is compiled using the debug compile type (as evidenced by the lack of -DNDEBUG), see: https://github.com/rpmfusion/mythtv/blob/53529e438b65d3dab0df98ab66fa299190dcef55/mythtv.spec#L947-L951 and https://github.com/MythTV/mythtv/blob/02247cfba9921c44882f01fd0bda39e7addd4b5d/mythtv/configure#L5156-L5171

kmdewaal commented 2 years ago

Yes I have seen the commit where enable debug is added to the release build but I do not have a good understanding of the effect. It looks like that the enable debug does change the compile_type from release to debug. However, the optflags macro is where the --Wp,-D_GLIBCXX_ASSERTIONS is defined and I do not see where the optflags macro is defined. Could it be that the optflags does depend on the enable debug?

kmdewaal commented 2 years ago

It looks like it comes not out of the mythtv configure script but out of the RPMFusion build system. A local build of the downloaded source package does the same as the koji log from Andrew; the rpmbuild command creates and then executes a temporary file, here /var/tmp/rpm-tmp.63O0RK, where this is defined:

RPM_OPT_FLAGS="-O2  -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection"

This are the options that are added to the mythtv compilation flags by executing the script:

Executing(%build): /bin/sh -e /var/tmp/rpm-tmp.63O0RK
+ umask 022
+ cd /home/klaas/rpmbuild/BUILD
+ CFLAGS='-O2  -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'
+ export CFLAGS
+ CXXFLAGS='-O2  -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64  -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection'
+ export CXXFLAGS

My current thinking is that this comes straight out of the rpmbuild application and that it is thus intended that all RPMFusion are built this way. Which means it is outside the scope of MythTV development to fix this. What is in scope is fixing the actual bug and that will happen. However, given the age of the MythTV application together with the insistence on replacing C-style arrays with C++ style std::arrays there could be more crashes coming.

ulmus-scott commented 2 years ago

The main purpose of enable debug in configure is https://github.com/MythTV/mythtv/blob/02247cfba9921c44882f01fd0bda39e7addd4b5d/mythtv/configure#L6232-L6233 to get the compiler to generate the debug info.

The point I was trying to make is that --compile-type=debug does not define NDEBUG which enables assertions. If a different compile type was used, the assertions may not be compiled and therefore couldn't abort the program.

Regardless, out of bounds memory access is a bug and should be fixed. Just because the C arrays don't complain and the C++ std::arrays might doesn't make the code using C arrays any more correct.

eyaleb commented 2 years ago

@eyaleb it would be great if you can create a recording of the full transport stream of one GB or so (that is about 15 minutes) so I can reproduce the problem and analyze the stream.

@kmdewaal I am happy to help here. Seeing as this problem is intermittent, and an aborted recording will probably not include the bad packet, how do you suggest I acquire the stream? Or maybe it is possible to enable assertions and make a failed assert log a message (and not abort)?

eyaleb commented 2 years ago

Repeated the build using default mythtv.org setup but added -Wp,-D_GLIBCXX_ASSERTIONS to mythtv/mythtv/config.mak before the build. Got the same crash now.

MythTV team: Since I always get the crash in the same exact place, it may help if we fix this one place now then deal with the general array case separately.

On mythtv-users list there is a suggestion: http://lists.mythtv.org/pipermail/mythtv-users/2022-June/409868.html

RPMFusion: should we really release the packages with debug enabled? Does this have a noticeable performance loss?

knight-of-ni commented 2 years ago

RPMFusion: should we really release the packages with debug enabled? Does this have a noticeable performance loss?

Generating debug symbols is part of the normal packaging process for every package in the fedora/redhat environment. At the end of the build process, the debug symbols are extracted and placed into separate debuginfo rpm's. The build environment does this automatically.

Exceptions can be made in extreme cases, and it seems like @hobbes1069 did just that to the rpm specfile, 5 years ago in v29 with this commit: https://github.com/rpmfusion/mythtv/commit/770f835fb74c9ab3610262bc027534dc1b1bcfbb

I know I'm testing his memory from 5 years ago, but perhaps he might recall something.

The attempt to disable debug mode is still present in the spec file, but for some reason it is not taking effect. https://github.com/rpmfusion/mythtv/blob/53529e438b65d3dab0df98ab66fa299190dcef55/mythtv.spec#L99

I need to look into that.

hobbes1069 commented 2 years ago

I think the %bcond_without actually means debug was on by default (bcond's are weird). I'm sure I made those changes for a reason but, yeah, slept a few to many times since then to remember.

linuxdude42 commented 2 years ago

@kmdewaal, how about adding something like this to MPEGStreamData::ProcessTSPacket, immediately after the check for a scrambled packet?

    if (tspacket.HasAdaptationField())
    {
        size_t afsize = tspacket.AdaptationFieldSize();
        bool validsize = (tspacket.HasPayload())
            ? afsize <= 182
            : afsize == 183;
        if (!validsize)
        {
            if (VERBOSE_LEVEL_CHECK(VB_RECORD, LOG_DEBUG))
            {
                LOG(VB_RECORD, LOG_DEBUG, QString("Invalid adaptation field, type %3, size %4")
                    .arg(tspacket.AdaptationFieldControl()).arg(afsize));
            }
            return true;
        }
    }

This only adds two instructions to the code path when the packet doesn't have an adaptation field. There is an instruction to test the bit (from a byte that has already been loaded into a register) and an instruction to branch away to handle the size checking if the bit is set.

This compiles, but I have no way to test it.

eyaleb commented 2 years ago

I am running a test overnight using the mythtv.org build, with -Wp,-D_GLIBCXX_ASSERTIONS added and the above adaptation field size check. In the past there were a few aborts with this.

kmdewaal commented 2 years ago

This is what https://en.cppreference.com/w/cpp/container/array/operator_at has to say:

image

kmdewaal commented 2 years ago

Please note that this is not about debug symbols being generated, there is nothing against that. Also, the code seems to be optimized because the relevant information, such as the index value that is causing the abort, is optimized away. This information should be available if it really was a correct debug build.

kmdewaal commented 2 years ago

@kmdewaal, how about adding something like this to MPEGStreamData::ProcessTSPacket, immediately after the check for a scrambled packet?

    if (tspacket.HasAdaptationField())
    {
        size_t afsize = tspacket.AdaptationFieldSize();
        bool validsize = (tspacket.HasPayload())
            ? afsize <= 182
            : afsize == 183;
        if (!validsize)
        {
            if (VERBOSE_LEVEL_CHECK(VB_RECORD, LOG_DEBUG))
            {
                LOG(VB_RECORD, LOG_DEBUG, QString("Invalid adaptation field, type %3, size %4")
                    .arg(tspacket.AdaptationFieldControl()).arg(afsize));
            }
            return true;
        }
    }

This only adds two instructions to the code path when the packet doesn't have an adaptation field. There is an instruction to test the bit (from a byte that has already been loaded into a register) and an instruction to branch away to handle the size checking if the bit is set.

This compiles, but I have no way to test it.

Yes I think this is a good test.

kmdewaal commented 2 years ago

@eyaleb as discussed can you make a recording of a full transport stream that will show the problem? This can be done as follows:

The resulting transport stream file will be big but you can send that to me with gmail or with any other file sharing service.

eyaleb commented 2 years ago

I now have a short recording (143MB, about 1 minute) which logged many errors. This recording was set to start at 12:04:00 so the errors started soon, then I stopped the recordings.

I do not use gmail and will probably have issues trying to email (or upload) this file using my service. Where can I upload this file for you?

BTW, I regularly get noisy recordings (all the way from a small blip up to unwatchable) - is this the source of these errors?

Here are the relevant logged messages anyway (I can provide the full log is it helps):

2022-06-22 12:04:00.000742 I [2792107/2792192] Scheduler tv_rec.cpp:417 (StartRecording) - TVRec[51]: StartRecording("3A test")
...
2022-06-22 12:04:24.837819 D [2792107/2849783] DVBRead recorders/dvbstreamhandler.cpp:114 (run) - DVBSH[33](/dev/dvb/adapter-OPEN-ELEC-2/frontend0): run(): begin
...
2022-06-22 12:04:29.210498 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 3, size 227
2022-06-22 12:04:30.014767 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 2, size 158
2022-06-22 12:04:30.115058 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 2, size 94
2022-06-22 12:04:36.200598 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 2, size 121
2022-06-22 12:04:36.351188 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 2, size 212
2022-06-22 12:04:38.915607 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 2, size 87
2022-06-22 12:04:39.267405 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 2, size 2
2022-06-22 12:04:39.317568 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 2, size 0
2022-06-22 12:04:43.188264 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 2, size 149
2022-06-22 12:04:45.400122 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 3, size 233
2022-06-22 12:04:45.701347 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 2, size 101
2022-06-22 12:04:47.511589 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 2, size 164
2022-06-22 12:04:47.561748 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 2, size 29
2022-06-22 12:04:47.611886 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 2, size 67
2022-06-22 12:04:48.164775 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 2, size 109
2022-06-22 12:04:49.622613 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 3, size 229
2022-06-22 12:04:49.672776 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 2, size 7
2022-06-22 12:04:49.823520 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 2, size 187
2022-06-22 12:04:55.404025 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 2, size 187
2022-06-22 12:04:58.269173 D [2792107/2849783] DVBRead mpeg/mpegstreamdata.cpp:1026 (ProcessTSPacket) - Invalid adaptation field, type 2, size 200
...
2022-06-22 12:05:03.899415 D [2792107/2849783] DVBRead recorders/dvbstreamhandler.cpp:121 (run) - DVBSH[33](/dev/dvb/adapter-OPEN-ELEC-2/frontend0): run(): end
eyaleb commented 2 years ago

BTW, during the 2 hours since the test, with no recordings, over 1600 such messages were logged. I expect these follow EIT scan activity.

Also interesting, since the recent upgrade 2022-06-17T17:57:27+1000 DEBUG Upgraded: mythtv-backend-debuginfo-32.0-1.36.20220605git7077a824d2.fc 36.x86_64 I see a relatively small number of new (and similar) messages, probably masking a similar problem elsewhere:

2022-06-17 22:45:45.479368 E [2947587/2947718] DVBRead mpeg/pespacket.cpp:59 (AddTSPacket) - PESPacket[34] Invalid adaptation field size:233  control:3
2022-06-17 23:00:02.722571 E [2954161/2954355] DVBRead mpeg/pespacket.cpp:59 (AddTSPacket) - PESPacket[34] Invalid adaptation field size:236  control:3
2022-06-17 23:19:32.730006 E [2968021/2968167] DVBRead mpeg/pespacket.cpp:59 (AddTSPacket) - PESPacket[34] Invalid adaptation field size:200  control:3
2022-06-17 23:19:32.730016 E [2968021/2968167] DVBRead mpeg/pespacket.cpp:59 (AddTSPacket) - PESPacket[33] Invalid adaptation field size:200  control:3
2022-06-17 23:41:24.190614 E [2984044/2984278] DVBRead mpeg/pespacket.cpp:59 (AddTSPacket) - PESPacket[33] Invalid adaptation field size:189  control:3
2022-06-17 23:42:42.051224 E [2984044/2984278] DVBRead mpeg/pespacket.cpp:59 (AddTSPacket) - PESPacket[33] Invalid adaptation field size:192  control:3
2022-06-19 22:05:18.221567 E [4096266/4117498] DVBRead mpeg/pespacket.cpp:59 (AddTSPacket) - PESPacket[33] Invalid adaptation field size:221  control:3
2022-06-20 18:34:35.683487 E [4096266/3694053] DVBRead mpeg/pespacket.cpp:59 (AddTSPacket) - PESPacket[33] Invalid adaptation field size:220  control:3
2022-06-20 21:26:30.851572 E [4096266/3783604] DVBRead mpeg/pespacket.cpp:59 (AddTSPacket) - PESPacket[33] Invalid adaptation field size:194  control:3
2022-06-21 00:37:04.984151 E [4096266/3961543] DVBRead mpeg/pespacket.cpp:59 (AddTSPacket) - PESPacket[33] Invalid adaptation field size:185  control:3
2022-06-21 01:00:43.486008 E [4096266/3972140] DVBRead mpeg/pespacket.cpp:59 (AddTSPacket) - PESPacket[33] Invalid adaptation field size:189  control:3
2022-06-21 18:26:44.560141 E [3210401/3382008] DVBRead mpeg/pespacket.cpp:59 (AddTSPacket) - PESPacket[33] Invalid adaptation field size:238  control:3
kmdewaal commented 2 years ago

Thanks for your very quick action!

I now have a short recording (143MB, about 1 minute) which logged many errors. This recording was set to start at 12:04:00 so the errors started soon, then I stopped the recordings.

I do not use gmail and will probably have issues trying to email (or upload) this file using my service. Where can I upload this file for you?

I suggest wetransfer.com which allows up to 2GB so your recording will fit.

BTW, I regularly get noisy recordings (all the way from a small blip up to unwatchable) - is this the source of these errors?

Could very well be. Possible causes that I can think of:

  1. transport stream packets being corrupted by broken tuner hardware, misdirected antenna's, interference etc.
  2. transport stream encoder of the broadcaster has a wrong configuration so the stream does not conform to standards
  3. transport stream encoder of the broadcaster has an obscure configuration that creates a stream that does conform to standards

Case (1) is then your noisy recordings, but I would then expect many more errors in the logs, e.g. about packet sequence numbers being wrong. If it is (2) or (3) then I hope to figure out what it is by looking at the packets and if it is (3) the MythTV decoder needs to be fixed.

My best guess at the moment is case (1). As previously mentioned, the packet handler code is close to 20 years old and has proven to be resilient against noisy recordings over the years. The reasons why it now fails on you are:

  1. C-style arrays have been replaced by C++ style std::array template instantiations recently in MythTV code
  2. Operator [] accesses on std::array should not be bound checked according to cppreference.com but GCC has an undocumented option to do that anyway, and the RPMFusion build does activate that option
  3. The vast majority of MythTV users do use Ubuntu and the Ubuntu packages. A few people including me use Fedora but I do build from source and do not use the RPMFusion MythTV packages.
  4. Building from source, even the MythTV debug builds do not activate std::array boundary checking.
  5. Of those who use Fedora and RPMFusion packages probably not everybody is already on version 32.
eyaleb commented 2 years ago

I suggest wetransfer.com

I think I got it. I uploaded two files, the full multiplex .ts and the mythbackend.log for the same time. I did not have an email to send it to so I sent it to myself and here are the links. I hope this is the correct way to do this.

https://we.tl/t-nyVKlEXIe5  mythbackend.log
https://we.tl/t-weDesMn9R6  20300_20220622020400.ts
kmdewaal commented 2 years ago

@eyaleb Have downloaded both files, this went very smooth. Thanks! I did have a brief look and surprise, the transport stream looks perfect at first sight. No continuity errors, no packet losses, no error packets, EIT correctly displayed. Also, as shown by the mythbackend log, your recording has a continuity_error_count of 0 which is perfect. I have to think about this for a while.

eyaleb commented 2 years ago

I also noticed a clean playback, but thought this is just one channel on this multiplex and I do not know how to play the other channels. Interestingly, while the message came at a high rate (38369 yesterday) Just now as I was getting ready for bed I noticed there were none for 15 minutes, then 700 messages in 30 seconds and quiet again for the last 5 minutes. Very bursty. No recordings active.

An aside, should I somehow delete the files from wetransfer or do I just leave them to lapse? I am new to this service.

kmdewaal commented 2 years ago

@eyaleb can you do the same test again but then with the testcode modified to this:

    // KdW test -- Testcode from David Hampton
    if (tspacket.HasAdaptationField())
    {
        size_t afsize = tspacket.AdaptationFieldSize();
        bool validsize = (tspacket.HasPayload())
            ? afsize <= 182
            : afsize == 183;
        if (!validsize)
        {
            if (VERBOSE_LEVEL_CHECK(VB_RECORD, LOG_INFO))
            {
                LOG(VB_RECORD, LOG_INFO, QString("Invalid adaptation field, type %3, size %4")
                    .arg(tspacket.AdaptationFieldControl()).arg(afsize) + "\n" +
                    tspacket.toString());
            }
            // return true;     // KdW test -- Just keep on running
        }
    }

Differences are:

Compile this without the GLIBC symbol defined so that mythbackend keeps on running. Can you send please the full transport stream and the complete mythbackend log again? Using wetransfer.com worked OK. About deleting files from wetransfer, I think that it is automatically deleted after a few days unless you start paying so I should not worry about it.

eyaleb commented 2 years ago

At first I ran with the assertions enabled and got aborts, mostly during EIT scans. See the logs: https://we.tl/t-pvs5FTAoOH

I was unable, so far, to get any "Invalid" message during a full transport recording, I will keep trying and report back.

eyaleb commented 2 years ago

I managed to grab a short recording of a full transport with a few error logs. Problem is that I get many "Invalid" messages during EIT scan when no recording goes on and if I want a short recording I need to start it shortly before the log. I just need a time machine.

For this test I run mythbackend with "-v channel,eit,record --loglevel=info". In the future I will change to only "-v record".

See the recording and log here https://we.tl/t-LFCNZ3oY4o

eyaleb commented 2 years ago

I spent some time looking at the tuners (all USB) and could see that one of them is very noisy. I changed the capture card priorities (put it last for everything) and so far the situation is (relatively) quieter.

I am rather convinced that the bad tuner was delivering very noisy data, which is a good test for this software :-)

kmdewaal commented 2 years ago

Looking at the packet headers it does look indeed that the broken packets are produced by a bad tuner. It also looks like the transport stream recording is from a tuner that works correct while the packet header messages are from the bad tuner. The easiest way to determine which tuner is faulty is to use the mythtv-setup "Input Connections" and disconnect all but one tuner from the video source etc.

I will put the adaptation field size check in master and if nothing happens then a few days later in fixes/32, as is the procedure. That will solve this problem but not necessarily all problems because, as mentioned, the changes to the code to use std::array are fairly recent and the code has never been tested with these boundary checks enabled, as far as I know.

As for RPMFusion building their packages with array boundary checks enabled, that is obviously their choice. The purpose of Fedora and probably also RPMFusion is to test software for later inclusion in the Redhat distribution and not necessarily to give users a stable platform. One might argue this test had success here. One might also argue that Ubuntu is a better choice if one does not want to be a guinea pig.

Thanks @eyaleb for reporting and testing, thanks @linuxdude42 for providing the fix and I am happy that this is resolved.

linuxdude42 commented 2 years ago

I agree with @kmdewaal that the transport stream recording is from a tuner that works correctly. tsanalyze doesn't show any problems with the stream. If you can determine which is the faulty tuner, I'd love to see a short transport stream recording from that tuner.

eyaleb commented 2 years ago

@linuxdude42

I'd love to see a short transport stream recording from that tuner.

I uploaded one earlier (https://github.com/MythTV/mythtv/issues/589#issuecomment-1163922542), but can do it again. What do you need?

Looking at the log I can see that the EIT scan is regularly leading to problems, more so than actual recordings. Right now, the EIT is on input [37] which is input dapter-OPEN-ELEC-3 (as I set up). It used to be on another tuner with similar performance.

What I notice is that the burst of errors almost always arrived with this message: 'TVRecEvent tv_rec.cpp:3442 (HandleTuning) - TVRec[37]: HandleTuning Request: Program(NULL) channel(52) input() flags(EITScan,)'. [Note: TVRec[37] is tuner 3]

Specifically channel(52). Maybe there is an issue with this channel? Yes, Watching live TV clearly shows that tuners 1/2/3 have great difficulty tuning to channel 52 while the others are OK. I now changed EIT scan to tuner 4. The good thing is that I never watch or record this channel...

I need to sort out my tuners anyway, but until then I can run more tests if needed.

Example log with no recording, showing count of errors, with bursts:

Fri Jun 24 07:39:52 AEST 2022 9532
Fri Jun 24 07:40:52 AEST 2022 9532
Fri Jun 24 07:41:52 AEST 2022 9532
Fri Jun 24 07:42:52 AEST 2022 9532
Fri Jun 24 07:43:52 AEST 2022 9532
Fri Jun 24 07:44:52 AEST 2022 9532
Fri Jun 24 07:45:52 AEST 2022 9532
Fri Jun 24 07:46:52 AEST 2022 9532
Fri Jun 24 07:47:53 AEST 2022 9532
Fri Jun 24 07:48:53 AEST 2022 9532
Fri Jun 24 07:49:53 AEST 2022 9532
Fri Jun 24 07:50:53 AEST 2022 9532
Fri Jun 24 07:51:53 AEST 2022 9532
Fri Jun 24 07:52:53 AEST 2022 9532
Fri Jun 24 07:53:53 AEST 2022 9532
Fri Jun 24 07:54:53 AEST 2022 9532
Fri Jun 24 07:55:53 AEST 2022 9532
Fri Jun 24 07:56:53 AEST 2022 9532
Fri Jun 24 07:57:53 AEST 2022 9532
Fri Jun 24 07:58:53 AEST 2022 9532
Fri Jun 24 07:59:53 AEST 2022 9532
Fri Jun 24 08:00:53 AEST 2022 9532
Fri Jun 24 08:01:53 AEST 2022 9532
Fri Jun 24 08:02:53 AEST 2022 9532
Fri Jun 24 08:03:53 AEST 2022 10021
Fri Jun 24 08:04:53 AEST 2022 10021
Fri Jun 24 08:05:53 AEST 2022 10021
Fri Jun 24 08:06:53 AEST 2022 10021
Fri Jun 24 08:07:53 AEST 2022 10021
Fri Jun 24 08:08:53 AEST 2022 10021
Fri Jun 24 08:09:53 AEST 2022 10021
Fri Jun 24 08:10:53 AEST 2022 10021
Fri Jun 24 08:11:53 AEST 2022 10021
Fri Jun 24 08:12:53 AEST 2022 10021
Fri Jun 24 08:13:53 AEST 2022 10021
Fri Jun 24 08:14:53 AEST 2022 10021
Fri Jun 24 08:15:53 AEST 2022 10021
Fri Jun 24 08:16:53 AEST 2022 10021
Fri Jun 24 08:17:53 AEST 2022 10021
Fri Jun 24 08:18:53 AEST 2022 10021
Fri Jun 24 08:19:53 AEST 2022 10021
Fri Jun 24 08:20:53 AEST 2022 10021
Fri Jun 24 08:21:53 AEST 2022 10021
Fri Jun 24 08:22:53 AEST 2022 10021
Fri Jun 24 08:23:53 AEST 2022 10021
Fri Jun 24 08:24:53 AEST 2022 10021
Fri Jun 24 08:25:53 AEST 2022 10021
Fri Jun 24 08:26:53 AEST 2022 10021
Fri Jun 24 08:27:53 AEST 2022 10021
Fri Jun 24 08:28:53 AEST 2022 10507
Fri Jun 24 08:29:53 AEST 2022 10507
Fri Jun 24 08:30:54 AEST 2022 10507
Fri Jun 24 08:31:54 AEST 2022 10507
Fri Jun 24 08:32:54 AEST 2022 10507
Fri Jun 24 08:33:54 AEST 2022 10507
Fri Jun 24 08:34:54 AEST 2022 10507
Fri Jun 24 08:35:54 AEST 2022 10507
Fri Jun 24 08:36:54 AEST 2022 10507
Fri Jun 24 08:37:54 AEST 2022 10507
Fri Jun 24 08:38:54 AEST 2022 10507
Fri Jun 24 08:39:54 AEST 2022 10507
Fri Jun 24 08:40:54 AEST 2022 10507
Fri Jun 24 08:41:54 AEST 2022 10507
Fri Jun 24 08:42:54 AEST 2022 10507
Fri Jun 24 08:43:54 AEST 2022 10507
Fri Jun 24 08:44:54 AEST 2022 10507
Fri Jun 24 08:45:54 AEST 2022 10507
Fri Jun 24 08:46:54 AEST 2022 10507
Fri Jun 24 08:47:54 AEST 2022 10507
Fri Jun 24 08:48:54 AEST 2022 10723
Fri Jun 24 08:49:54 AEST 2022 10914
Fri Jun 24 08:50:54 AEST 2022 10914
Fri Jun 24 08:51:54 AEST 2022 10914
Fri Jun 24 08:52:54 AEST 2022 10914
Fri Jun 24 08:53:54 AEST 2022 10914
Fri Jun 24 08:54:54 AEST 2022 10914
Fri Jun 24 08:55:54 AEST 2022 10914
Fri Jun 24 08:56:54 AEST 2022 10914
Fri Jun 24 08:57:54 AEST 2022 10914
Fri Jun 24 08:58:54 AEST 2022 10914
Fri Jun 24 08:59:54 AEST 2022 10914
Fri Jun 24 09:00:54 AEST 2022 10914
Fri Jun 24 09:01:54 AEST 2022 10914
Fri Jun 24 09:02:54 AEST 2022 10914
Fri Jun 24 09:03:54 AEST 2022 10914
Fri Jun 24 09:04:54 AEST 2022 10914
Fri Jun 24 09:05:54 AEST 2022 10914
Fri Jun 24 09:06:54 AEST 2022 10914
Fri Jun 24 09:07:54 AEST 2022 10914
Fri Jun 24 09:08:54 AEST 2022 10914
linuxdude42 commented 2 years ago

I ran an analyzer on the previous recording and it didn't show any problems. Could you post a minute of the full transport stream that includes channel 52 using one of your first three tuners. I'd like to see how bad it is, and maybe export a packet or two to use as test cases.

eyaleb commented 2 years ago

@linuxdude42 I recorded a full transport (includes ch 52) on tuner 1. When I tried to watch it I got mostly noise. https://we.tl/t-gLqftiDzOm

BTW, opening "Watch Recordings" in the frontend while this recording was active caused the FE to crash without showing the list. I needed to do this in order to stop the recording. I ended restarting the BE instead. A problem for another day...

linuxdude42 commented 2 years ago

Here's the top level analysis of the good stream:

===============================================================================
|  TRANSPORT STREAM ANALYSIS REPORT                                           |
|=============================================================================|
|  Transport Stream Id: ....... 2055 (0x0807)  |  Services: .............. 8  |
|  Bytes: ....................... 215,228,604  |  PID's: Total: ......... 34  |
|  TS packets: .................... 1,144,833  |         Clear: ......... 34  |
|     With invalid sync: .................. 0  |         Scrambled: ...... 0  |
|     With transport error: ............... 0  |         With PCR's: ..... 7  |
|     Suspect and ignored: ................ 0  |         Unreferenced: ... 0  |
|-----------------------------------------------------------------------------|
|  Transport stream bitrate, based on ....... 188 bytes/pkt    204 bytes/pkt  |
|  User-specified: ................................... None             None  |
|  Estimated based on PCR's: ............... 23,052,742 b/s   25,014,677 b/s  |
|  Selected reference bitrate: ............. 23,052,742 b/s   25,014,677 b/s  |
|-----------------------------------------------------------------------------|
|  Broadcast time: ................................... 74 sec (1 min 14 sec)  |
|  First TDT UTC time stamp: ........................... 2022/06/23 03:46:23  |
|  Last TDT UTC time stamp: ............................ 2022/06/23 03:47:37  |
|  First TOT local time stamp: ......................... 2022/06/23 13:46:23  |
|  Last TOT local time stamp: .......................... 2022/06/23 13:47:36  |
|  TOT country code: ................................................... AUS  |
|-----------------------------------------------------------------------------|
|  Srv Id  Service Name                              Access          Bitrate  |
|  0x0807  10 Canberra .................................. C    3,836,334 b/s  |
|  0x0827  10 Peach ..................................... C    2,721,486 b/s  |
|  0x0847  10 BOLD ...................................... C    3,168,855 b/s  |
|  0x0867  SBN .......................................... C    1,844,126 b/s  |
|  0x0887  Sky News Regional ............................ C    1,775,139 b/s  |
|  0x08A7  10 Canberra .................................. C    3,836,334 b/s  |
|  0x08C7  10 HD Canberra ............................... C    6,752,539 b/s  |
|  0x08E7  10 SHAKE ..................................... C    1,954,171 b/s  |
|                                                                             |
|  Note 1: C=Clear, S=Scrambled                                               |
|  Note 2: Unless specified otherwise, bitrates are based on 188 bytes/pkt    |
===============================================================================
linuxdude42 commented 2 years ago

And the top level analysis of the bad stream:

===============================================================================
|  TRANSPORT STREAM ANALYSIS REPORT                                           |
|=============================================================================|
|  Transport Stream Id: ....... 2055 (0x0807)  |  Services: .............. 8  |
|  Bytes: ....................... 181,308,140  |  PID's: Total: ...... 6,443  |
|  TS packets: ...................... 964,405  |         Clear: ........ 269  |
|     With invalid sync: .................. 0  |         Scrambled: .. 6,174  |
|     With transport error: ......... 283,030  |         With PCR's: . 4,128  |
|     Suspect and ignored: ........... 15,429  |         Unreferenced  6,389  |
|-----------------------------------------------------------------------------|
|  Transport stream bitrate, based on ....... 188 bytes/pkt    204 bytes/pkt  |
|  User-specified: ................................... None             None  |
|  Estimated based on PCR's: .............. 381,513,164 b/s  413,982,369 b/s  |
|  Selected reference bitrate: ............ 381,513,164 b/s  413,982,369 b/s  |
|-----------------------------------------------------------------------------|
|  Broadcast time: ..................................... 3 sec (0 min 3 sec)  |
|  First TDT UTC time stamp: ........................... 2022/06/24 02:59:54  |
|  Last TDT UTC time stamp: ............................ 2022/06/24 03:01:36  |
|  First TOT local time stamp: ......................... 2022/06/24 12:59:59  |
|  Last TOT local time stamp: .......................... 2022/06/24 13:01:35  |
|  TOT country code: ................................................... AUS  |
|-----------------------------------------------------------------------------|
|  Srv Id  Service Name                              Access          Bitrate  |
|  0x0807  10 Canberra .................................. S   36,415,646 b/s  |
|  0x0827  10 Peach ..................................... S   29,900,603 b/s  |
|  0x0847  10 BOLD ...................................... S   39,782,945 b/s  |
|  0x0867  SBN .......................................... S   16,432,593 b/s  |
|  0x0887  Sky News Regional ............................ S   16,393,430 b/s  |
|  0x08A7  10 Canberra .................................. S   36,416,042 b/s  |
|  0x08C7  10 HD Canberra ............................... S   75,399,886 b/s  |
|  0x08E7  10 SHAKE ..................................... S   24,741,657 b/s  |
|                                                                             |
|  Note 1: C=Clear, S=Scrambled                                               |
|  Note 2: Unless specified otherwise, bitrates are based on 188 bytes/pkt    |
===============================================================================
linuxdude42 commented 2 years ago

Just a couple of things I noticed in the analysis of the bad stream:

eyaleb commented 2 years ago

Not sure if it is clear (I do not see it mentioned) but the troublesome channel 52 is "10 Peach". Also, now that I changed the EIT scan to a "good" tuner I saw no "Invalid" errors logged in the last 12 hours.

kmdewaal commented 2 years ago

Closing this issue as it is now fixed in master and in fixes/32.