MythTV / mythtv

The official MythTV repository
https://www.mythtv.org
GNU General Public License v2.0
710 stars 345 forks source link

Tvmaze metadata lookup failures #654

Open SteveErl opened 2 years ago

SteveErl commented 2 years ago

What steps will reproduce the bug?

Using the TVmaze service for metadata lookup, there are rare cases where the 'runtime' field is empty, but 'averageRuntime' is not. This happened when looking up metadata for a TV show called "Everything's Trash". Note that the database entry for this show was subsequently updated to include the missing 'runtime' field. But on the night it was recorded, that field was empty.

For the same show, the early contents of the database entries had 'airdate' populated, but no 'airtime'. I'm guessing that at the time the data was populated, the broadcast time had not been decided yet. The missing 'airtime' caused the specific episode lookup to fail.

How often does it reproduce? Is there a required condition?

It's rare, but when it happens, it causes metadata lookups to fail.

What is the expected behaviour?

The lookup scripts should be more flexible to handle these sorts of issues. If the 'runtime' field is empty, try using the 'averageRuntime' field. If the 'averageRuntime' field is also empty, set a default runtime length. If the 'airtime' field is empty, look for an episode which matches the 'airdate' field.

What do you see instead?

Additional information

Here's an example of the original data for "Everything's Trash":

{"id":60518, "url":"https://www.tvmaze.com/shows/60518/everythings-trash", "name":"Everything's Trash", "type":"Scripted", "language":"English", "genres":[], "status":"Running", "runtime":null, "averageRuntime":30, "premiered":"2022-07-13", "ended":null, "officialSite":null, "schedule": {"time":"","days":[]}, "rating": {"average":null}, "weight":94,

SteveErl commented 2 years ago

I have a solution for this bug. Once I figure out how to push my solution to my GitHub branch, the fix will be available.

rcrdnalor commented 2 years ago

Looking at your provided patch, this issue actually addresses four items:

Once you considered this feedback, please open a pull request against mythtv/master on git which addresses above items in separately commits and explain the need for change. See https://www.mythtv.org/wiki/Development_guide and https://www.mythtv.org/wiki/Submitting_Bug_Fixes for details. Writing detailed pull requests against current mythtv/master allows other developers and users to efficiently review them and incorporate the changes.

SteveErl commented 2 years ago

@rcrdnalor : Thanks for your feedback.

Bullet 2 - I agree that assigning a default of 60 minutes was an ugly hack. I'll rework the code to eliminate it.

Bullet 4 - It's true that some TV shows air multiple original broadcast episodes in one day, but that's rare. There are cases where the TvMaze database provides an airdate but doesn't provide a timestamp, but that's very rare. So, in the extremely rare case of both of these happening simultaneously, there would be multiple episodes which would be possible matches for the recording. There's no demand that there be only one episode per day. The script will have more than one 'matchesFound', and will print them all out. It's already designed to do this. If all you have is a date, and there are 5 episodes matching that date, then all 5 should be printed out.

If I follow your suggestion to break down my solution into smaller chunks, do I have to create a new Issue for each one? Or can there be four commits corresponding to one Issue, with each one adding a little bit more? Does each commit require its own pull request, or can 4 commits go in one pull request?

The Submitting Bug Fixes document says to "run the current unstable/development version" but that seems dangerous to me. I need my MythTv to run in a stable fashion every day, so I'd rather have my system based on fixes/32. How can you test your code on an unstable release? Do you have to run it on a different machine? The Submitting Bug Fixes document also suggests using Trac, but isn't that deprecated now? I'm under the impression that the current process is to use GitHub to document bugs and features, but it's frustrating that I cannot see how to assign myself an issue which I've created.

rcrdnalor commented 2 years ago

Writing pull requests should be always against git/master. If you use *buntu then this may help to install master for testing: https://github.com/MythTV/packaging/tree/master/deb-light For the separation of the bullets, I think I will do item 1 and item 3, anyway. Please focus on items 2 & 4.

Just a quick question: You said, that in 'rare cases' tvmaze does not provide early data. If you fetch the metadata for that cases the next or the other day, does it succeed?

SteveErl commented 2 years ago

@rcrdnalor : Could you please clarify the process? When you said "open a pull request against mythtv/master ... which addresses above items in separate commits", does that mean I should have 4 separate commits against one issue (this one), or would I need to create 4 separate issues, with one commit against each one? I've started porting my changes to git/master. So far, I've implemented bullet 1. Should I commit that change with the explanation for the need for the change in the commit comment, or does the need for the change have to be documented in its own issue? Should there be one pull request for all 4 bullet items, or 4 pull requests for the 4 bullets?

SteveErl commented 2 years ago

@rcrdnalor : In regards to your quick question, I first noticed missing 'runtime' and 'airtime' fields in the TvMaze database when processing an episode of a show called "Everything's Trash". At the broadcast time of the show (7/27/22), and 2 days later (7/29/22), the database had:

"id":2356087, ... "airdate":"2022-07-27", "airtime":"","airstamp":"2022-07-27T16:00:00+00:00","runtime":null

I think about a week later, when I checked again, it showed

"id":2356087,... "airdate":"2022-07-27", "airtime":"22:00","airstamp":"2022-07-28T02:00:00+00:00","runtime":30

It appears that someone updated the database between these fetch times. Since the metadata is fetched right after the recording has finished, an update a week later doesn't really help.

SteveErl commented 2 years ago

Updating the solution to split it into smaller commit components and to base it off of mythtv/master.

rcrdnalor commented 2 years ago

Steve, I see good progress here. Please keep in mind that the commit messages of your commits in your pull requests are empty. Future reader of this source code will not see the motivation of the change and need to go down the hard way to identify the issue behind the pull request. The text you wrote in the pull request message will not be included in the commit message when the pull request will be applied.

See https://docs.github.com/en/issues/tracking-your-work-with-issues/linking-a-pull-request-to-an-issue BTW, the keyword "Refs https://github.com/MythTV/mythtv/issues/654" should work as well within your commit message of your local branch.

Thank you for improving MythTV, regards, Roland

SteveErl commented 2 years ago

@rcrdnalor : I've amended commit messages and have generated 3 pull requests: #658, #662, #663. The fourth commit will be slightly dependent on the 3rd one getting merged to master. I'll have to wait for that to happen before proceeding with generation of that commit and the corresponding pull request.

SteveErl commented 1 year ago

The UnboundLocalError is not necessarily a bug in python3. It seems to be in the Mythtv version of dt.py. Here's the stack trace when show_tz = 'Asia/Jerusalem':

Traceback (most recent call last):
  File "programs/scripts/metadata/Television/tvmaze.py", line 767, in <module>
    main()
  File "programs/scripts/metadata/Television/tvmaze.py", line 744, in main
    buildNumbers(args, opts)
  File "programs/scripts/metadata/Television/tvmaze.py", line 235, in buildNumbers
    dtInTgtZone = dtInLocalZone.astimezone(posixtzinfo(show_tz))
  File "/usr/lib/python3/dist-packages/MythTV/utility/singleton.py", line 49, in __call__
    inst = type.__call__(cls, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/MythTV/utility/dt.py", line 234, in __init__
    self._process(fd, version)
  File "/usr/lib/python3/dist-packages/MythTV/utility/dt.py", line 167, in _process
    t = unpack(ttmfmt, fd.read(calcsize(ttmfmt)))[0]
UnboundLocalError: local variable 'ttmfmt' referenced before assignment
rcrdnalor commented 1 year ago

Looking at the definition:

$ file /usr/share/zoneinfo/Asia/Jerusalem 
/usr/share/zoneinfo/Asia/Jerusalem: timezone data (fat), version 3, 9 gmt time flags, 9 std time flags, no leap seconds, 149 transition times, 9 local time types, 21 abbreviation chars

the version 3 is not supported by https://github.com/MythTV/mythtv/blob/00e4fc531cb87e117e035f0e967b79847c9982f1/mythtv/bindings/python/MythTV/utility/dt.py#L119 Please open an issue against the python bindings.

SteveErl commented 1 year ago

@rcrdnalor : Thanks. I've opened #672 for the issue. Should I change the comment to say UnboundLocalError is needed until version 3 of the timezone protocol is supported?

linuxdude42 commented 1 year ago

Sorry, didn't meant to jump in here. I was looking at the list of pull requests, and didn't realize they were all connected to this issue. Hope I haven't caused a problem.

SteveErl commented 1 year ago

@linuxdude42 : No apology necessary. I've been waiting a long time for any progress on getting these changes merged. Thank you very much!