XMLTV / xmltv

Utilities to obtain, generate, and post-process TV listings data in XMLTV format
GNU General Public License v2.0

tv_grab_uk_tvguide: addition of 'ignore channel' list to --configure #196

Closed: honir closed this 1 year ago

honir commented 1 year ago

The --configure methods for this grabber have become unusable. Of the two currently working methods for fetching a list of channels, one (--method 1) fetches lots of seemingly duplicate channels, while the other (--method 2) has over 100 channels missing.

The duplicates make it very hard (impossible?) to select the appropriate channels for fetching data.

On closer inspection they are not in fact duplicates but appear to be deprecated 'id's within TVG. E.g. there are 16 versions of Sky Sports Golf (each with a different TVG id) but only one of them actually has listings: the rest return an empty schedule.

So I've added a way of filtering these out: two new options to the grabber that let you maintain a list of defunct channel ids, which are then dropped from the channel list during --configure.

--makeignorelist <filename> Create a local file containing a list of channel ids which have no data.

--useignorelist <filename> Use a local file of channel ids which will be removed from the list of possible channels in --configure.

You don't have to use these options, but they cut the number of channels down from 1209 to 493 (i.e. removing 716 channels with no data).

--makeignorelist will take about an hour to run (it tries to fetch a schedule for each of the possible channels) but you only need to run it infrequently (not every time you run --configure).
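A minimal sketch of the make/use split described above, written in Python for illustration only (the grabber itself is Perl and its real code may differ). It assumes the ignore list is a plain text file with one channel id per line, and `has_schedule` is a hypothetical stand-in for fetching one channel's schedule page.

```python
from typing import Callable, Iterable


def make_ignore_list(channel_ids: Iterable[str],
                     has_schedule: Callable[[str], bool],
                     path: str) -> list[str]:
    """Probe every channel id and write the ones with no schedule to `path`.

    `has_schedule` is a hypothetical probe; calling it once per id is why
    the 'make' step is slow.
    """
    empty = [cid for cid in channel_ids if not has_schedule(cid)]
    with open(path, "w", encoding="utf-8") as fh:
        # Assumed file format: one channel id per line.
        fh.writelines(f"{cid}\n" for cid in empty)
    return empty


def use_ignore_list(channels: dict[str, str], path: str) -> dict[str, str]:
    """Drop ignored ids from an {id: name} channel list before selection."""
    with open(path, encoding="utf-8") as fh:
        ignored = {line.strip() for line in fh if line.strip()}
    return {cid: name for cid, name in channels.items() if cid not in ignored}
```

Keeping the two steps separate is what allows the slow probe to run only occasionally, while the cheap filter can run on every --configure.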

Before I publish the changes I'd be grateful for any feedback and help with testing. Does this process work for you? Should this process in fact be made the default?

You can download the new code from here (my repo): https://github.com/honir/xmltv/blob/master/grab/uk_tvguide/tv_grab_uk_tvguide (just back up and replace your existing tv_grab_uk_tvguide program).

For the documentation, run:

    perldoc {path_to}/tv_grab_uk_tvguide

@mkbloke Do you still use this grabber? I will value your input, and appreciate any testing you are able to undertake.

mkbloke commented 1 year ago

@honir, yes, I still use tv_grab_uk_tvguide daily.

This problem with duplicated non-data channels with different IDs even makes selecting additional channels on the website a bad experience. The whole thing is utterly broken. I thought I'd contacted them about this previously, but looking back at correspondence, I can't see mention of it now, so perhaps I didn't bother to mention it in the end.

Caching the channel-IDs-to-ignore list seems like a good compromise, as the only way to know if a channel has data is to look for it for every channel and that, of course, is super slow!

Thanks for your continued work on this. I'll install a test environment later to try your latest changes and report back.

honir commented 1 year ago

I've convinced myself the creation and use of the channels 'ignorelist' should be the default: for the benefit of the majority of users who will never read the documentation or get beyond the basic --configure usage.

So if neither --makeignorelist nor --useignorelist is specified, they will be set to a default value.

(Repo updated)

mkbloke commented 1 year ago

Hi,

Sorry for taking longer than expected to come back on this. Below are the notes I made for a response. I was going to check through the resultant XML data with a viewer too, but have not got around to it yet.

"I get very slightly different figures to you, having done two runs, one on Saturday evening and one on Monday evening: 1209 channels; 714 ignored; 495 remaining, for both runs. On a well-connected VPS, these runs each took 44-45 minutes for me.

This is likely the same problem I noted in: https://github.com/XMLTV/xmltv/pull/166#issuecomment-1050103487 regarding channels that do not broadcast every day. There's not much that can be done about it though and it would seem they are few.

One thing I have noticed is that when using --configure --makeignorelist ignore_list, the user is run through the rest of the channel selection without the ignore list being applied, whereas they might expect it to be. Using --configure --makeignorelist ignore_list --useignorelist ignore_list works, of course, so I'm wondering whether the answer is to use the ignore list immediately after it has been created, or just to make it clear to users that they should use both options together to get that behaviour.

Another thing I've realised (although not related to these changes) is that the warning 'No schedule found' results in an exit status of 1. This has been the behaviour of tv_grab_uk_tvguide for a long time (forever?). I use the exit status myself to confirm that all channel schedules have been retrieved, so the current behaviour is useful to me. For anybody fetching channels that do not broadcast every day, it's hard to know whether a non-zero exit status is what they would want. Still, it has been this way for a long time and I have not seen anybody raise it as an issue, so it's not worth worrying about; it's just something that crossed my mind. Having given it a bit more thought, a workaround might be to put the few non-daily channels in a separate config file, so an exit status of 0 can still be relied on for the daily channels; the XML from the two runs could then be merged, if that's even needed."

It sounds like the third paragraph is redundant now, so that's good.

honir commented 1 year ago

Hi, No problem - thanks for your testing and feedback. I appreciate it.

Yes I wasn't sure whether --makeignorelist should have an implicit --useignorelist. But then I figured maybe some people would like to know what they would have got without the ignore_list? But I suppose they could just merge the two files (config_file + ignore_list)? Hmm... I'm not convinced.

I guess I wanted to make sure people were aware of the different actions (viz. 'make' and 'use') to highlight the fact you don't need to run the time-consuming 'make' every time you run configure (provided the file isn't too stale, etc.).

So any of these are valid:

    --configure
    --configure --makeignorelist ignore_list --useignorelist ignore_list
    --configure --useignorelist ignore_list
    --configure --makeignorelist ignore_list    << pointless?

or, to use a default filename:

    --configure --makeignorelist --useignorelist

mkbloke commented 1 year ago

I guess that with the few non-daily broadcasting channels, users might not see them if they have been put on the ignore list and that list is applied straight after being made. But there doesn't seem to be much value in not using the ignore list, because the user is still left with channel duplicates by name and, as far as I know, no easy way to differentiate them during the rest of the channel configuration process. Without the ignore list, the user is in the same position as before you implemented it: loads of duplicates and nothing to indicate which of them has the schedule data they might want - and there are over 700 of them to work through that are basically useless junk!

Perhaps a better answer - given the rest of the configure process is at least somewhat interactive - might be to simply prompt the user with something like "Do you want to apply the newly created ignore list to the channel selection process now (y/n)?" whenever --makeignorelist has been used, either with or without the filename?

What do you think?

It's a bit of a PITA with the current state of TVG. I'm going to message them about it, but we shouldn't hold our breath for them to make the site more usable, of course. I did mention (about a year ago) the issue with the drop-down channel list. The message was read but not responded to and that's still broken as well...

honir commented 1 year ago

I'd be worried that adding new interactive options might break things for peeps using Myth/tvheadend/etc.?

honir commented 1 year ago

How do you decide what channels to put in your config-file? Do you just add all of them?

There are 22 differences in channel numbers between the --configure run last Thursday and today's! So using last week's config file will miss 22 channels.

I'm wondering if people just: (1) add everything; (2) fetch everything (including all the empty channels); (3) ignore the channel number in the XML and just use the channel name to read the listings.

So if "Sky Sports Golf" moves from number 1586 to 1541 then you don't care provided the ch name hasn't changed?

Otherwise I can't see how people keep up with the changes. But even that wouldn't pick up channel numbers that have appeared since the config file was made. Do people run --configure every day?

This is doing my head in.

mkbloke commented 1 year ago

> I'd be worried that adding new interactive options might break things for peeps using Myth/tvheadend/etc.?

Hmm, that's a good point. It would break my own scripted set-up!

> There are 22 differences in channel numbers between configure run last Thursday and today! So, using last week's config-file will miss 22 channels.

Oh! That's more than I've ever seen.

> I'm wondering if people just: (1) add everything; (2) fetch everything (including all the empty channels); (3) ignore the channel number in the XML and just use the channel name to read the listings. So if "Sky Sports Golf" moves from number 1586 to 1541 then you don't care provided the ch name hasn't changed?

I'm really not sure what most people would do. Given that there are lots of channels that a lot of folks are likely not interested in, I would have thought that starting with none and adding wanted channels back is the easiest/most time efficient way to do it.

I suppose this all really depends on how people are consuming the XMLTV data. If it's just being viewed then ID is not important, but in MythTV that ID is used internally and represents a specific channel listing.

> How do you decide what channels to put in your config-file? Do you just add all of them?

I start with none enabled. I'm perhaps not the typical user. I have not had to run configure or change my lineups for a long time now. Part of my set-up stores the XMLTV IDs in another file (along with other info related to MythTV IPTV channel set-up). Those IDs are used in a script to directly enable them in the config file.

I really wish this issue would be fixed on the TVG website so the whole ignore list thing could go away, but that's probably not too likely. I did message TVG last night about it, but I'd be surprised if it's even responded to, let alone fixed.

honir commented 1 year ago

Well here's a pickle. I've been looking at the available channels for the past week.

I'm amazed that anyone can use this grabber at all. TVGuide shift the channel ids around on a daily basis. Here's the number of "id" changes per day:

Date   id changes from the previous day
7/2    27
8/2    25
9/2    33
10/2   23
12/2   24
13/2   26

There doesn't appear to be any rhyme or reason for these changes: often the change is reverted the next day!

e.g. ch id for E!

         6/2   7/2   8/2   9/2   10/2  12/2  13/2
E!       138  1829   138   138   138   138   138 
E! HD    831   831   831  1830   831   831  1830

It seems there's a 'bunch' of ids for a channel (e.g. 2 for E!; 16 for Sky Sports Golf) and the programme schedule for any given channel wobbles around amongst its ids.
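Per-day change counts like the ones above could be produced by diffing daily snapshots. This is a rough Python sketch, not how the figures in this thread were actually gathered; it assumes each snapshot maps a channel name to the single id that carried its schedule that day (i.e. after the empty ids have been filtered out), and the function names are hypothetical.

```python
def count_id_changes(snapshots: list[dict[str, str]]) -> list[int]:
    """For each consecutive pair of {channel name: id} snapshots, count the
    channels whose id differs from the previous snapshot."""
    changes = []
    for prev, curr in zip(snapshots, snapshots[1:]):
        changes.append(sum(1 for name, cid in curr.items()
                           if name in prev and prev[name] != cid))
    return changes


def ids_per_channel(snapshots: list[dict[str, str]]) -> dict[str, set[str]]:
    """Collect the 'bunch' of ids each channel name has used."""
    seen: dict[str, set[str]] = {}
    for snap in snapshots:
        for name, cid in snap.items():
            seen.setdefault(name, set()).add(cid)
    return seen


# The E! rows from the table above, one value per day (6/2 ... 13/2).
e_rows = {"E!": ["138", "1829", "138", "138", "138", "138", "138"],
          "E! HD": ["831", "831", "831", "1830", "831", "831", "1830"]}
snapshots = [{name: ids[day] for name, ids in e_rows.items()}
             for day in range(7)]
print(count_id_changes(snapshots))  # [1, 1, 1, 1, 0, 1]
```

Note that there is no 11/2 snapshot, so the fifth count compares 12/2 with 10/2.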

It does seem to 'return' to a previous id though, so perhaps that is how people are making sense of it - they just don't notice that for a day, maybe two, they didn't get any new data on a given channel, but then it fixes itself the next day (when the id reverts, and data comes in back on the 'old' number)? (Let's call this scenario 1.)

Alternatively (scenario 2), people could just try to fetch all 1200 ids and ignore the channel id in their grabbed XML, so it doesn't matter whether the schedule is on 831.tvguide.co.uk or 1830.tvguide.co.uk. (Perhaps they use the tv_grab_uk_tvguide.map.conf facility to map the channel id to some fixed id, e.g. for Myth?)
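Scenario 2 amounts to resolving each channel by name after a full fetch. The Python sketch below is hypothetical and does not use tv_grab_uk_tvguide.map.conf. It assumes a list of (id, display name, programme count) tuples collected by fetching every id, and derives a made-up stable key from the channel name.

```python
import re


def resolve_by_name(channels: list[tuple[str, str, int]]) -> dict[str, str]:
    """Map a stable, name-derived key to whichever id carried data today.

    `channels` holds (tvg_id, display_name, programme_count) tuples. Empty
    ids are skipped; if several ids for one name have data, the one with
    the most programmes wins (the first seen on a tie).
    """
    best: dict[str, tuple[str, int]] = {}
    for cid, name, count in channels:
        if count == 0:
            continue  # one of the ids that returns an empty schedule
        # Hypothetical stable key, e.g. "E! HD" -> "e-hd".
        key = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
        if key not in best or count > best[key][1]:
            best[key] = (cid, count)
    return {key: cid for key, (cid, _) in best.items()}
```

Keying on the name keeps the output stable when E! HD moves between 831 and 1830, but it only works provided the channel name itself doesn't change.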

So what does this mean? Well, it's certainly not going to be a good idea to make the 'ignorelist' the default. (It would break scenario 2.) It is, though, probably still just about worthwhile for those people working scenario 1.

mkbloke commented 1 year ago

Wow, that's quite a revelation. Well done for noticing that - excellent work.

I have not seen that behaviour myself, as it doesn't happen with any of the 23 channels that I'm getting data for.

Hmm.

honir commented 1 year ago

I've committed the changes for makeignorelist/useignorelist. Their use is optional.