XMLTV / xmltv

Utilities to obtain, generate, and post-process TV listings data in XMLTV format
GNU General Public License v2.0

tv_grab_uk_tvguide can't configure #214

Open qdacsvx opened 10 months ago

qdacsvx commented 10 months ago

XMLTV Version?

(Please specify release version or git commit ID) V1.21 (Fedora repo)

XMLTV Component?

(Grabber name or utility) tv_grab_uk_tvguide

Perl Version

This is perl 5, version 36, subversion 0 (v5.36.0).

Operating System

Linux 6.4.13-100.fc37.x86_64

What happened?

Running "tv_grab_uk_tvguide --configure" produces an error.

    $ tv_grab_uk_tvguide --configure
    tv_grab_uk_tvguide uses a cache with files that it has already downloaded. Please specify where the cache shall be stored.
    Directory to store the cache in: [/tmp/xmltv/cache]
    Fetching channels: 100% [======================================================================================================================================]
    No channels found in TVGuide
    Trying alternative method 1
    Fetching channels: 0% [ ]
    HTTP error: 302 Found
    Retrying URL: https://www.tvguide.co.uk/mychannels.asp?gw=1242 (attempt 1 of 5)
    HTTP error: 302 Found
    Retrying URL: https://www.tvguide.co.uk/mychannels.asp?gw=1242 (attempt 2 of 5)
    HTTP error: 302 Found
    Retrying URL: https://www.tvguide.co.uk/mychannels.asp?gw=1242 (attempt 3 of 5)
    HTTP error: 302 Found
    Retrying URL: https://www.tvguide.co.uk/mychannels.asp?gw=1242 (attempt 4 of 5)
    HTTP error: 302 Found
    Retrying URL: https://www.tvguide.co.uk/mychannels.asp?gw=1242 (attempt 5 of 5)
    HTTP error: 302 Found
    Can't call method "look_down" on an undefined value at /usr/bin/tv_grab_uk_tvguide line 932.

What did you expect to happen?

Configure runs normally.

The website tvguide seems to have updated. It may have different channel ids now.

Changed-Daily commented 10 months ago

I have experienced almost the same problem with V1.2.1 running on Windows:

    C:\xmltv>tv_grab_uk_tvguide --configure
    Timezone is +0100

    No channels found in TVGuide
    Trying alternative method 1
    HTTP error: 301 Moved Permanently
    Retrying URL: https://www.tvguide.co.uk/mychannels.asp?gw=1242 (attempt 1 of 5)
    HTTP error: 301 Moved Permanently
    Retrying URL: https://www.tvguide.co.uk/mychannels.asp?gw=1242 (attempt 2 of 5)
    HTTP error: 301 Moved Permanently
    Retrying URL: https://www.tvguide.co.uk/mychannels.asp?gw=1242 (attempt 3 of 5)
    HTTP error: 301 Moved Permanently
    Retrying URL: https://www.tvguide.co.uk/mychannels.asp?gw=1242 (attempt 4 of 5)
    HTTP error: 301 Moved Permanently
    Retrying URL: https://www.tvguide.co.uk/mychannels.asp?gw=1242 (attempt 5 of 5)
    HTTP error: 301 Moved Permanently

garybuhrmaster commented 10 months ago

The website tvguide seems to have updated. It may have different channel ids now.

This may be related to issue #185 (ownership of the tvguide.co.uk business/website having changed in May of 2022).

Changed-Daily commented 10 months ago

It was the "HTTP error: 301 Moved Permanently" that made me think this may need some attention :)

honir commented 10 months ago

Are you able to edit the tv_grab_uk_tvguide script file?

After line 1408

    my $ua = LWP::UserAgent->new;

You could try adding

    $ua->requests_redirectable(['GET','HEAD','POST']);

See if that helps.

mkbloke commented 10 months ago

The whole site has been updated. Not sure how compatible (if at all) the changes will be with the existing grabber.

There's an API (whoop!) which spits out JSON, when called via, for example: https://api.tvguide.co.uk/schedules?start=2023-09-07T17:00:00.000Z&end=2023-09-07T22:00:00.000Z&type=grid&platformId=_popular&regionId=_popular. That JSON contains short programme descriptions only I think, so it may still be necessary to follow the links to programme pages to get the full descriptions.
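
A minimal Python sketch of how a query URL like the one above could be assembled. The host, path and parameter names are copied from that example URL. The helper names (`iso_millis`, `schedules_url`) are hypothetical, and nothing here is confirmed by any official API documentation.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

API_ROOT = "https://api.tvguide.co.uk"  # host taken from the example URL in the thread


def iso_millis(moment: datetime) -> str:
    """Format a timezone-aware datetime as UTC, e.g. 2023-09-07T17:00:00.000Z."""
    utc = moment.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%S.") + f"{utc.microsecond // 1000:03d}Z"


def schedules_url(start: datetime, hours: int,
                  platform_id: str = "_popular",
                  region_id: str = "_popular") -> str:
    """Build a /schedules query URL covering `hours` hours from `start`."""
    query = {
        "start": iso_millis(start),
        "end": iso_millis(start + timedelta(hours=hours)),
        "type": "grid",
        "platformId": platform_id,
        "regionId": region_id,
    }
    return f"{API_ROOT}/schedules?{urlencode(query)}"


if __name__ == "__main__":
    print(schedules_url(datetime(2023, 9, 7, 17, 0, tzinfo=timezone.utc), hours=5))
```

Note that `urlencode` percent-encodes the colons in the timestamps as `%3A`; whether the server treats the encoded and unencoded forms identically is an assumption.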

I notice it's proxied via Cloudflare now (was it before, I hadn't noticed if it was?), which could be problematic in future depending on whether or not they enable the bot protection, but I think for now this is not a problem.

garybuhrmaster commented 10 months ago

There's an API (whoop!) which spits out JSON

Excellent. Presuming that the API is (mostly) stable, and is intended to be (and stay) publicly available, that should make things a lot easier going forward. Even better if the company has a spec available for developers.

The whole site has been updated. Not sure how compatible (if at all), the changes will be with the existing grabber.

Obviously a JSON ingester and a screen scraper have little in common on the input side, even if the data acquired and output will have a lot in common. Whoever steps up to write a replacement may find enough salvageable and reusable code so that the git diff will not be a complete rip and replace (at least the POD is likely to have strong re-usability).

I notice it's proxied via Cloudflare now (was it before, I hadn't noticed if it was?)

I have not looked in quite some time, but suspect this is a somewhat recent change (I recall when I did look it had a large RTT to the site from my location in the US which correlated with it being hosted in the UK, and not just in the local Cloudflare DC)

steeevieee commented 10 months ago

So far I've worked out the following...

PLATFORM="Freeview"
REGION="London"
CHANNEL="Film4"

# Look up each ID by its title; variables are quoted so titles containing spaces still work
PLATFORM_ID=$(curl -sq "https://api.tvguide.co.uk/platforms" | jq --arg PL "$PLATFORM" -r '.[] | select(.title==$PL) | .id')
REGION_ID=$(curl -sq "https://api.tvguide.co.uk/regions" | jq --arg PID "$PLATFORM_ID" --arg RG "$REGION" -r '.[] | select(.platform_id==$PID) | select(.title==$RG) | .id')
# Use $CHANNEL here rather than a hard-coded "Film4"
CHANNEL_ID=$(curl -sq "https://api.tvguide.co.uk/channels" | jq --arg CN "$CHANNEL" -r '.[] | select(.title==$CN) | .id')
# Fetch one day of schedules and keep only the chosen channel
SCHEDULES=$(curl -sq "https://api.tvguide.co.uk/schedules?start=2023-09-10T00:00:00.000Z&end=2023-09-10T23:59:59.999Z&platformId=${PLATFORM_ID}&regionId=${REGION_ID}" | jq --arg CH "$CHANNEL_ID" -r '.[] | select(.id==$CH) | .schedules')

...but that last JSON blob doesn't tell me things like the category. I've tried URIs like /programs /programmes /details but I can't find the route, and there's no swagger.

Any idea what the route might be ?
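
For anyone without jq to hand, the same title-to-ID lookup can be sketched in Python. The field names (`id`, `title`, `platform_id`) are the ones used in the jq filters above. The sample data is made up for illustration and `find_id` is a hypothetical helper, not part of any library.

```python
from typing import Dict, List, Optional


def find_id(items: List[Dict[str, str]], **wanted: str) -> Optional[str]:
    """Return the id of the first item whose fields all match `wanted`."""
    for item in items:
        if all(item.get(key) == value for key, value in wanted.items()):
            return item.get("id")
    return None


# Made-up sample data, shaped like the lists the jq filters expect.
PLATFORMS = [{"id": "p-1", "title": "Freeview"}, {"id": "p-2", "title": "Sky"}]
REGIONS = [
    {"id": "r-1", "platform_id": "p-2", "title": "London"},
    {"id": "r-2", "platform_id": "p-1", "title": "London"},
]

if __name__ == "__main__":
    platform_id = find_id(PLATFORMS, title="Freeview")
    region_id = find_id(REGIONS, platform_id=platform_id, title="London")
    print(platform_id, region_id)
```

Matching the region on both `platform_id` and `title` matters because the same region title can appear under more than one platform.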

moob158 commented 10 months ago

does anyone have a solution?

garybuhrmaster commented 10 months ago

does anyone have a solution?

Not at this time.

If you want to help contact the tvguide.co.uk site (email, mail, or perhaps walk into their office) and request access to the developer API spec to share with this project.

misar1 commented 10 months ago

@garybuhrmaster Does that mean you think there is no way to update the previous grabber for the new website?

mkbloke commented 10 months ago

Does that mean you think there is no way to update the previous grabber for the new website?

It means nobody has done it yet. I'm going to look at putting something together this weekend, as I'm running out of guide data for my MythTV system.

mkbloke commented 9 months ago

I've been looking at this today. One thing I can't seem to find on the new site is the episode name. For example: Family Guy S21E5 is on later on ITV2, but I can't find its name 'Unzipped Code' via the new TVGuide website. In MythTV speak this is program.subtitle. This sucks. I've scanned through the JS that builds the API queries, but it doesn't seem to offer anything particularly helpful. It would be good if other people can take a look too, it's quite possible I've missed something.

I suspect the TVGuide API has probably only been written to do enough to support their needs and nothing more.

misar1 commented 9 months ago

Looking at the website programme (details) screen for each programme there is no sign that the episode name is displayed for any programme on any channel. If correct there is no way you can grab them.

spider3838 commented 9 months ago

If you want to help contact the tvguide.co.uk site (email, mail, or perhaps walk into their office) and request access to the developer API spec to share with this project.

Did anyone request the spec for the API? If not, I am willing to do so and share it.

mkbloke commented 9 months ago

I haven't and I suspect nobody else has either. I guess it can't hurt to try, although my feeling is that there probably isn't more to the API than can be seen currently. I could be wrong, of course and that could be a good thing.

spider3838 commented 9 months ago

I have submitted a request for the API spec and will provide an update here when I hear anything.

Update 19 September 2023: still haven't heard anything from my request for the API spec, not sure if we still need it?

spider3838 commented 9 months ago

I tried using the channel ID (in UUID format?) in the .conf file but the script is expecting an integer so that doesn't work. However, entering the URL into a browser in the form https://www.tvguide.co.uk/channel/2a548fcc-55e9-561d-9a77-f485fb69dad1/ (note that the trailing slash is important), it brings up the listings for that channel (defaulting to day zero). It also adds the channel name so that the URL appears as https://www.tvguide.co.uk/channel/2a548fcc-55e9-561d-9a77-f485fb69dad1/bbc-one-london/0 for the returned page. Is that a possible temporary workaround for the screen scraping, using the new-format channel IDs rather than the old integer ones? Tedious to edit the config file but may only need to be carried out once.

I am competent in many computer languages but unfortunately Perl isn't one of them, so I'm not sure where to start with trying out this new approach in the code!

spider3838 commented 9 months ago

I have been doing some playing around and found that the channel listing from the URL https://www.tvguide.co.uk/channel/ looks to be in a completely different format to what the existing Perl is expecting (no surprise there probably). Attached is the HTML returned using curl to retrieve it (in a ZIP file, couldn't attach the HTML directly). The key part seems to be where the channel list starts, snippet below:

`

Channel List

    <div class="grid grid-cols-2 gap-4 md:grid-cols-4 md:gap-10">
      <a href="/channel/a4539d2a-1cee-52f5-a00e-d68835ce3e9f/4music/0">
        <div class=""><img src="https://tv.assets.pressassociation.io/37857809-8f80-5f1d-ba10-b7586c08799b.png" class="h-full w-full" width="320" height="180" /></div>
        <div class="my-1">4Music</div>
      </a>
      <a href="/channel/3d8bb828-1eb5-5de5-bb40-efba8bae9835/4seven/0">
        <div class=""><img src="https://tv.assets.pressassociation.io/279388ce-2262-5326-aad8-3acf0194e11b.png" class="h-full w-full" width="320" height="180" /></div>
        <div class="my-1">4seven</div>
      </a>
      <a href="/channel/d6ae0dba-9c18-5405-b222-264fe8e58aee/4seven-hd/0">
        <div class=""><img src="https://tv.assets.pressassociation.io/3e7bd0ba-3cf1-520a-97f1-bf53ddd2ab0c.png" class="h-full w-full" width="320" height="180" /></div>
        <div class="my-1">4seven HD</div>
      </a>

`

Apologies that the above is not pretty printed, not sure why that is but the full HTML file attached will be easier to read.

There no longer appears to be a separate "channel" indicator, it appears that it would have to be deduced from the href tag contents. The channel names seem to be easily identifiable via the <div class="my-1"> tag. The channel IDs could be extracted by looking for the "/channel/" pattern in the href tag. Don't know if any of that helps or if others have already found this but I had a few minutes to spare to carry out some digging.
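
The extraction described above can be sketched with Python's standard-library HTML parser. This assumes the markup stays exactly as in the snippet (an `<a href="/channel/<id>/<slug>/0">` wrapping an `<img>` and a `<div class="my-1">`). The class name `ChannelListParser` and the dict keys are hypothetical.

```python
from html.parser import HTMLParser


class ChannelListParser(HTMLParser):
    """Collect id, slug, name and icon from <a href="/channel/..."> blocks."""

    def __init__(self):
        super().__init__()
        self.channels = []
        self._current = None   # channel dict being filled while inside <a>
        self._in_name = False  # True while inside <div class="my-1">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and (attrs.get("href") or "").startswith("/channel/"):
            # "/channel/<id>/<slug>/0" -> ["channel", "<id>", "<slug>", "0"]
            parts = attrs["href"].strip("/").split("/")
            if len(parts) >= 3:
                self._current = {"id": parts[1], "slug": parts[2], "name": "", "icon": ""}
        elif self._current is not None and tag == "img":
            self._current["icon"] = attrs.get("src") or ""
        elif self._current is not None and tag == "div" and attrs.get("class") == "my-1":
            self._in_name = True

    def handle_data(self, data):
        if self._in_name:
            self._current["name"] += data.strip()

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_name = False
        elif tag == "a" and self._current is not None:
            self.channels.append(self._current)
            self._current = None


# Two entries copied from the snippet posted in the thread.
SAMPLE = """
<div class="grid grid-cols-2 gap-4 md:grid-cols-4 md:gap-10">
  <a href="/channel/a4539d2a-1cee-52f5-a00e-d68835ce3e9f/4music/0">
    <div class=""><img src="https://tv.assets.pressassociation.io/37857809-8f80-5f1d-ba10-b7586c08799b.png" class="h-full w-full" width="320" height="180" /></div>
    <div class="my-1">4Music</div>
  </a>
  <a href="/channel/d6ae0dba-9c18-5405-b222-264fe8e58aee/4seven-hd/0">
    <div class=""><img src="https://tv.assets.pressassociation.io/3e7bd0ba-3cf1-520a-97f1-bf53ddd2ab0c.png" class="h-full w-full" width="320" height="180" /></div>
    <div class="my-1">4seven HD</div>
  </a>
</div>
"""

if __name__ == "__main__":
    parser = ChannelListParser()
    parser.feed(SAMPLE)
    for channel in parser.channels:
        print(channel["id"], channel["name"])
```

Keying on the `href` pattern rather than on the styling classes is a deliberate choice in this sketch, since utility classes such as `my-1` look more likely to change than the URL scheme.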

pretty_channel_list.zip

steeevieee commented 9 months ago

Kinda got something working, but it's messy code, doesn't use any of the XML-TV library, and takes some work to set up. I'm reluctant to share it publicly (it's messy, there's always a chance that TVGuide don't want us to scrape, and I'm just too busy to maintain it), but will happily pass it on privately to someone else to work on.

It's also crashed once this morning reading a programme details page, but picked up again after....so I guess CloudFlare got in the way or my broadband dropped.

UPDATE: switched to LWP::UserAgent::Determined in case of these dropouts

spider3838 commented 9 months ago

I have been playing with the code for fetching the channels and the code below creates the attached .conf file.

    sub fetch_all_channel_ids {
        # Fetch all channel IDs with method 1, used for channel list creation and alternative ID searches
        #
        my $channels = {};
        # NCW - below change 2023-09-14
        # my $tree = fetch_url('https://www.tvguide.co.uk/mychannels.asp?gw=1242', 'post', [
        my $tree = fetch_url('https://www.tvguide.co.uk/channel/', 'get', [
            thisDay => '',
            thisTime => '',
            gridSpan => '',
            emailaddress => '',
            regionid => 1,
            systemid => 5,
            xn => 'Show me the channels'
        ]);

        # NCW below changed 2023-09-16
        #my @c = $tree->look_down('_tag' => qr/table|tr/, 'class' => qr/^tr[XC]/);
        my @c = $tree->look_down('_tag' => qr/div/, 'class' => qr/^my-1/);

        my $j = 0 if $opt->{test};  # --test is an undocumented (private) option

        foreach (@c) {
            my ($ch, $id, $l, $t);
            # NCW below added 2023-09-16
            #my $entry = $_;
            my $entry = $_->{'_parent'};

            my @chan_array = split("/", $entry->{'href'});
            # Above array now contains: "", "channel", <channel-ID>, <channel-name>, "0"
            # (the href starts with "/", so element 0 is an empty string)
            # NCW_below changed 2023-09-16
            #if ($_->id =~ /^trX?\d+/) {
            if ($entry->{'href'} =~ /^\/channel\/*/) {
                # NCW below changed 2023-09-17
                #($id) = $_->id =~ /^trX?(\d+)/;
                ($id) = $chan_array[2];
                # NCW changed below 2023-09-14
                #($ch) = $ROOT_URL.'channellistings.asp?ch='.$id;
                #($t) = $_->as_text;
                ($t) = $chan_array[3];
                ($ch) = $ROOT_URL.'channel/'.$id.'/'.$t;
                #($l) = $_->as_HTML =~ /background-image:url\(([^)]+)\)/;
                if ($entry->look_down('_tag' => qr/img/)) {
                    ($l) = $entry->look_down('_tag' => qr/img/)->{'src'};
                } else {
                    ($l) = "";
                }
            }

            $channels->{$id} = {id => $id . (!$opt->{'list-channels'}?"   # ".encode('utf-8', $t):(!$opt->{legacychannels}?'.tvguide.co.uk':'')),
                                'display-name' => [[ encode('utf-8', $t), 'en' ]],
                                icon => [{ 'src'=>$l }],
                                url => [ $ch ],
                                }
                                if $id;

            debug $id if $opt->{test};
            last if $opt->{test} and (++$j >= $opt->{test});  # limit during testing
        }
        return $channels;
    }

There are probably shorter, prettier ways of doing this but as I haven't learnt Perl, I will offer this to others to examine and incorporate properly if they wish.

The code for fetching the programme details will also need to be updated to incorporate the new channel IDs.

tv_grab_uk_tvguide_new.conf.zip

rmeden commented 9 months ago

BTW... someone should probably mention that non-profit schedulesdirect.org does have UK guide data.  It's not free, but is pretty cheap and is high quality via an API less likely to  break with upstream changes :)  There's a free 7 day trial that can hold you over until you can write a new scraper. (no CC info is requested, ever stored, and no auto renewal)

xmltv grabbers  tv_grab_zz_sdjson and tv_grab_zz_sdjson_sqlite can be used to get data for the UK (and other countries)

Disclaimer: I'm president of Schedules Direct and a founding board member.  SD was formed by the leaders of a number of open source projects when our free US/Canada data source went away.

Robert

honir commented 9 months ago

I've uploaded a new version of this grabber for beta testing.

You will need to create a new config-file with --configure

Programme categories are being elusive at the mo. They are available via a webpage call for every prog in the schedule, but I'd rather not do that if possible.

Let me know how it goes.

misar1 commented 9 months ago

@honir Thanks for this.

I created a .conf file (Freeview, London, All channels) without any problem (very impressed with your handling of the options!) but get an error when I grab a listing.

For convenience I always run xmltv from a batch file like this test one:

    @echo off
    D:
    cd D:\XMLTV
    set XMLTV_SUPPLEMENT=\XMLTV.xmltv\supplement
    set TEMP=\XMLTV\
    set HOME=\XMLTV\
    echo.
    XMLTV.exe tv_grab_uk_tvguide --config-file D:\XMLTV.xmltv\tv_grab_uk_tvguide.conf --days 3 --output D:\XMLTV\Test3dayNewEPG.xml
    echo.

But the result is this:

    Timezone is +0100
    Fetching listings: 0% [ ]
    Argument " " isn't numeric in numeric gt (>) at \XMLTV\par-6d696b6573\cache-2fcf189c5e125e3ed42d1163af5687b2e53fd2f0\inc/script/tv_grab_uk_tvguide line 380.
    Argument " " isn't numeric in numeric gt (>) at \XMLTV\par-6d696b6573\cache-2fcf189c5e125e3ed42d1163af5687b2e53fd2f0\inc/script/tv_grab_uk_tvguide line 380.

Line 380 is in this group:

    # re-base the series/episode/part numbers
    $s-- if (defined $s && $s > 0);
    $e-- if (defined $e && $e > 0);
    $p-- if (defined $p && $p && $p=~/^\d+$/ && $p > 0);

I tried Sky instead just in case there is something odd with the Freeview series/episode/part numbers but got the same error.

Any idea what I am doing wrong?

PS D:\XMLTV.xmltv in the batch file has a \ in front of .xmltv when I paste it in but it disappears from the listing above.

honir commented 9 months ago

Thanks @misar1 I've added some error trapping to catch duff values in the incoming series/episode numbers.
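
For illustration only, here is a Python sketch of the kind of guard that avoids the "isn't numeric" warning: it validates each value before the numeric comparison. It is not the change made to the grabber; `rebase` and `xmltv_ns` are hypothetical names, and the zero-based, dot-separated output follows the `xmltv_ns` episode numbering convention.

```python
import re
from typing import Optional


def rebase(value: object) -> Optional[int]:
    """Convert a 1-based series/episode/part number to 0-based.

    Returns None for missing, blank or non-numeric input (such as " "),
    so callers never compare a non-number with 0.
    """
    if value is None:
        return None
    text = str(value).strip()
    if not re.fullmatch(r"[0-9]+", text):
        return None
    number = int(text)
    # Mirror the original logic: only decrement values greater than zero.
    return number - 1 if number > 0 else number


def xmltv_ns(series: object, episode: object, part: object = None) -> str:
    """Join the re-based numbers as 'series.episode.part', leaving gaps empty."""
    fields = [rebase(series), rebase(episode), rebase(part)]
    return ".".join("" if field is None else str(field) for field in fields)


if __name__ == "__main__":
    print(xmltv_ns("7", "20"))  # series 7, episode 20
    print(xmltv_ns(" ", 5))     # blank series, episode 5
```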

misar1 commented 9 months ago

@honir Unfortunately you removed the error message but the result is the same:

    Timezone is +0100
    Fetching listings: 0% [ ]

I noticed that it always writes a single cache entry (attached) for BBC One London - even after I deleted that channel from the .conf list. That seems to be the problem because it is correctly generating the XML for that channel.

07cd930ed011bc60ff68e19f2f2eccb2.zip

misar1 commented 9 months ago

My apologies @honir your script is working correctly.

I was confused because the command I posted above finished in a few seconds and produced only a single entry in the cache. However, it generated an XML for 3 days with all the Freeview channels except BBC One London which as mentioned previously I deleted from the .conf channel list. I assume the change from grabbing the old EPG is because it now omits the detailed programme descriptions, categories, etc.

Thanks again for producing the new script so quickly.

spider3838 commented 9 months ago

@honir thank you so much for fixing this! I run a Java programme daily on a Raspberry Pi to mimic the defunct Sky Never Miss service and this grabber is invaluable. I had to modify my Java code slightly to accommodate the season and episode data changes but it is all working now.

I would have liked the Freeview and Sky channels all in one config file, does that look like it might be possible (I could raise a new issue for this enhancement) or will it not be possible due to the way that tvguide.co.uk splits everything by platform?

honir commented 9 months ago

Yes, it would have to be two separate runs. But you can join or merge the two files together before feeding your RPi.

tv_cat

Read one or more XMLTV files and write a file to standard output whose
programmes are the concatenation of the programmes in the input files,
and whose channels are the union of the channels in the input files.

tv_merge

Read XMLTV listings from two files and merge them together.
Unlike tv_cat (which just joins files) this will update (add/replace/delete)
the original XMLTV file with channels and programmes contained in the second
file.
It works with multiple channels, and will insert any new programmes
and delete any overlapping programmes.

spider3838 commented 9 months ago

@honir, thanks for the swift reply, I shall investigate!

mkbloke commented 9 months ago

    $showdesc .= '.' if ( (length $showdesc) && ((substr $showdesc,-1,1) ne '.') ); # append a fullstop

I also added the full stops back in my implementation, which I abandoned, but one issue with this is that it'll add full stops to lines ending with a question or exclamation mark.

I used:

$prog_detail = $1 if ($prog_detail and $prog_detail =~ /(.*)[\s\t\r\n]+$/);
$prog_detail .= '.' if ($prog_detail and $prog_detail =~ /[^\.\?!]$/);

You might not need the first line, that comes from another grabber I'm working on. It seems (I think) that the XML writer trims trailing white space (and leading?), but that's not helpful if you dump another non-whitespace character after it. :-)
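
The same two-step idea (trim trailing whitespace, then append a full stop only when the text does not already end in `.`, `?` or `!`) can be sketched in Python. `finish_sentence` is a hypothetical name and this is not code from either grabber.

```python
def finish_sentence(text: str) -> str:
    """Trim trailing whitespace and add a full stop unless one of . ? ! ends the text."""
    trimmed = text.rstrip()
    if trimmed and trimmed[-1] not in ".?!":
        trimmed += "."
    return trimmed


if __name__ == "__main__":
    for sample in ["A quiz show", "Who did it?", "Live!  ", "Ends here.", ""]:
        print(repr(finish_sentence(sample)))
```

Trimming first matters for the reason given above: appending after trailing whitespace would leave the full stop stranded after a space or newline.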

honir commented 9 months ago

Thanks @mkbloke Yes the writer will trim leading and trailing whitespace. I'll try it without the local trim while we see what the incoming data look like. (I think 'description' is missing if it's blank rather than null or space, but...)

I've updated the script with your code ;)

spider3838 commented 9 months ago

Just tried the latest version of the Perl script and got this:

    nevermiss@Nickbian:~ $ tv_grab_uk_tvguide --config-file xmltv/tv_grab_uk_tvguide.conf --output xmltv/xmltv.xml --days 1 --debug
    Fetching: https://api.tvguide.co.uk/schedules?start=2023-09-20T00%3A00%3A00.000Z&end=2023-09-21T00%3A00%3A00.000Z&type=grid&platformId=d3b26caa-8c7d-5f97-9eff-40fcf1a6f8d3&regionId=8bf03071-074a-505b-a64f-9e4a1fae36be
    Fetching https://api.tvguide.co.uk/schedules?start=2023-09-20T00%3A00%3A00.000Z&end=2023-09-21T00%3A00%3A00.000Z&type=grid&platformId=d3b26caa-8c7d-5f97-9eff-40fcf1a6f8d3&regionId=8bf03071-074a-505b-a64f-9e4a1fae36be from server.
    could not fetch https://api.tvguide.co.uk/schedules?start=2023-09-20T00%3A00%3A00.000Z&end=2023-09-21T00%3A00%3A00.000Z&type=grid&platformId=d3b26caa-8c7d-5f97-9eff-40fcf1a6f8d3&regionId=8bf03071-074a-505b-a64f-9e4a1fae36be, error: 403 Forbidden, aborting
    nevermiss@Nickbian:~ $

Don't know if it is CloudFlare, tvguide.co.uk tightening access to the API or a temporary glitch.

Anyone else seeing this?

honir commented 9 months ago

Yep, looks like they don't want us to play. They are blocking access from XMLTV.

misar1 commented 9 months ago

That's a shame but thanks for trying. I guess that converting the script to grab the new site using the original approach is even more complicated?

honir commented 9 months ago

It's do-able, but they'd probably just block us there as well.

Maybe you could persuade them to allow access for XMLTV - but I doubt it. The new owners don't seem to have their heart set on providing quality guide data. (And their app is broken and hasn't been updated in over 3 years.)

honir commented 9 months ago

I'm happy to recommend the guide service from Schedules Direct (SD).

I've been using them myself for over 5 years and, apart from a few niggles -- such as them working on the server overnight in Texas time, which is of course 6-10 a.m. in the UK, and means my daily automated fetch fails and I have to remember to try again later -- the service has been excellent. Fault reporting could be a bit more transparent (you only get to see your own reports rather than a public fault log) but it's rarely needed.

The data are generally good, and have pretty much all UK channels. The data provider (Gracenote) are much improved from their early Tribune Media Services (TMS) days in the 00s.

You can use either of the project's approved grabbers: zz_sdjson or zz_sdjson_sqlite

The dollar exchange rate has bumped up the cost in recent years, but at £28 a year it's good value (54p a week).

Disclosure: I have no affiliation (or commission!) from SD, although I do know the people who run the organisation (as they include one of the elders of this XMLTV project). But my comments reflect the quality, and cost, of the data rather than any personal issues.

misar1 commented 9 months ago

A major benefit of the old EPG (but not the current version) was the detailed information, including very long programme descriptions, categories, ratings, etc. Does SD provide any of that for the UK channels?

honir commented 9 months ago

Absolutely. e.g. Film4

<programme start="20230922142000 +0000" stop="20230922173000 +0000" channel="21494">
    <title>The Guns of Navarone</title>
    <desc>A small group of commandos, led by Captain Keith Mallory, are assigned to attack 
and destroy the eponymous guns - installed on the side of a mountain - that are preventing 
allied ships from rescuing 2,000 trapped British soldiers. Unable to attack by air or sea, the 
crack team is sent in but it seems there is a turncoat amongst them.</desc>
    <credits>
      <director>J. Lee Thompson</director>
      <actor>Gregory Peck</actor>
      <actor>David Niven</actor>
      <actor>Anthony Quinn</actor>
      <actor>Stanley Baker</actor>
      <actor>Anthony Quayle</actor>
      <actor>James Darren</actor>
      <actor>Irene Papas</actor>
      <actor>Gia Scala</actor>
      <actor>James Robertson Justice</actor>
      <actor>Richard Harris</actor>
      <writer>Alistair MacLean</writer>
      <writer>Carl Foreman</writer>
      <producer>Leon Becker</producer>
      <producer>Cecil F. Ford</producer>
      <producer>Carl Foreman</producer>
    </credits>
    <date>1961</date>
    <category>War</category>
    <category>Drama</category>
    <category>Feature Film</category>
    <category>movie</category>
    <length units="minutes">157</length>
    <episode-num system="dd_progid">MV002829430000</episode-num>
    <rating system="Régie du cinéma">
      <value>G</value>
    </rating>
    <rating system="UK Content Provider">
      <value>PG</value>
    </rating>
    <rating system="Departamento de Justiça, Classificação, Títulos e Qualificação">
      <value>12</value>
    </rating>
    <rating system="Ontario Film Authority">
      <value>PG</value>
    </rating>
    <rating system="British Board of Film Classification">
      <value>PG</value>
    </rating>
    <rating system="Arcom">
      <value>Tous publics</value>
    </rating>
    <rating system="Statens medieråd">
      <value>Från 15 år</value>
    </rating>
    <rating system="Australian Classification Board">
      <value>PG</value>
    </rating>
    <star-rating system="Gracenote">
      <value>3.5/4</value>
    </star-rating>
</programme>

Or E4 example

<programme start="20230924155000 +0000" stop="20230924161500 +0000" channel="25117">
    <title>The Big Bang Theory</title>
    <sub-title>The Relationship Diremption</sub-title>
    <desc>Sheldon faces a personal crisis when he starts to wonder whether he's wasting his time 
with his study of String Theory, although Penny does her best to offer him advice.</desc>
    <credits>
      <director>Mark Cendrowski</director>
      <actor>Johnny Galecki</actor>
      <actor>Jim Parsons</actor>
      <actor>Kaley Cuoco-Sweeting</actor>
      <actor>Simon Helberg</actor>
      <actor>Kunal Nayyar</actor>
      <actor>Mayim Bialik</actor>
      <actor>Melissa Rauch</actor>
      <actor>Stephen Hawking</actor>
      <writer>Steven Molaro</writer>
      <writer>Bill Prady</writer>
      <writer>Jim Reynolds</writer>
      <writer>Chuck Lorre</writer>
      <writer>Eric Kaplan</writer>
      <writer>Steve Holland</writer>
      <producer>Chuck Lorre</producer>
      <producer>Bill Prady</producer>
      <producer>Steven Molaro</producer>
      <guest>John Ross Bowie</guest>
      <guest>Laura Spencer</guest>
    </credits>
    <category>Sitcom</category>
    <category>Series</category>
    <category>series</category>
    <length units="minutes">21</length>
    <episode-num system="xmltv_ns">6.19.</episode-num>
    <episode-num system="dd_progid">EP012708730158</episode-num>
    <audio>
      <stereo>stereo</stereo>
    </audio>
    <previously-shown start="20140410"/>
    <subtitles type="teletext"/>
    <rating system="USA Parental Rating">
      <value>TVPG</value>
    </rating>
    <rating system="Freiwillige Selbstkontrolle Fernsehen">
      <value>12</value>
    </rating>
    <rating system="Régie du cinéma">
      <value>13+</value>
    </rating>
    <rating system="Canadian Parental Rating">
      <value>PG</value>
    </rating>
    <rating system="Mediakasvatus- ja kuvaohjelmayksikkö">
      <value>S</value>
    </rating>
    <rating system="Australian Classification Board">
      <value>PG</value>
    </rating>
    <rating system="Kijkwijzer">
      <value>AL</value>
    </rating>
</programme>

I believe the data come straight from the TV networks (Gracenote have contracts for supply with most of them (excl. Virgin?))

They also provide unique identifiers, which help with your database! e.g. see dd_progid=MV002829430000 (MV=movie) and dd_progid=EP012708730158 (EP=episode).

I think they offer a 7-day free trial, so you can have a look without cost. It's worth a look, I think.
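
As a rough sketch of how a consumer might read those identifiers, the Python below pulls the `episode-num` elements out of a trimmed copy of the E4 example. Only the MV and EP prefixes mentioned above are mapped; the meaning of any other prefix is not covered here, and `episode_info` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the E4 example from the thread.
SAMPLE = """<programme start="20230924155000 +0000" stop="20230924161500 +0000" channel="25117">
  <title>The Big Bang Theory</title>
  <sub-title>The Relationship Diremption</sub-title>
  <episode-num system="xmltv_ns">6.19.</episode-num>
  <episode-num system="dd_progid">EP012708730158</episode-num>
</programme>"""

PROGID_KINDS = {"MV": "movie", "EP": "episode"}  # only the prefixes named in the thread


def episode_info(programme: ET.Element) -> dict:
    """Extract the programme ID, its kind, and 1-based series/episode numbers."""
    info = {}
    for node in programme.findall("episode-num"):
        system = node.get("system")
        text = (node.text or "").strip()
        if system == "dd_progid":
            info["progid"] = text
            info["kind"] = PROGID_KINDS.get(text[:2], "unknown")
        elif system == "xmltv_ns":
            # xmltv_ns is zero-based: "6.19." means series 7, episode 20.
            # This sketch ignores the optional "x/total" form of each field.
            series, episode, _part = (text.split(".") + ["", "", ""])[:3]
            if series.strip().isdigit():
                info["series"] = int(series) + 1
            if episode.strip().isdigit():
                info["episode"] = int(episode) + 1
    return info


if __name__ == "__main__":
    print(episode_info(ET.fromstring(SAMPLE)))
```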

garybuhrmaster commented 9 months ago

The new owners don't seem to have their heart set on providing quality guide data.

Quality guide data costs a fair amount of money to acquire and curate. With the exception of locations for which the regulatory agencies require broadcasters to offer such guide data for free and someone else (the broadcaster) pays the organizations, those organizations expect to be paid for their work(s) by the consumers of that guide data. In some cases that means limiting access to their own customers (placing the guide data behind a subscriber portal), or monetizing the guide data by providing the content on a web page filled with ads with the expectation (and typically the requirement) that you must consume (see) the ads (which is sometimes seen as the "no screen scraping allowed" TOS).

Of course, in the longer run, while the underlying data about the show/movie/event will still be monetizable (because people will want to search out available romcom movies, for example), many people are choosing to consume content in different ways than linear TV (via various on-demand/streaming services), so linear TV guide data likely has a declining revenue model, and that almost ensures that investment is also declining for such linear TV guide data.

garybuhrmaster commented 9 months ago

I'm happy to recommend the guide service from Schedules Direct (SD).

I too am a satisfied customer of Schedules Direct, and have been since its inception (2007) when the feed from TMS (zap2it labs) ended in the US. I liked its data so much I authored one of the XMLTV grabbers using their new API, partially because the PVR I use could migrate to a pure XMLTV guide data loading, and partially because the guide data available via the new API was richer than the previous data. Having guide data that just works has had value to me.

Fault reporting could be a bit more transparent (you only get to see your own reports rather than a public fault log) but it's rarely needed.

For scheduled activities, and site wide faults, Schedules Direct's forum provides a more public view and can be set to notify you (for those that wish to share; Schedules Direct clearly should not share your issues publicly without your permission, but for larger unplanned issues one tends to see a bunch of people posting).

The data are generally good, and have pretty much all UK channels. The data provider (Gracenote) are much improved from their early Tribune Media Services (TMS) days in the 00s.

Gracenote at some point decided to leverage their extensive data about content for worldwide availability. As I understand it they partner with local organizations where possible for the local schedule data. Sometimes partners are not available (not all countries are supported), and sometimes local partners' quality of data is poor (and countries disappear), but as I understand it for most of the European countries the data is high quality, and they have processes to combine local showings with their existing rich content so that one tends to get high quality data. The underlying data is arguably richer (in at least some cases) than what XMLTV grabbers can make available due to the XMLTV schema.

The dollar exchange rate has bumped up the cost in recent years, but at £28 a year it's good value (54p a week).

The important issue is value. I clearly feel that I receive value for my yearly fee (which, being in the US, currently means $35/yr). I understand some will not see value in guide data "that just works" at any price, and that for others the price really exceeds what they can afford.

Disclosure: I have no affiliation (or commission!) from SD, although I do know the people who run the organisation (as they include one of the elders of this XMLTV project). But my comments reflect the quality, and cost, of the data rather than any personal issues.

FD: I also have no affiliation with Schedules Direct itself. As the developer of one of the XMLTV grabbers I do occasionally work with Schedules Direct staff more directly due to an unusual issue or occurrence, and if requested I will make sure my app still works with some planned change in their services. Schedules Direct staff will sometimes contact "the usual suspects" (i.e. other API developers) to let them know about upcoming changes which they believe should be transparent, but testing is always better, and I certainly appreciate the heads-up.

spider3838 commented 9 months ago

I'm not sure exactly how tv_imdb works but could it be used to provide the missing programme details?

EDIT: ignore this, I have now tried it and the data is either very old or non-existent.

spider3838 commented 9 months ago

By changing the User Agent details for XMLTV::Get_nice just before it is called, I have managed to get the grabber to work. The default XMLTV User Agent is "xmltv/$XMLTV::VERSION" which probably appears (quite correctly) to the TVGuide API as a bot. By adding a line to change the User Agent before the API is called, to make it look like a normal browser (in this case Chrome on my Raspberry Pi), the request was accepted and the programme data retrieved.

The line I added to the fetch_listings subroutine was:

$XMLTV::Get_nice::ua->agent("Mozilla/5.0 (X11; CrOS armv7l 13597.84.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.98 Safari/537.36");

just before:

my $data = XMLTV::Get_nice::get_nice_json($url, undef, 1);

Does that help? It may need updating from time to time if their API sees the "browser" as incompatible e.g. due to age.

Does this work for others and if so, is this a permanent fix (at least for now)?

misar1 commented 9 months ago

Worked for me too. Well done!!

mkbloke commented 9 months ago

By changing the User Agent details for XMLTV::Get_nice just before it is called, I have managed to get the grabber to work.

My previous testing seemed to indicate that any user agent string containing the substring xmltv/ was blocked.

It is a solution for personal use. I'm not sure the XMLTV project will want to officially support a grabber that evades a site's blocking though.

spider3838 commented 9 months ago

Fair point, perhaps the xmltv/ substring could be added as well but that might still be seen as evading robot blocking?

I also thought after my post that a similar change is probably needed during the configure phase.
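
To make that question concrete, here is a small sketch that models the blocking rule mkbloke's testing suggested. The rule itself is an assumption (the site has not confirmed how it filters requests), would_be_blocked is a made-up helper, and the agent strings are only illustrative:

```perl
use strict;
use warnings;

# Assumed rule, inferred from the testing described in this thread (not
# confirmed by the site): a request is rejected when its User-Agent
# contains the substring "xmltv/".
sub would_be_blocked {
    my ($agent) = @_;
    return index($agent, 'xmltv/') >= 0 ? 1 : 0;
}

my @agents = (
    'xmltv/1.2.1',                                   # default, identifies the project
    'Mozilla/5.0 (X11; Linux x86_64) Chrome/92.0',   # browser-like, hides the project
    'Mozilla/5.0 (X11; Linux x86_64) xmltv/1.2.1',   # browser-like plus project name
);

printf "%-45s %s\n", $_, would_be_blocked($_) ? 'blocked' : 'allowed'
    for @agents;
```

If the rule really is a plain substring match, the third string is rejected just like the default one, so a User Agent cannot both name xmltv and get through.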

garybuhrmaster commented 9 months ago

It is a solution for personal use. I'm not sure the XMLTV project will want to officially support a grabber that evades a site's blocking though.

I am not a decider, but the project has in the past decided not to facilitate circumvention of a site's choices for access/blocking, as the project wants to be considered a good net citizen.

If you believe that the site should allow xmltv to access the content, you should ask them what the project needs to do to be considered an acceptable user of their data (sometimes that can be to increase the time between requests, or to use a different endpoint). If they do not offer, or are not willing to discuss, a path forward, there is likely no viable path forward for the xmltv project.
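
As an illustration of the "increase the time between requests" option, here is a minimal sketch of a throttle that enforces a minimum gap between calls. make_throttled is a hypothetical helper, not part of XMLTV, and the fetch callback is a stand-in so the example runs without network access; an acceptable gap would be whatever the site asks for:

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper (not an XMLTV function): wraps a fetch callback and
# guarantees at least $min_gap seconds between the start of successive calls.
sub make_throttled {
    my ($min_gap, $fetch) = @_;
    my $last;    # start time of the previous call; undef before the first
    return sub {
        if (defined $last) {
            my $wait = $min_gap - (time() - $last);
            sleep($wait) if $wait > 0;
        }
        $last = time();
        return $fetch->(@_);
    };
}

# Stand-in for a real HTTP request; the URL is a placeholder.
my $polite_get = make_throttled(0.2, sub {
    my ($url) = @_;
    return "fetched $url";
});

print $polite_get->("https://example.invalid/listings?page=$_"), "\n"
    for 1 .. 3;
```

The first call goes out immediately, and each later call waits only for whatever remains of the gap, so slow responses do not add extra delay on top.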

misar1 commented 9 months ago

I am not a decider, but the project has in the past decided not to facilitate circumvention of a site's choices for access/blocking, as the project wants to be considered a good net citizen.

No website likes scraping, and every one would prefer to stop it. Most don't because it's not possible or too much trouble. In any case if the scraping is solely for personal use, websites which make their data openly available (i.e. not behind a paywall) cannot use legal or copyright rules to control how the individual user chooses to consume them.

Up to now all TVG websites targeted by your scripts have fallen into the "most don't" category. spider3838 is probably right that they detected the new script as a bot and simply attempt to block all bots. If you really believe the "good net citizen" line, XMLTV should write to EVERY site it targets and ask if they mind you grabbing their TVG. I doubt you will get a welcome from any of them.

rmeden commented 9 months ago

If you really believe the "good net citizen" line, XMLTV should write to EVERY site it targets and ask if they mind you grabbing their TVG. I doubt you will get a welcome from any of them.

Devs are supposed to check the site's terms of service to see if there is anything against scraping before a grabber is added to the project. If the TOS changes, we don't periodically recheck it. With the agent string, we're upfront about who we are. If they want to block us, we don't fight it, or start an arms race.

I don't think we need to get permission from every site in advance.... as you can see from this thread, getting hold of a human is difficult.

steeevieee commented 7 months ago

BTW... someone should probably mention that non-profit schedulesdirect.org does have UK guide data.

Gave it a try this weekend and it's really pretty good.

The configure process needs some work...hangs for 3-4 mins at the postcode stage then only returns a handful of channels of the UK....so I got it to pull all channels instead then disabled them all and used sqlite to enable the 50 I wanted. Also opened support ticket 22778 because of a lineup issue, which will hopefully be fixed before my trial expires.

garybuhrmaster commented 5 months ago

Until/unless someone provides a working grabber for tv_grab_uk_tvguide, it has been removed from the builds.