XMLTV / xmltv

Utilities to obtain, generate, and post-process TV listings data in XMLTV format
GNU General Public License v2.0

tvguide.co.uk download speed #158

Closed: misar1 closed this issue 2 years ago

misar1 commented 2 years ago

This is a follow-up to my post #155, since that one was closed as "wontfix".

Further to my previous comment, I found that the --nodetailspage option produces an almost identical .xml file while cutting the download time for my channels from 90 minutes to 5 minutes. I say "almost" because it omits the stop= attribute, which deviates from the full XMLTV standard and messes up my custom EPG program. I understand that stop= is generally the same as the start= of the next programme, but that does not apply to the last programme of the day, and coding to put it back into the downloaded XML is not trivial.

Is there any chance of modifying the option or adding another one?

I have no idea why the option saves so much download time, but with that minor change the script would be perfect!

Thanks.

honir commented 2 years ago

As a general rule, a grabber shouldn't add anything that isn't in the original data. The TVGuide schedule doesn't include stop times, so none are added.

However, the "details" fetch retrieves additional web pages, and those pages do include stop times. That is why stop times are only added when you do the details fetch. It does incur considerable overhead, as you've noticed, which is why the grabber has the --nodetailspage option: it avoids that overhead if it suits your use case.

Once upon a time, the details page contained extra information besides stop times, such as a more complete cast list, film director names and the BBFC film classification. But that data seems to have disappeared from the website.

Note that while a programme's 'start' time is mandatory, its 'stop' time is actually optional in the XMLTV DTD.

However, there may be a way to add stop times: the tv_sort filter.

If you feed your XML (grabbed with --nodetailspage) through tv_sort, you should see stop times added (with the obvious exception of the last programme in the file).

Something like: tv_grab_uk_tvguide --days 1 --nodetailspage | tv_sort --by-channel --output uk_tvguide.xml
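The idea behind filling in stop times this way can be sketched in Python: group programmes by channel, sort each group by start time, and copy each programme's successor's start into its missing stop. This is a minimal illustration, not tv_sort's actual implementation; fill_stop_times is a hypothetical name, and the sketch assumes every start is a full "YYYYMMDDhhmmss +zzzz" timestamp. Because the last programme on each channel has no successor, it keeps no stop time.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

# Assumption: full XMLTV timestamps with a UTC offset, e.g. "20240101183000 +0000".
XMLTV_TIME = "%Y%m%d%H%M%S %z"


def parse_time(value: str) -> datetime:
    """Parse a full XMLTV timestamp (date, time and UTC offset)."""
    return datetime.strptime(value, XMLTV_TIME)


def fill_stop_times(tree: ET.ElementTree) -> int:
    """Hypothetical helper: set each missing stop to the next start on the same channel.

    Returns the number of stop attributes added. The last programme on each
    channel is left without a stop, since nothing follows it.
    """
    by_channel = defaultdict(list)
    for prog in tree.getroot().iter("programme"):
        by_channel[prog.get("channel")].append(prog)

    added = 0
    for progs in by_channel.values():
        # Order by actual time so the "next" programme is the one that starts next.
        progs.sort(key=lambda p: parse_time(p.get("start")))
        for current, following in zip(progs, progs[1:]):
            if current.get("stop") is None:
                current.set("stop", following.get("start"))
                added += 1
    return added


if __name__ == "__main__":
    sample = """<tv>
  <programme start="20240101183000 +0000" channel="bbc1"><title>B</title></programme>
  <programme start="20240101180000 +0000" channel="bbc1"><title>A</title></programme>
  <programme start="20240101190000 +0000" channel="itv"><title>C</title></programme>
</tv>"""
    tree = ET.ElementTree(ET.fromstring(sample))
    print(fill_stop_times(tree))  # prints 1: only programme A gets a stop
```

To save the result you could call tree.write(path, encoding="UTF-8", xml_declaration=True). Be aware that ElementTree does not keep the DOCTYPE line, which may matter if another tool validates the file against xmltv.dtd.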

misar1 commented 2 years ago

Your command line worked exactly as you described, except that the stop time is missing for the last programme on each channel rather than just the last programme in the file. I can work around that, though.
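To see which programmes still lack a stop time after running tv_sort, a small check can count them per channel. This is a minimal sketch; missing_stop_by_channel and the script name check_stop.py are hypothetical. With tv_sort --by-channel output you would expect a count of 1 for each channel.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def missing_stop_by_channel(root: ET.Element) -> Counter:
    """Hypothetical helper: count programmes with no stop attribute, keyed by channel id."""
    return Counter(
        prog.get("channel")
        for prog in root.iter("programme")
        if prog.get("stop") is None
    )


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: check_stop.py FILE.xml")
    else:
        counts = missing_stop_by_channel(ET.parse(sys.argv[1]).getroot())
        # Sort by channel id; "or ''" guards against a programme with no channel.
        for channel, count in sorted(counts.items(), key=lambda kv: kv[0] or ""):
            print(f"{channel}: {count} programme(s) without stop")
```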

Many thanks for the quick response and for a great script. I now have no complaint about the download speed!

honir commented 2 years ago

Cool. Oops, yeah that's what I meant ;-) (my test file had only one channel!)

I'm glad it's working for you now. It was a good spot on your part to try the --nodetailspage option.

It's a shame the extra data (better actor credits) are no longer available, but that's beyond our control, unfortunately.

Thanks.