Komet / MediaElch

Media Manager for Kodi
https://mediaelch.github.io/mediaelch-doc/about.html
GNU Lesser General Public License v3.0

[Suggestion] Add scraper for the MediathekView program list #612

Open noyannus opened 5 years ago

noyannus commented 5 years ago

For German-speaking users it would be a great enhancement if MediaElch could include a scraper for the Mediathek services of public German TV stations.

A unified list ("Filmliste") of their programs is maintained by the makers of MediathekView. They allow its use in private projects (see the bottom of https://github.com/mediathekview/MediathekView/wiki). They also run a web service (https://mediathekviewweb.de/).

I suggest contacting them about the possibility of using it for MediaElch. Querying their web service through an API would probably be more convenient than downloading a huge Filmliste file and keeping it up to date.
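To make the API idea concrete, here is a minimal sketch of what a title query against the web service might look like. The endpoint URL, the JSON field names and the `text/plain` content type are assumptions on my part and should be checked against the MediathekViewWeb documentation; `build_query` and `search` are hypothetical helper names, not part of MediaElch.

```python
import json
import urllib.request

# Assumed endpoint of the MediathekViewWeb query API; verify before relying on it.
API_URL = "https://mediathekviewweb.de/api/query"


def build_query(title, size=10):
    """Return the request body for a title search (field names are assumptions)."""
    return {
        "queries": [{"fields": ["title", "topic"], "query": title}],
        "sortBy": "timestamp",
        "sortOrder": "desc",
        "future": False,
        "offset": 0,
        "size": size,
    }


def search(title, size=10, timeout=10):
    """Send the query and return the decoded JSON response (needs network access)."""
    body = json.dumps(build_query(title, size)).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        # Assumption: the service expects the JSON body to be sent as text/plain.
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Only `search` touches the network; `build_query` can be inspected offline to see what would be sent.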

Why this is an enhancement: too many movies in the currently scraped services have poor or no German descriptions. For every film broadcast by the stations covered by MediathekView, a good description should be available in the MediathekView Filmliste.

bugwelle commented 5 years ago

Hi,

thanks for this feature request! :smiley:

I'm German as well and like this idea. Nevertheless: Would this really be such a huge improvement? As far as I can tell, movies and shows are only available for a limited time period. Re-scraping old movies would result in failures if I'm not mistaken.

Regards, Andre

noyannus commented 5 years ago

The programs are available for download only for a limited time, but their info ghosts haunt the net much longer. Example: https://programm.ard.de/TV/mdrfernsehen/sensationsprozess-casilla/eid_28229923006352

Programs that were (co-)produced by the public broadcasters can be available for a long time, possibly without limitation.

Still, you are right: once the download has been 'depublicised', the info is no longer available in MediathekView[Web] either; it is a downloading program.

That leaves two options. Either the user has to be quick and run MediaElch during the download period.

Or MediaElch would have to provide an accumulating database for the scraper to query. An automatically updated program list, available on GitHub, GitLab or wherever, could do the trick.
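A rough sketch of what "accumulating" could mean in practice: each new snapshot of the list is merged into a local SQLite table and nothing is ever deleted, so entries that have since been depublished stay queryable. The table layout, the entry keys (`channel`, `title`, `date`, `description`) and the function names are all hypothetical; the real Filmliste format would need its own parser.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) programs table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS programs (
               channel TEXT NOT NULL,
               title TEXT NOT NULL,
               broadcast_date TEXT NOT NULL,
               description TEXT,
               last_seen TEXT NOT NULL,
               PRIMARY KEY (channel, title, broadcast_date)
           )"""
    )
    return conn


def merge_snapshot(conn, entries, seen_on):
    """Add new entries and refresh known ones; entries missing from the snapshot are kept."""
    conn.executemany(
        "INSERT OR REPLACE INTO programs "
        "(channel, title, broadcast_date, description, last_seen) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            (e["channel"], e["title"], e["date"], e["description"], seen_on)
            for e in entries
        ],
    )
    conn.commit()


def lookup(conn, title):
    """Return all stored broadcasts for a title; an empty list means 'no record'."""
    rows = conn.execute(
        "SELECT channel, broadcast_date, description FROM programs "
        "WHERE title = ? ORDER BY broadcast_date",
        (title,),
    )
    return rows.fetchall()
```

Because `lookup` returns an empty list for unknown titles instead of raising, a scraper built on top of it could report "not found" the same way it does for any other source.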

I am not absolutely sure about the copyright situation regarding the information, but see https://de.wikipedia.org/wiki/Sammelwerk#Sammelwerk_im_Urheberrecht . Programming and setting up the DB should count as substantial 'investment' (but IANAL!). As MediathekView's list already is a legal collection, and their web service is based on a DB, only the accumulating feature would be different and I don't see why that would make the DB illegal. ( ...Unless MediathekView needs and has a special permission to crawl and publish exactly in the way they currently do -- but then, why should MediaElch not get some similar permission? They'd do free advertising and improve audience experience!)

As for failures on missing info: can they be treated just like missing records in TMDB, IMDB, etc.?

EDIT: And the great advantage would be that MediaElch would be able to add info to all the documentaries, local programs, series, news shows, what have you, that never make it into the international movie databases. Whatever the crazy collector wants -- they could have it!

bugwelle commented 4 years ago

Hi,

I won't set up a service that accumulates the information of ard.de and similar sites. It's not that I don't like the idea, but I have neither the resources nor the time for that (it's been over a year since my last reply, sorry about that).

Thank you for the link to programm.ard.de. I'm surprised but I can actually search for shows from 2017: https://programm.ard.de/TV/Programm/Detailsuche?detailsuche=1&sendungstitel=Tagesschau&mitwirkende=&volltext=&ausstrahlungswahl=period&uhrzeitStart=27.12.2017&uhrzeitEnde=29.12.2017&sendezeitauswahl=none&senderauswahl=all&sort=auto

But the search only lists a few days (or 200 results at most).

I could add an ard.de scraper, but it would be very limited: the user would have to supply a specific date (or date range). It would also only work for movies and not shows. :-)
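As a sketch of how the date-range limitation might be handled: a longer range can be split into short windows, with one search URL built per window. The query parameters are copied from the search link above; the window length of three days is only a guess at what "a few days" means, and `date_windows` and `search_url` are hypothetical names.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

SEARCH_URL = "https://programm.ard.de/TV/Programm/Detailsuche"


def date_windows(start, end, days=3):
    """Yield (first_day, last_day) pairs that cover start..end inclusive."""
    current = start
    while current <= end:
        last = min(current + timedelta(days=days - 1), end)
        yield current, last
        current = last + timedelta(days=1)


def search_url(title, first_day, last_day):
    """Build a detail-search URL for one window (parameters taken from the link above)."""
    params = {
        "detailsuche": "1",
        "sendungstitel": title,
        "mitwirkende": "",
        "volltext": "",
        "ausstrahlungswahl": "period",
        "uhrzeitStart": first_day.strftime("%d.%m.%Y"),
        "uhrzeitEnde": last_day.strftime("%d.%m.%Y"),
        "sendezeitauswahl": "none",
        "senderauswahl": "all",
        "sort": "auto",
    }
    return SEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    for first, last in date_windows(date(2017, 12, 27), date(2018, 1, 2)):
        print(search_url("Tagesschau", first, last))
```

This only builds the URLs; fetching and parsing the result pages, and coping with the 200-result cap inside a single window, would still have to be done separately.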