XMLTV / xmltv

Utilities to obtain, generate, and post-process TV listings data in XMLTV format
GNU General Public License v2.0

tv_imdb: update to use new data source #17

Open knowledgejunkie opened 6 years ago

knowledgejunkie commented 6 years ago

In late December 2017, the IMDB mirror went read-only; the URL of the archived data changed to ftp://ftp.fu-berlin.de/pub/misc/movies/database/frozendata/.

From the README:

"IMDb datasets, providing bulk-access to IMDb title and name data, are now available from us via an HTTPS link.

https://datasets.imdbws.com/

As a previous ftp user you can just switch to https, however there are some formatting changes within the data.

For details on the new file formats and access guidelines, see www.imdb.com/interfaces."

In addition to being served over https, the data files on IMDB's new service have some formatting changes.

knowledgejunkie commented 6 years ago

Updated base URL in IMDB.pm to use archived data (December 2017) in https://github.com/XMLTV/xmltv/commit/f0140f37050a915078026fc6ad81f7320d2e5217

This is only a short-term workaround until we migrate tv_imdb to the new data source.

jnylen commented 6 years ago

Another option is to use omdbapi and let the user supply their own API key.
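As a rough illustration of the user-supplied key idea, here is a minimal sketch that only builds a lookup URL and makes no network call. The endpoint and the `apikey`, `t` (title) and `y` (year) parameter names are assumptions based on how the OMDb API is commonly documented, and `build_omdb_url` is a hypothetical helper, not anything in tv_imdb.

```python
from urllib.parse import urlencode

# Assumed endpoint; check the OMDb documentation before relying on it.
OMDB_ENDPOINT = "https://www.omdbapi.com/"


def build_omdb_url(api_key, title, year=None):
    """Build a title-lookup URL using a key the user provides (hypothetical helper)."""
    # 'apikey', 't' and 'y' are assumed parameter names.
    params = {"apikey": api_key, "t": title}
    if year is not None:
        params["y"] = str(year)
    return OMDB_ENDPOINT + "?" + urlencode(params)


# The key would come from the user's configuration, never be shipped with the tool.
url = build_omdb_url("USER_SUPPLIED_KEY", "Blade Runner", 1982)
print(url)
```

Keeping the key in the user's own configuration means each user stays within their own request quota.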

honir commented 4 years ago

Sadly, it looks like tv_imdb -- in its current form -- is a dead duck. The IMDb dataset is no longer in the public domain.

The ftp files haven't been updated since Dec 22 2017, and won't be updated in future. IMDb are no longer releasing updates to these files.

The expectation from IMDb is that people switch to the new TSV (tab-separated values) files available over https. However, these files contain a much-reduced dataset, and many key elements are no longer available (no plot summaries, MPAA ratings, or keywords; only 3 genres; only the top 3 actors; etc.).
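To give a feel for how little each row carries, here is a minimal sketch of reading the new format. The column names and the `\N` missing-value marker are assumptions based on my understanding of the `title.basics.tsv` layout and should be checked against www.imdb.com/interfaces. The sample rows are invented, and `parse_titles` is a hypothetical helper.

```python
import csv
import io

# Invented sample rows in the layout assumed for title.basics.tsv.
SAMPLE = (
    "tconst\ttitleType\tprimaryTitle\toriginalTitle\tisAdult\t"
    "startYear\tendYear\truntimeMinutes\tgenres\n"
    "tt0000001\tmovie\tExample Film\tExample Film\t0\t1999\t\\N\t101\tDrama,Thriller\n"
    "tt0000002\ttvSeries\tExample Show\tExample Show\t0\t\\N\t\\N\t\\N\t\\N\n"
)


def parse_titles(handle):
    """Yield one dict per title row, mapping the assumed '\\N' marker to None."""
    # QUOTE_NONE: titles may contain literal quote characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        clean = {key: (None if value == "\\N" else value) for key, value in row.items()}
        # Genres arrive as one comma-separated field.
        clean["genres"] = clean["genres"].split(",") if clean["genres"] else []
        yield clean


titles = list(parse_titles(io.StringIO(SAMPLE)))
for title in titles:
    print(title["tconst"], title["primaryTitle"], title["startYear"], title["genres"])
```

With the real data you would pass a text-mode handle from `gzip.open(path, "rt", encoding="utf-8")` instead of the in-memory sample.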

The ethos as stated by IMDb is:

"The sets of data we provide are updated to only include the essential ones that help with matching and linking to an IMDb title or name."

In other words, the intention of the new datasets is that you are only to use them to identify the key to access the page on their website, and no more. Hence no rich dataset like we've used for the past 20-odd years. The marketing reasons for this should be obvious if you've visited imdb.com lately: it's like the old days with auto-playing videos, clickbait, massive adverts, etc.

I think we need to look at alternatives, such as the APIs from TMDb (The Movie Database) or OMDb (The Open Movie Database).

honir commented 4 years ago

It looks like OMDb is no longer maintained. It uses IMDb data, and reading some of the support tickets suggests it is probably built from the no-longer-maintained .list files (hint: people can't find programmes shown after 2017).

And TMDb has 567,000 films compared to 6,500,000 on IMDb :-(