BurntSushi / goim

Goim is a robust command line utility to maintain and query the Internet Movie Database (IMDb).
The Unlicense

temporal data #2

Open BurntSushi opened 10 years ago

BurntSushi commented 10 years ago

Investigate whether temporal data can be added to the Goim database. It appears that temporal data is available for some categories via the diffs that IMDb releases.

This will complicate parts of the schema and the code that queries it, but I don't expect the added complexity to be overwhelming. Temporal data should be limited to categories where it is particularly useful, like the ratings list (which stores IMDb movie ranks and votes). It would not cover things like movies: I don't think we care whether a certain movie or episode was in IMDb at one point and not another. Plus, we need to be conservative about temporal data for movies/episodes/actors/etc., since it would explode the storage requirements.
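To make the schema implication concrete, here is a minimal sketch of what a temporal ratings record and an "as of" lookup could look like. Everything in it is an assumption for illustration: `RatingSnapshot`, `ratingAt`, the field names, and the sample values are hypothetical and are not part of Goim's actual schema or code.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// RatingSnapshot is a hypothetical row in a temporal ratings table: one
// observation of a title's rank and votes, taken to be valid from AsOf
// until the next snapshot for the same title.
type RatingSnapshot struct {
	TitleID int64
	AsOf    time.Time
	Rank    float64
	Votes   int
}

// ratingAt returns the most recent snapshot taken at or before t.
// history must be sorted by AsOf in ascending order.
func ratingAt(history []RatingSnapshot, t time.Time) (RatingSnapshot, bool) {
	// Index of the first snapshot strictly after t.
	i := sort.Search(len(history), func(i int) bool {
		return history[i].AsOf.After(t)
	})
	if i == 0 {
		// No snapshot existed yet at time t.
		return RatingSnapshot{}, false
	}
	return history[i-1], true
}

// day is a small helper for building UTC dates.
func day(y int, m time.Month, d int) time.Time {
	return time.Date(y, m, d, 0, 0, 0, 0, time.UTC)
}

func main() {
	// Made-up sample values, one snapshot per applied diff.
	history := []RatingSnapshot{
		{TitleID: 1, AsOf: day(2014, 1, 3), Rank: 8.5, Votes: 1000},
		{TitleID: 1, AsOf: day(2014, 1, 17), Rank: 8.6, Votes: 1200},
		{TitleID: 1, AsOf: day(2014, 1, 31), Rank: 8.4, Votes: 1500},
	}
	if r, ok := ratingAt(history, day(2014, 1, 20)); ok {
		fmt.Printf("rank=%.1f votes=%d as of %s\n",
			r.Rank, r.Votes, r.AsOf.Format("2006-01-02"))
	}
}
```

In a real database this would likely be an extra timestamp column on the ratings table, with queries picking the latest row at or before the requested date. That is the kind of added query complexity mentioned above.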

Since diffs exist, it may seem like a good idea to use them to update the database too, because that requires a lot less downloading. But I'm not sure it's possible for all lists. Notably, I don't think it's reasonably possible for the biggest lists, like actors and actresses, without some serious legwork. In these lists, which credits belong to which actor depends on surrounding context, and the diffs don't provide it. I think the only way to get it would be to start with a master copy, establish context, and apply the diffs in such a way as to preserve that context.
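The context problem can be illustrated with a small sketch. It assumes a simplified, hypothetical list layout in which an actor's name appears only on the first line of a block and later credits are tab-indented continuation lines; the real IMDb list format has more detail than this. `Credit` and `parseCredits` are made-up names, not Goim functions.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Credit pairs an actor with one title.
type Credit struct {
	Actor, Title string
}

// parseCredits reads the simplified layout described above. The actor name
// is carried over from the first line of each block, so a continuation line
// seen without its block has no actor to attach to.
func parseCredits(list string) []Credit {
	var credits []Credit
	current := "" // the context: which actor the following credits belong to
	sc := bufio.NewScanner(strings.NewReader(list))
	for sc.Scan() {
		line := sc.Text()
		if strings.TrimSpace(line) == "" {
			current = "" // a blank line ends the block
			continue
		}
		if !strings.HasPrefix(line, "\t") {
			// First line of a block: "Name<TAB>Title".
			name, title, ok := strings.Cut(line, "\t")
			if !ok {
				continue // not a credit line in this simplified layout
			}
			current = name
			credits = append(credits, Credit{current, strings.TrimSpace(title)})
			continue
		}
		// Continuation line: the actor comes from context only.
		credits = append(credits, Credit{current, strings.TrimSpace(line)})
	}
	return credits
}

func main() {
	master := "Doe, Jane\tMovie A (2001)\n\tMovie B (2005)\n"
	fmt.Printf("%+v\n", parseCredits(master))

	// A diff hunk that only adds a continuation line carries no actor name.
	hunk := "\tMovie C (2014)\n"
	fmt.Printf("%+v\n", parseCredits(hunk))
}
```

Parsed on its own, the added line from the hunk yields a credit with an empty `Actor`. That is why the diff would have to be applied to a master copy first, so that the enclosing block is known.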