Add Generic (and IMDB) DataProviders

damienhaynes / moving-pictures

Moving Pictures is a movies plug-in for the MediaPortal media center application. The goal of the plug-in is to create a very focused and refined experience that requires minimal user interaction. The plug-in emphasizes usability and ease of use in managing a movie collection consisting of ripped DVDs, and movies reencoded in common video formats supported by MediaPortal.


Add Generic (and IMDB) DataProviders #9

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

Create a data provider that scrapes data from IMDB. Also add support for the user to select their preferred data provider for metadata and for cover art (the two should be selectable separately).

Original issue reported on code.google.com by conrad.john on 26 Apr 2008 at 6:59

GoogleCodeExporter commented 9 years ago

Original comment by conrad.john on 25 Jul 2008 at 4:22

GoogleCodeExporter commented 9 years ago

For the IMDB data provider, we should consider creating a more general scraping data provider that takes a config file as input (one time only, with the config later stored in the database). You could conceivably create a generic scraper that only needed a few regex strings to customize it for specific sites.

Original comment by conrad.john on 4 Aug 2008 at 2:40
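The config-driven scraper idea above can be sketched in a few lines of Python. This is a hypothetical illustration, not Moving Pictures code (which is written in C#): it assumes each site is described by one regex per metadata field, each containing a named group `value`. The `ScraperConfig` class, the `scrape` function, and the sample page layout are all made up for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ScraperConfig:
    """Hypothetical per-site config: one regex per field, each with a group named 'value'."""
    site_name: str
    patterns: Dict[str, str] = field(default_factory=dict)


def scrape(html: str, config: ScraperConfig) -> Dict[str, str]:
    """Apply every configured pattern to the page and collect whatever matched."""
    result: Dict[str, str] = {}
    for field_name, pattern in config.patterns.items():
        match = re.search(pattern, html, re.DOTALL)
        if match:
            result[field_name] = match.group("value").strip()
    return result


# A made-up site layout; each real site would need its own small set of patterns.
example_config = ScraperConfig(
    site_name="example-movie-site",
    patterns={
        "title": r'<h1 class="title">(?P<value>.*?)</h1>',
        "year": r'<span class="year">\((?P<value>\d{4})\)</span>',
        "plot": r'<div class="plot">(?P<value>.*?)</div>',
    },
)

sample_page = """
<h1 class="title">Finding Nemo</h1> <span class="year">(2003)</span>
<div class="plot">A clownfish searches for his son.</div>
"""

print(scrape(sample_page, example_config))
```

Supporting a new site then means writing a new `ScraperConfig`, not new code, which is the appeal of the approach; the cost is that the patterns break whenever the site changes its markup.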

GoogleCodeExporter commented 9 years ago

Good info and decent discussion about an existing system used by XBMC that is very similar to what I was suggesting above:

http://www.meedios.com/forum/viewtopic.php?p=30418

We should review what XBMC has done to see if we can build off their stuff and cruise past the mistakes they may have run into along the way.

Original comment by conrad.john on 5 Aug 2008 at 1:35

GoogleCodeExporter commented 9 years ago

A generic data provider is essential for multi-lingual support.

Original comment by conrad.john on 6 Aug 2008 at 4:14

Changed title: Add Generic (and IMDB) DataProvider
Added labels: Milestone-0.6.5
Removed labels: Milestone-0.6

GoogleCodeExporter commented 9 years ago

Original comment by conrad.john on 6 Aug 2008 at 4:23

Added labels: Milestone-0.7
Removed labels: Milestone-0.6.5

GoogleCodeExporter commented 9 years ago

This should apply to cover art as well. We should probably at least support impawards.com as a source.

Original comment by conrad.john on 22 Aug 2008 at 1:52

Changed title: Add Generic (and IMDB) DataProviders

GoogleCodeExporter commented 9 years ago

Check out my suggestion to use XBMC's XML scrapers for HTTP scraping:
http://forum.team-mediaportal.com/improvement-suggestions-46/suggestion-use-xbmcs-xml-scrapers-http-scraping-35312/
http://www.meedios.com/forum/viewtopic.php?p=30418

XBMC's scrapers already do everything users are asking for here.

Original comment by gamester...@gtempaccount.com on 30 Aug 2008 at 3:43

GoogleCodeExporter commented 9 years ago

Will strongly consider using XBMC's system. Will give it a look in the coming week.

Original comment by conrad.john on 1 Sep 2008 at 9:03

Added labels: Milestone-0.5.4
Removed labels: Milestone-0.7

GoogleCodeExporter commented 9 years ago

Original comment by conrad.john on 1 Sep 2008 at 9:03

GoogleCodeExporter commented 9 years ago

One great feature of the XBMC scraper is that it saves a local cache alongside the movie in an NFO file.
It's really a great concept, giving great flexibility of usage.
They even have an application to scan the shares and create those NFO files.
Look at: http://xbmc.org/forum/showthread.php?t=33961

Original comment by pira...@gmail.com on 1 Sep 2008 at 9:10
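The NFO caching idea can be sketched in Python. This is an illustration under stated assumptions, not XBMC's implementation: it assumes a minimal XML layout with a `<movie>` root element and one child element per field, and the helper names `nfo_path_for`, `write_nfo`, and `read_nfo` are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, Optional


def nfo_path_for(movie_file: Path) -> Path:
    """Keep the cache beside the movie: 'Movie.avi' -> 'Movie.nfo'."""
    return movie_file.with_suffix(".nfo")


def write_nfo(movie_file: Path, details: Dict[str, str]) -> Path:
    """Save scraped details as a small XML document next to the movie file."""
    root = ET.Element("movie")  # assumed root element name
    for tag, value in details.items():
        ET.SubElement(root, tag).text = value
    path = nfo_path_for(movie_file)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
    return path


def read_nfo(movie_file: Path) -> Optional[Dict[str, str]]:
    """Return cached details if an NFO file exists, so no web lookup is needed."""
    path = nfo_path_for(movie_file)
    if not path.exists():
        return None
    root = ET.parse(path).getroot()
    return {child.tag: child.text or "" for child in root}


# Usage: round-trip a cache entry in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    movie = Path(tmp) / "Finding Nemo (2003).avi"
    write_nfo(movie, {"title": "Finding Nemo", "year": "2003"})
    print(read_nfo(movie))
```

Because the cache travels with the file, moving a movie to another share or machine keeps its metadata without a second lookup.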

GoogleCodeExporter commented 9 years ago

Original comment by conrad.john on 2 Sep 2008 at 8:04

Added labels: Milestone-0.5.5

GoogleCodeExporter commented 9 years ago

I think, unfortunately, I won't be able to use XBMC's scraper system. :( The output of the search function is only a text-based string and a URL for the movie's details page. The string is not even consistent: sometimes it is the title, sometimes the title and year, sometimes the official title, then the English title, then the year. It's just not reliable. It is not designed to work with an automated system; using their scripting engine would effectively break the auto-approval feature of Moving Pictures. I am looking at other options, but I think I will probably build a new system based on XBMC's ideas.

Original comment by conrad.john on 4 Sep 2008 at 3:46

Changed state: Started

GoogleCodeExporter commented 9 years ago

Are you sure?
I just tried the sample program scrap.exe; attached are the results of a search for "finding nemo".
It seems pretty straightforward: results.xml gives you the matches for the search, and details.xml gives you the details of the first match.

Original comment by pira...@gmail.com on 4 Sep 2008 at 4:23

Attachments:

[finding nemo.zip](https://storage.googleapis.com/google-code-attachments/moving-pictures/issue-9/comment-13/finding%20nemo.zip)

GoogleCodeExporter commented 9 years ago

Yep, see the "title" field in the results.xml file? In this case the person who implemented the script chose to put the title and the year there. To begin with, this is an XML document describing data; it should not be handling presentation (such as how to format the title and year for the user) at this level. But setting that aside, the data in this field is inconsistent from script to script. It all depends on what the person implementing the script chose to put there, which eliminates any possibility of consistently parsing useful information from it. Take a look at a few other scripts, foreign-language scripts in particular, and you will see.

Trust me, I am disappointed too. The thought of instantly having dozens of data providers available would be great, but unfortunately their setup just does not work with the way our importer does auto-matching. Their scripts are designed for human interaction on every movie selection, and this goes against the design philosophy of Moving Pictures.

Original comment by conrad.john on 4 Sep 2008 at 4:31
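To illustrate why consistent fields matter for auto-matching, here is a hypothetical Python sketch (not Moving Pictures' actual matching logic): when the title and year arrive as separate fields, a result can be approved automatically whenever exactly one candidate matches. With a single free-form string that is sometimes "title and year" and sometimes "official title, English title, year", no comparison like this works without per-script parsing. `SearchResult`, `normalize`, `auto_approve`, and the example URLs are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchResult:
    """Hypothetical structured search result with each field kept separate."""
    title: str
    year: Optional[int]
    details_url: str


def normalize(text: str) -> str:
    """Lower-case, turn punctuation into spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(kept.split())


def auto_approve(results: List[SearchResult], wanted_title: str,
                 wanted_year: Optional[int] = None) -> Optional[SearchResult]:
    """Approve only when exactly one result matches the title (and year, if known)."""
    matches = [
        r for r in results
        if normalize(r.title) == normalize(wanted_title)
        and (wanted_year is None or r.year == wanted_year)
    ]
    return matches[0] if len(matches) == 1 else None


results = [
    SearchResult("Finding Nemo", 2003, "http://example.com/movie/1"),
    SearchResult("Finding Neverland", 2004, "http://example.com/movie/2"),
]
match = auto_approve(results, "finding nemo", 2003)
print(match.details_url if match else "needs manual selection")
```

Returning `None` on zero or several matches is the conservative choice: the user is only asked when the data is genuinely ambiguous.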

GoogleCodeExporter commented 9 years ago

OK, I'll look at it again when I get home.
Have you used XBMC? I have loaded it on my computer, and it required ABSOLUTELY no intervention to recognize my movies/shows.
So somehow those scrapers do work. We just need to figure out how.

Original comment by pira...@gmail.com on 4 Sep 2008 at 4:34

GoogleCodeExporter commented 9 years ago

For full disclosure, here are the other factors that influenced my decision:

1) XBMC is written in C++, while MediaPortal (and in turn Moving Pictures) is written in C#. This means I would have to rewrite their parser (which unfortunately has virtually no documentation in the code).

2) Their regex parser is not fully featured. It does not support lazy matching or, interestingly, the "\w" character class. I am not sure I could duplicate this exactly without writing my own regex parser, so you would get inconsistent script behavior between XBMC and Moving Pictures. These limitations also make it more difficult to construct regex statements to scrape websites. Of course, they do not make it impossible; they just complicate things.

3) The method of outputting results with the XBMC scripting engine is fairly cryptic. The script writer basically has to construct an XML document for the output. What makes this even worse is that this construction is embedded in an existing XML document, which means all special characters must be escaped. This dramatically complicates things, reducing the maintainability of existing scripts and making new scripts much more difficult to write.

4) And, as I mentioned above, the output of the search scripts is not consistent, which makes it very difficult to auto-approve effectively in all situations.

Original comment by conrad.john on 4 Sep 2008 at 4:48
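Points 2 and 3 above can be illustrated with a short Python sketch. Python's `re` module is used purely for demonstration; it is neither XBMC's C++ parser nor .NET's `Regex` class. The HTML snippet and the `<pattern>` element are made up for the example and are not XBMC's scraper format.

```python
import re
from xml.sax.saxutils import escape

page = '<td class="title">Finding Nemo</td><td class="year">2003</td>'

# Greedy ".*" runs to the LAST "</td>" on the line and swallows the year cell.
greedy = re.search(r'<td class="title">(.*)</td>', page).group(1)

# Lazy ".*?" stops at the FIRST "</td>", which is what a scraper usually wants.
lazy = re.search(r'<td class="title">(.*?)</td>', page).group(1)

# "\w+" matches runs of word characters; without "\w" a pattern must spell out
# a class such as [A-Za-z0-9_] (Python's "\w" also matches Unicode letters).
words = re.findall(r"\w+", "Finding Nemo (2003)")

# Storing the lazy pattern inside an XML config means escaping <, > and &.
pattern = '<td class="title">(.*?)</td>'
embedded = "<pattern>" + escape(pattern) + "</pattern>"

print(greedy)    # Finding Nemo</td><td class="year">2003
print(lazy)      # Finding Nemo
print(words)     # ['Finding', 'Nemo', '2003']
print(embedded)  # <pattern>&lt;td class="title"&gt;(.*?)&lt;/td&gt;</pattern>
```

Without lazy quantifiers, the greedy form has to be replaced by more awkward negated classes such as `([^<]*)`, and the escaped form of every pattern is noticeably harder to read and maintain than the original.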

GoogleCodeExporter commented 9 years ago

Original comment by conrad.john on 17 Sep 2008 at 2:49

Changed state: FixedCompleted