Closed GoogleCodeExporter closed 9 years ago
Original comment by conrad.john
on 25 Jul 2008 at 4:22
For the IMDB data provider, we should consider creating a more general scraping
data provider that takes a config file as input (one time only, with the config
later stored in the database). You could conceivably create a generic scraper
that only needs a few regex strings to customize it for specific sites.
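To make the idea concrete, here is a minimal sketch of a regex-configured
scraper. It assumes a config of one regex per field, each with a single capture
group. The field names, the patterns, and the sample page are all made up for
illustration and do not describe any real site or the actual Moving Pictures
code.

```python
import re

# Hypothetical per-site configuration: one regex per field, each with a single
# capture group. Field names and patterns are illustrative only.
EXAMPLE_CONFIG = {
    "title": r"<h1[^>]*>\s*(.*?)\s*</h1>",
    "year": r'<span class="year">\((\d{4})\)</span>',
    "director": r"Director:\s*<a[^>]*>(.*?)</a>",
}


def scrape(html, config):
    """Apply each configured regex to the page and keep the first match."""
    details = {}
    for field, pattern in config.items():
        match = re.search(pattern, html, re.IGNORECASE | re.DOTALL)
        # A field whose pattern does not match is reported as None.
        details[field] = match.group(1) if match else None
    return details


# Made-up page standing in for a downloaded details page.
SAMPLE_PAGE = """
<html><body>
<h1>Finding Nemo</h1> <span class="year">(2003)</span>
<p>Director: <a href="/name/0001">Andrew Stanton</a></p>
</body></html>
"""

if __name__ == "__main__":
    print(scrape(SAMPLE_PAGE, EXAMPLE_CONFIG))
```

Supporting another site would then mean supplying a different set of patterns
rather than writing a new data provider.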
Original comment by conrad.john
on 4 Aug 2008 at 2:40
Good info and a decent discussion about an existing system used by XBMC that is
very similar to what I was suggesting above:
http://www.meedios.com/forum/viewtopic.php?p=30418
We should review what XBMC has done to see if we can build off their work and
cruise past the mistakes they may have run into along the way.
Original comment by conrad.john
on 5 Aug 2008 at 1:35
A generic data provider is essential for multi-lingual support.
Original comment by conrad.john
on 6 Aug 2008 at 4:14
Original comment by conrad.john
on 6 Aug 2008 at 4:23
This should apply to cover art as well. We should probably at least support
impawards.com as a source.
Original comment by conrad.john
on 22 Aug 2008 at 1:52
Check out my suggestion to use XBMC's XML scrapers for HTTP scraping:
http://forum.team-mediaportal.com/improvement-suggestions-46/suggestion-use-xbmcs-xml-scrapers-http-scraping-35312/
http://www.meedios.com/forum/viewtopic.php?p=30418
XBMC's scrapers already do everything that users are asking for here.
Original comment by gamester...@gtempaccount.com
on 30 Aug 2008 at 3:43
Will strongly consider using XBMC's system. Will give it a look in the coming
week.
Original comment by conrad.john
on 1 Sep 2008 at 9:03
One great feature of the XBMC scrapers is that they save a local cache alongside
the movie in an NFO file.
It's a really great concept that gives a lot of flexibility in usage.
They even have an application that scans the shares and creates those NFO files.
Look at: http://xbmc.org/forum/showthread.php?t=33961
Original comment by pira...@gmail.com
on 1 Sep 2008 at 9:10
Original comment by conrad.john
on 2 Sep 2008 at 8:04
Unfortunately, I think I won't be able to use XBMC's scraper system. :( The
output of the search function is only a text-based string and a URL for the
movie's details page. The string is not even consistent: sometimes it is the
title, sometimes the title and year, sometimes the official title, the English
title, then the year. It's just not reliable. It is not designed to work with
an automated system, so using their scripting engine would effectively break
the auto-approval feature of Moving Pictures. I am looking at other options,
but I think I will probably build a new system based on XBMC's ideas.
Original comment by conrad.john
on 4 Sep 2008 at 3:46
Are you sure?
I just tried the sample program scrap.exe, and attached are the results of a
search for "finding nemo".
It seems pretty straightforward: results.xml gives you the matches for the
search, and details.xml gives you the details of the first match.
Original comment by pira...@gmail.com
on 4 Sep 2008 at 4:23
Attachments:
Yep, see the "title" field in the results.xml file? In this case the person who
implemented the script chose to put the title and year there. To begin with,
this is an XML document describing data. It should not be handling presentation
(like how to format the title/year for the user) at this level. But setting
that aside, the data in this field is inconsistent from script to script. It
all depends on what the person implementing the script chose to put there. That
eliminates the possibility of consistently parsing useful information from it.
Take a look at a few other scripts, foreign-language scripts in particular, and
you will see.
Trust me, I am disappointed too. Instantly having dozens of data providers
available would be great, but unfortunately their setup just does not work with
the way our importer does auto matching. Their scripts are designed for human
interaction on every movie selection, and this goes against the design
philosophy of Moving Pictures.
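To illustrate the problem, here is a rough sketch in Python. The
`SearchResult` type, the `split_display_title` and `is_exact_match` helpers,
and the sample strings are all hypothetical. They are not the actual XBMC
output or the real Moving Pictures matching logic, only an example of why
separate title and year fields are easier for an automated importer than a
free-form display string.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchResult:
    """Hypothetical structured search result with separate data fields."""
    title: str
    year: Optional[int]
    url: str


def split_display_title(text):
    """Best-effort split of a 'Title (Year)' display string.

    Only one layout is recognized. Any other layout comes back unparsed,
    which is the failure an automated matcher cannot recover from.
    """
    match = re.fullmatch(r"(.+?)\s*\((\d{4})\)", text.strip())
    if match:
        return match.group(1), int(match.group(2))
    return text.strip(), None


def is_exact_match(result, wanted_title, wanted_year):
    """With structured fields, comparing against the wanted movie is trivial."""
    same_title = result.title.casefold() == wanted_title.casefold()
    return same_title and result.year == wanted_year


if __name__ == "__main__":
    # Made-up display strings in the styles described in this thread.
    for text in ("Finding Nemo (2003)",
                 "Finding Nemo",
                 "Le Monde de Nemo, Finding Nemo, 2003"):
        print(split_display_title(text))

    result = SearchResult("Finding Nemo", 2003, "http://example.com/details/1")
    print(is_exact_match(result, "finding nemo", 2003))
```

Every additional layout would need its own parsing rule, and there is no way to
know in advance which layout a given script author picked.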
Original comment by conrad.john
on 4 Sep 2008 at 4:31
OK, I'll look at it again when I get home.
Have you used XBMC? I have loaded it on my computer, and it required ABSOLUTELY
no intervention to recognize my movies/shows.
So somehow those scrapers do work. We just need to figure out how.
Original comment by pira...@gmail.com
on 4 Sep 2008 at 4:34
For full disclosure, here are the other factors that influenced my decision:
1) XBMC is written in C++ while MediaPortal (and in turn Moving Pictures) is
written in C#. This means I would have to rewrite their parser (which
unfortunately has virtually no documentation in the code).
2) Their regex parser is not fully featured. It does not support lazy matching
or, interestingly, the "\w" element. I am not sure I could duplicate this
exactly without writing my own regex parser, so you would get inconsistent
script behavior between XBMC and Moving Pictures. These limitations also make
it more difficult to construct regex statements to scrape websites. It of
course does not make it impossible, it just complicates things.
3) The method of outputting results with the XBMC scripting engine is fairly
cryptic. The script writer basically has to construct an XML document for the
output. What makes this even worse is that this construction is embedded in an
existing XML document, which means all special characters must be escaped. This
dramatically complicates things, reducing the maintainability of existing
scripts and making new scripts much more difficult to write.
4) And like I mentioned above, the output of the search scripts is not
consistent, making it very difficult to auto-approve effectively for all
situations.
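Points 2 and 3 can be illustrated with a short sketch. It uses Python's own
`re` module and `xml.sax.saxutils.escape`, not XBMC's parser, and the markup
and the output template are made up, so treat it as a generic demonstration of
lazy matching and XML escaping rather than a description of XBMC's actual
scraper format.

```python
import re
from xml.sax.saxutils import escape

# Made-up markup, used only to show the matching behavior.
html = "<b>Finding Nemo</b> <b>2003</b>"

# Greedy: ".*" runs to the last "</b>", swallowing both elements.
greedy = re.findall(r"<b>(.*)</b>", html)

# Lazy: ".*?" stops at the first "</b>", giving one match per element.
lazy = re.findall(r"<b>(.*?)</b>", html)

# Workaround when lazy matching is unavailable: a negated character class.
negated = re.findall(r"<b>([^<]*)</b>", html)

# Workaround when "\w" is unavailable: spell out the ASCII word characters.
words = re.findall(r"[A-Za-z0-9_]+", "Finding Nemo")

# Hypothetical output template. Embedding XML inside another XML document
# means every special character has to be escaped first.
output_template = "<movie><title>\\1</title></movie>"
escaped_template = escape(output_template)

if __name__ == "__main__":
    print(greedy)
    print(lazy)
    print(negated)
    print(words)
    print(escaped_template)
```

The workarounds do the job for simple cases, but each one makes the patterns
longer and harder to read, and the escaped template is much harder to maintain
than the plain one.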
Original comment by conrad.john
on 4 Sep 2008 at 4:48
Original comment by conrad.john
on 17 Sep 2008 at 2:49
Original issue reported on code.google.com by
conrad.john
on 26 Apr 2008 at 6:59