Sproglet / oversight

Oversight Jukebox for Popcorn Hour 1 and 2 series. Original code by Lordy.

IMDB (Desktop version) scrape has a lot missing since new site #512

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
For a short while now, the IMDB information has been missing when scraping the
Desktop version of the site. The Mobile version is still fine!
The title of the movie is still found, but it is not correct: Title found
[Downstream (2010) - IMDb]. The rest of the movie information (rating, plot,
etc.) is missing.
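
To illustrate the title problem: the scraped value looks like the raw page title, with the year and an " - IMDb" suffix still attached. Below is a minimal Python sketch of splitting such a string into a clean title and a year. The helper name split_imdb_title and the assumed "Title (Year) - IMDb" layout are illustrative assumptions based only on the log line quoted here; this is not Oversight's actual code.

```python
import re

# Hypothetical helper, not part of Oversight. Assumes the page title has the
# form "<title> (<year>) - IMDb", as in the log line quoted in this report.
_TITLE_RE = re.compile(
    r"^(?P<title>.*?)\s*\((?P<year>\d{4})[^)]*\)\s*-\s*IMDb\s*$",
    re.IGNORECASE,
)


def split_imdb_title(raw):
    """Return (title, year); year is None when the layout is not recognised."""
    text = raw.strip()
    match = _TITLE_RE.match(text)
    if match is None:
        # Unknown layout: hand the text back unchanged rather than guess.
        return text, None
    return match.group("title"), int(match.group("year"))


print(split_imdb_title("Downstream (2010) - IMDb"))
```

Falling back to the unmodified string when the pattern does not match keeps the sketch safe if the page title layout changes again.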

Here is an example of the scan log for the movie Downstream:

[INFO]     17:59:36 :   >Begin scrape imdb [http://www.imdb.com/title/tt1229236/]
[INFO]     17:59:36 :       scraping http://www.imdb.com/title/tt1229236/
[INFO]     17:59:36 :       Using persistent cache
[DEBUG]    17:59:36 :       START IMDB: title: poster  genre  cert  year 
[DEBUG]    17:59:36 :       Title found [Downstream (2010) - IMDb] current title []
[DEBUG]    17:59:36 :       IMDB: Got year [2010]
[DEBUG]    17:59:36 :       imdb title=[Downstream (2010) - IMDb]
[DEBUG]    17:59:36 :       Imdb title = [Downstream (2010) - IMDb]
[DEBUG]    17:59:36 :       :[]  promoted to imdb:[Downstream (2010) - Imdb] 
[INFO]     17:59:36 :       UTF-8 Encoding:1
[INFO]     17:59:37 :       actors|nm1555200||
[INFO]     17:59:37 :       actors|nm0731068||http://ia.media-imdb.com/images/M/MV5BMjE3MTk3ODcyNl5BMl5BanBnXkFtZTcwNzYxODU1MQ@@._V1_.jpg
[INFO]     17:59:37 :       actors|nm1446823||
[INFO]     17:59:37 :       actors|nm0902188||http://ia.media-imdb.com/images/M/MV5BMTk5Njg1MzU0MF5BMl5BanBnXkFtZTcwMzMyODE4Mg@@._V1_.jpg
[INFO]     17:59:37 :       actors|nm0428856||http://ia.media-imdb.com/images/M/MV5BMTIxNzE2NjM0NV5BMl5BanBnXkFtZTcwNDc2NTcyMQ@@._V1_.jpg
[INFO]     17:59:37 :       actors|nm3060649||
[INFO]     17:59:37 :       actors|nm1696211||http://ia.media-imdb.com/images/M/MV5BMTc3MjIxNTk1OF5BMl5BanBnXkFtZTcwNzg2MTkxMw@@._V1_.jpg
[INFO]     17:59:37 :       actors|nm1463709||http://ia.media-imdb.com/images/M/MV5BMTgzMjE3ODUxNV5BMl5BanBnXkFtZTcwNzA5Nzk3Mg@@._V1_.jpg
[INFO]     17:59:37 :       compress 
[nm1555200,nm0731068,nm1446823,nm0902188,nm0428856,nm3060649,nm1696211,nm1463709
] = [Þö€,¬Ï¼,ا§,·ˆ¬,š–¸,ºç©,çÃÓ,Ù«]
[DEBUG]    17:59:37 :       Genre=[Action | Sci-Fi]
[DEBUG]    17:59:37 :       imdb:[Downstream (2010) - Imdb]  promoted to imdb_orig:[Downstream (2010) - Imdb] 
[DEBUG]    17:59:37 :       AKA Downstream (2010) - Imdb vs Downstream (2010) - Imdb
[DEBUG]    17:59:37 :       AKA: <span class="see-more inline"><a href="releaseinfo#akas">See more</a></span> &raquo; </div>
[DEBUG]    17:59:37 :        AKA array:1=[ <span class="see-more inline"><a href="releaseinfo#akas">See more</a></span> &raquo; </div>]
[DEBUG]    17:59:37 :       Checking aka [ See more  ]
[ERR]      17:59:38 :       Unparsed imdb sections 
[DEBUG]    17:59:38 :        missing:_d=[Director]
[DEBUG]    17:59:38 :        missing:_W=[Writers]
[DEBUG]    17:59:38 :        missing:_r=[Rating]
[INFO]     17:59:38 :       Already have ovs:_J/ovs_Downstream_2010_-_Imdb_2010_tt1229236.jpg [/share/Apps/oversight/db/global/_J/ovs_Downstream_2010_-_Imdb_2010_tt1229236.jpg]
[INFO]     17:59:38 :       Already have ovs:_fa/ovs_Downstream_2010_-_Imdb_2010_tt1229236.jpg [/share/Apps/oversight/db/global/_fa/ovs_Downstream_2010_-_Imdb_2010_tt1229236.jpg]
[INFO]     17:59:38 :       >Begin getMovieConnections
[INFO]     17:59:38 :           >Begin scan_page_for_matches[http://www.imdb.com/title/tt1229236/movieconnections]
[INFO]     17:59:38 :               [][(<h[1-5]>[^<]+</h[1-5]>|tt[0-9][0-9][0-9][0-9][0-9]+)][0]
[INFO]     17:59:38 :               Using persistent cache
[INFO]     17:59:38 :               UTF-8 Encoding:-1
[INFO]     17:59:39 :           <End scan_page_for_matches[http://www.imdb.com/title/tt1229236/movieconnections]=[=[87]]
[INFO]     17:59:39 :           compress 
[tt1229236,tt1229236,tt1229236,tt1229236,tt1229236,tt1229236,tt1229236,tt1229236
,tt1229236,tt1229236,tt1229236,tt1229236,tt1229236,tt1229236] = 
[˃´,˃´,˃´,˃´,˃´,˃´,˃´,˃´,˃´,˃´,˃´,Ë
ƒ´,˃´,˃´]
[DEBUG]    17:59:39 :            tt1229236 movie connections:Related 
Links=[˃´,˃´,˃´,˃´,˃´,˃´,˃´,˃´,˃´,˃´,Ë�
�´,˃´,˃´,˃´]
[INFO]     17:59:39 :       <End getMovieConnections=[]
[INFO]     17:59:39 :       follows=
[INFO]     17:59:39 :       followed_by=
[INFO]     17:59:39 :       remakes=
[INFO]     17:59:39 :   <End scrape imdb [http://www.imdb.com/title/tt1229236/]=[=[M]]
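
For readers unfamiliar with the pattern that appears in the scan_page_for_matches lines of the log above, here is a small Python sketch of what it matches: either an h1 to h5 heading or a title id ("tt" followed by at least five digits). The pattern is copied from the log. The function name scan_for_matches and the sample markup are hypothetical, and how Oversight itself applies the pattern is not shown in this report.

```python
import re

# Pattern copied from the scan log: an <h1>..<h5> heading, or a title id
# ("tt" followed by at least five digits).
PATTERN = re.compile(r"(<h[1-5]>[^<]+</h[1-5]>|tt[0-9][0-9][0-9][0-9][0-9]+)")


def scan_for_matches(html):
    """Return every heading or title id found in html, in document order."""
    return PATTERN.findall(html)


# Hypothetical sample markup, not taken from a real IMDB page.
sample = '<h5>Follows</h5><a href="/title/tt0000001/">Example</a>'
print(scan_for_matches(sample))
```

On the sample markup this returns the heading followed by the single title id, which is the kind of list a connections parser could then group by heading.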

Original issue reported on code.google.com by domini...@dofl.nl on 18 Oct 2010 at 6:59

GoogleCodeExporter commented 9 years ago
Thanks. IMDB Desktop scraping is a known issue, and is being addressed as part 
of bigger changes for multilingual support and less dependence on IMDB.

I recommend using the Mobile option until the next release.

I'll keep this ticket open, though, as I don't have one that explicitly mentions 
desktop scraping.

Original comment by lordylo...@gmail.com on 18 Oct 2010 at 12:37

GoogleCodeExporter commented 9 years ago
Hi Lordy,

Thanks for the response. I've switched to the mobile version, which works 
flawlessly. The only thing that doesn't work is the rating. Is this expected?

Original comment by domini...@dofl.nl on 18 Oct 2010 at 1:02

GoogleCodeExporter commented 9 years ago
Yup, I'm currently working on scanning changes, including IMDB fixes.

Original comment by lordylo...@gmail.com on 31 Oct 2010 at 4:01

GoogleCodeExporter commented 9 years ago
Use different interface by default

Original comment by a...@lordy.org.uk on 1 Jun 2011 at 2:21