Cduniverse scraper broken

GoogleCodeExporter commented 8 years ago

Scraper reports "Unable to connect to remote server"
It appears there has been a backend site change including (but not limited to) 
moving a large portion of the content to cduniverse.ws (was cduniverse.com)

Original issue reported on code.google.com by ltub...@gmail.com on 10 Aug 2012 at 3:41

GoogleCodeExporter commented 8 years ago

This issue was closed by revision r129.

Original comment by mrdougqu...@gmail.com on 28 Aug 2012 at 1:58

Changed state: Fixed

GoogleCodeExporter commented 8 years ago

Still seems to be broken for me... even with the latest v1.03 / r130 commit.
The issue presents itself as no results found.

Have cleared cookies and tried on several different machines (PC & AppleTV 2) 
running both Eden (v11 Official) and Darwin/Nightlies (v12 September 2012 
Commit) to no avail.

Original comment by ltub...@gmail.com on 1 Sep 2012 at 1:19

GoogleCodeExporter commented 8 years ago

Can you access the website via a web browser after it tries to update the 
library? Looks like there is some rate limiting.
Also looks like the scraper needs to be updated because even with a folder with 
just a couple of movies it doesn't seem to be able to find a match. I don't use 
scrapers so not sure what is meant to work and what isn't. Could you have a 
look?

Original comment by mrdougqu...@gmail.com on 2 Sep 2012 at 1:18

GoogleCodeExporter commented 8 years ago

Yep I can access cduniverse via browser no probs.

My regex skills are pretty poor and my ability to see through URL-encoded 
special ASCII chars only exists at a specific caffeine vs sleep deprivation 
window BUT...

Part 1 of the problem is that the GetSearchResults search string needs to be 
updated as follows:-

Line 9: <RegExp input="$$1" output="<entity><title>\2</title><url>http://cduniverse.com/productinfo.asp?pid=\1&style=ice</url></entity>" dest="5+">
***i.e. "style=ice" needed to be moved after PID in the URL***
Line 10: <expression repeat="yes"><a  href="/productinfo.asp\?pid=(\d+)[^>]*><font size\="2"><b>([^<&]+)</expression>
***i.e. website changed the tag <font size=2> to <font size="2">***

It looks like all the input and output strings in GetDetails need to be 
corrected as well but at the moment my brain can't handle that much Regex.
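
For anyone who wants to sanity-check the updated expression outside XBMC, here is a minimal Python sketch. It is only an approximation: the pattern is adapted from the Line 10 expression above to Python's re syntax, build_entities and SAMPLE_HTML are made-up names, and the sample markup is invented for illustration rather than copied from the site.

```python
import re

# Adapted from the "Line 10" expression above. Not the scraper's own engine,
# just Python's re module applied to the same idea.
SEARCH_RESULT = re.compile(
    r'<a\s+href="/productinfo\.asp\?pid=(\d+)[^>]*><font size="2"><b>([^<&]+)'
)


def build_entities(html):
    """Return (title, url) pairs shaped like the "Line 9" output template."""
    return [
        (title.strip(), f"http://cduniverse.com/productinfo.asp?pid={pid}&style=ice")
        for pid, title in SEARCH_RESULT.findall(html)
    ]


# Hypothetical markup: the first link uses the quoted size="2" attribute,
# the second uses the older unquoted size=2 form and should not match.
SAMPLE_HTML = (
    '<a href="/productinfo.asp?pid=1234567&style=ice">'
    '<font size="2"><b>Example Title</b></font></a>'
    '<a href="/productinfo.asp?pid=7654321">'
    '<font size=2><b>Old Markup</b></font></a>'
)

if __name__ == "__main__":
    for title, url in build_entities(SAMPLE_HTML):
        print(title, url)
```

Only the link with the quoted attribute is picked up, which is consistent with the old expression silently returning no results once the site started quoting the attribute.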

Original comment by ltub...@gmail.com on 2 Sep 2012 at 2:59

GoogleCodeExporter commented 8 years ago

Original comment by mrdougqu...@gmail.com on 2 Sep 2012 at 3:52

Changed state: Accepted

GoogleCodeExporter commented 8 years ago

Original comment by mrdougqu...@gmail.com on 2 Sep 2012 at 4:30

GoogleCodeExporter commented 8 years ago

Think I have a working scraper now (tested with ScraperXML Editor), see 
attached.

However, still can't connect to the server from XBMC, because curl doesn't 
follow the redirect (curl -L works from CLI on a dev machine), and I haven't 
yet figured out how to tell XBMC to add the -L option...
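
To illustrate the difference between a client that follows redirects and one that does not, here is a small Python sketch run against a throwaway local server. Everything in it is an assumption for illustration: the handler, the 302 hop from /warning.asp to /sresult.asp, and the response body are invented, and it says nothing about how XBMC actually configures curl.

```python
import http.client
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a site that bounces one page to another."""

    def do_GET(self):
        if self.path.startswith("/warning.asp"):
            self.send_response(302)
            self.send_header("Location", "/sresult.asp")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"search results"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_without_redirect(host, port, path):
    # http.client never follows redirects, similar to plain curl without -L.
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


def fetch_with_redirect(host, port, path):
    # urllib follows redirects by default, similar to curl -L.
    # An empty ProxyHandler keeps any system proxy out of the local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://{host}:{port}{path}", timeout=5) as resp:
        return resp.status, resp.read()


def demo():
    server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        return (
            fetch_without_redirect(host, port, "/warning.asp"),
            fetch_with_redirect(host, port, "/warning.asp"),
        )
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The first fetch stops at the 302 with an empty body, while the second ends up on the page the redirect points to.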

Original comment by kmi...@gmail.com on 21 Jan 2013 at 9:49

Attachments:

cduniverse.xml

GoogleCodeExporter commented 8 years ago

From the log, seems to be failing in XBMC core:

19:31:41 T:2736338016   DEBUG: scraper: CreateSearchUrl returned <url>http://www.cduniverse.com/warning.asp?Decision=I+Agree+%2D+ENTER&CrossOver=&Referer=%2Fsresult%2Easp%3FHT%5FSearch%3DTITLE%26HT%5FSearch%5FInfo%3Dpenny%20flame%26style%3Dice</url>
19:31:41 T:2736338016   DEBUG: CurlFile::Open(0x42ec418) http://www.cduniverse.com/warning.asp?Decision=I+Agree+%2D+ENTER&CrossOver=&Referer=%2Fsresult%2Easp%3FHT%5FSearch%3DTITLE%26HT%5FSearch%5FInfo%3Dpenny%20flame%26style%3Dice
19:31:41 T:2736338016    INFO: easy_aquire - Created session to http://www.cduniverse.com
19:31:41 T:2736338016 WARNING: FillBuffer: curl failed with code 22
19:31:41 T:2736338016   ERROR: CCurlFile::CReadState::Open, didn't get any data from stream.
19:31:41 T:2736338016   ERROR: Run: Unable to parse web site
19:31:42 T:2736338016   ERROR: Process: Error looking up item Penny Flame.avi
19:31:42 T:2736338016   DEBUG: Thread CVideoInfoDownloader 2736338016 terminating
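
For reference, libcurl error 22 is CURLE_HTTP_RETURNED_ERROR, which is returned when fail-on-error is enabled and the server answers with an HTTP status of 400 or higher, so the server may be rejecting the request outright rather than only redirecting it. To make the percent-encoded URL in the log easier to read, here is a minimal Python sketch that unpacks it with the standard library. The function name decode_search_url is made up for illustration; the URL is the one from the log above.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied from the CreateSearchUrl log line above.
LOGGED_URL = (
    "http://www.cduniverse.com/warning.asp?Decision=I+Agree+%2D+ENTER&CrossOver="
    "&Referer=%2Fsresult%2Easp%3FHT%5FSearch%3DTITLE%26HT%5FSearch%5FInfo%3D"
    "penny%20flame%26style%3Dice"
)


def decode_search_url(url):
    """Return (decision, referer, inner_params) decoded from the logged URL."""
    outer = parse_qs(urlsplit(url).query, keep_blank_values=True)
    referer = outer["Referer"][0]  # the search page the warning page forwards to
    inner = parse_qs(urlsplit(referer).query, keep_blank_values=True)
    return outer["Decision"][0], referer, inner


if __name__ == "__main__":
    decision, referer, inner = decode_search_url(LOGGED_URL)
    print(decision)
    print(referer)
    print(inner)
```

Decoded, the request is the age-warning page being asked to accept and forward to /sresult.asp with the title search for "penny flame" and style=ice.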

From looking at the CurlFile source, it looks like the redirect option should 
be set, but my debugging capability stops here:

https://github.com/xbmc/xbmc/blob/master/xbmc/filesystem/CurlFile.cpp#L390

Original comment by kmi...@gmail.com on 22 Jan 2013 at 8:02

GoogleCodeExporter commented 8 years ago

Original comment by mrdougqu...@gmail.com on 22 Jan 2013 at 10:35

GoogleCodeExporter commented 8 years ago

This issue was closed by revision r192.

Original comment by mrdougqu...@gmail.com on 22 Jan 2013 at 10:37

Changed state: Fixed

DrJLR / xbmc-adult

Cduniverse scraper broken #38