FastEddyCurrent / zap2xml

zap2xml in Python 2.7 for use on the RaspberryPI

Can no longer pull data from zap2it #13

Open NumberB opened 7 years ago

NumberB commented 7 years ago

According to the timestamp, I haven't pulled new EPG data for a week (July 17 was the last successful run). My cron job keeps running, and it looks to be an issue with zap2it rather than the script. It looks like they may no longer return sorted data in the form the script scraped.

Here's the 3rd entry in my log:

2017-07-24 10:21:13,230 Form:
<zbSearchForm get http://tvschedule.zap2it.com/tvlistings/ZCSearch.do application/x-www-form-urlencoded
  <TextControl(searchTerm=)>
  <HiddenControl(searchType=simple) (readonly)>
  <HiddenControl(aid=tvschedule) (readonly)>
  <SubmitControl(=Search) (readonly)>>:Function: loginZAP :Line: 552
2017-07-24 10:21:13,235 Form:
<zbSearchFormAdv get http://tvschedule.zap2it.com/tvlistings/ZCSearch.do application/x-www-form-urlencoded
  <TextControl(searchTerm=)>
  <SelectControl(searchField=[*name, episodeName, episodeName, description, crew])>
  <SelectControl(searchGenre=[*, movie, sports, children, special, news])>
  <CheckboxControl(searchFavorites=[true])>
  <CheckboxControl(searchHD=[true])>
  <HiddenControl(searchType=advanced) (readonly)>
  <HiddenControl(aid=tvschedule) (readonly)>
  <SubmitControl(=Search) (readonly)>>:Function: loginZAP :Line: 552
2017-07-24 10:21:13,239 Form:
<zcLoginForm POST http://tvschedule.zap2it.com/tvlistings/ZCLogin.do?category= application/x-www-form-urlencoded
  <TextControl(username=email@address.com) (readonly)>
  <PasswordControl(password=)>
  <SubmitControl(<None>=Login) (readonly)>
  <IgnoreControl(loginReset=<None>)>
  <HiddenControl(zc-login-forwardURL=) (readonly)>>:Function: loginZAP :Line: 552
2017-07-24 10:21:23,735 Didn't Match .*Logout of your Screener account.* Sleep  1 sec.:Function: loginZAP :Line: 576
2017-07-24 10:21:24,741 Failed to login within ,3, retries.
:Function: loginZAP :Line: 579
2017-07-24 10:21:24,742 None

I logged out and logged into zap2it.com using my browser to verify that my user/password are correct. They are.

I also tried (after logging in) to pull up this URL: "http://tvschedule.zap2it.com/tvlistings/ZCLogin.do?method=getStandAlonePage&aid=tvschedule", but it brings up a blank page and puts up the login form on the left.

So is this a general zap2it issue or something in the script? To me it looks like zap2it is trying to drop free scraping :(

NumberB commented 7 years ago

As an update, I ran the standard old "zap2xml.pl" Perl script on a VM of mine and it was able to connect and parse the zap2it data just fine, so the problem isn't zap2it; it's this script. The user/password were the same.

This was the end of the perl script output:

[28/28] Parsing: /home/tvheadend/zap2xml/cache/1501473600000.html.gz
Downloaded 3259063 bytes in 75 http requests.
Writing XML file: /home/tvheadend/zap2xml/xmltv.xml
Completed in 45s (Parse: 43s) 69 stations, 5776 programs, 17309 scheduled.

NumberB commented 7 years ago

I just realized I never did come back and share the fix for this. It's simple:

Edit this line:

loggedinStr = '.*Logout of your Screener account.*'

To be this:

loggedinStr = '.*Logout of your Zap2it account.*'

And you're good to go.
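
If you'd rather not swap one hard-coded string for another, here is a minimal sketch that accepts either wording. The names LOGGED_IN_MARKERS, LOGGED_IN_RE and is_logged_in are made up for illustration and are not in the script; only the two marker strings come from this thread.

```python
import re

# Known "logged in" markers: the first is the old page text, the second is the
# text from the fix above. Extend this list if the site's wording changes again.
LOGGED_IN_MARKERS = [
    "Logout of your Screener account",
    "Logout of your Zap2it account",
]

# One alternation pattern; re.escape keeps any punctuation in a marker literal.
LOGGED_IN_RE = re.compile("|".join(re.escape(m) for m in LOGGED_IN_MARKERS))


def is_logged_in(page_html):
    """Return True if any known logout marker appears anywhere in the page."""
    return LOGGED_IN_RE.search(page_html) is not None
```

This still searches the whole page, so it only softens the problem: a third wording would need another entry in the list.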

Line 569 mentions this: "# todo find response success like perl script rather than search whole page". If Zap2it changes text like this in the future, this issue will pop up again and need the same manual fix. It would be better to implement that response-success check, as the comment says :)
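
For what it's worth, a response-based check might look roughly like the sketch below. I haven't checked what the Perl script actually tests, so treat the rule here as a guess: it assumes a successful login answers with HTTP 200 and ends up on a URL other than the ZCLogin.do endpoint shown in the log. The function name login_response_ok and its parameters are hypothetical.

```python
def login_response_ok(status_code, final_url, login_path="ZCLogin.do"):
    """Judge login success from the HTTP response instead of the page wording.

    ASSUMPTION (not verified against the live site): a successful login
    returns HTTP 200 and lands on a URL that is not the login endpoint.
    """
    return status_code == 200 and login_path not in final_url
```

With a urllib2-style response object, the two arguments would presumably come from response.code and response.geturl(). Whether the site really redirects away from the login URL on success needs checking before relying on this.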