EchterAlsFake / PHUB

A lightweight API for Pornhub
https://phub.rtfd.io
GNU General Public License v3.0

Error when getting multiple video titles #5

Closed by brightpepe 1 year ago

brightpepe commented 1 year ago

When attempting to grab video titles by iterating through a list of URLs, the following error is shown once it has successfully grabbed 20 titles:

Traceback (most recent call last):
  File "/home/brightpepe/Documents/Code/test_ph.py", line 6, in <module>
    video = client.get('https://www.pornhub.com' + al)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/brightpepe/.local/lib/python3.11/site-packages/phub/core.py", line 288, in get
    return classes.Video(client = self, url = url, preload = preload)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/brightpepe/.local/lib/python3.11/site-packages/phub/classes.py", line 179, in __init__
    self.refresh()
  File "/home/brightpepe/.local/lib/python3.11/site-packages/phub/classes.py", line 193, in refresh
    self.data = parser.resolve(self.page)
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/brightpepe/.local/lib/python3.11/site-packages/phub/parser.py", line 22, in resolve
    flash, ctx = consts.regexes.video_flashvar(html)[0]
IndexError: list index out of range

Already updated to the latest version of the module. 

Example code attached
[example.txt](https://github.com/Egsagon/PHUB/files/12328176/example.txt)
EchterAlsFake commented 1 year ago

Hey, I also had some issues with that.

I think the problem here is that PornHub simply blocks API requests after a while. I wrote a standalone application that downloads multiple videos at a time, and whenever I used threaded downloads or the search function I got several "index out of range" or connection errors. They always went away after reinitializing the client, changing the IP address, or waiting a few minutes. An index that is out of range basically means the list is empty, so PornHub didn't return anything.

I think the error is on PornHub's side and not in Egsagon's API.

I would recommend catching the exception: when the error occurs, wait a few seconds (around 30-60) and then reinitialize the Client, maybe with another language.
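The advice above could be sketched roughly as follows. This is only an illustration: `fetch_with_retry`, `make_client` and `fetch` are hypothetical names, not part of PHUB, and the default error types are an assumption (an HTTP library may raise its own connection error classes, which would need to be added).

```python
import time

def fetch_with_retry(make_client, fetch, url, retries=3, wait=30.0,
                     errors=(IndexError, ConnectionError), sleep=time.sleep):
    """Call fetch(client, url); on failure wait, rebuild the client, retry.

    make_client: zero-argument callable returning a fresh client.
    fetch:       callable taking (client, url) and returning the result.
    """
    client = make_client()
    for attempt in range(retries + 1):
        try:
            return fetch(client, url)
        except errors:
            if attempt == retries:
                raise              # out of retries: let the caller see the error
            sleep(wait)            # give the server time to stop blocking us
            client = make_client() # reinitialize the client before retrying
```

With PHUB this might be called as `fetch_with_retry(phub.Client, lambda c, u: c.get(u).title, url)`, assuming the video object exposes a `title` attribute as the issue title suggests.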

Have a great day :)

Egsagon commented 1 year ago

Hello

Indeed, as @EchterAlsFake says, PH seems to return something other than a video page at some point. It looks somewhat like this:

<html>
 <head>
  <script type="text/javascript">
function go() {

  var p = 1794922325391;
  var s = 44551209;
  var n;

  if ((s >> 5) & 1) /**13;*/
    p+= 294461683*6; /**13;*/
  else p-= 69431551*6;

if ((s >> 14) & 1)  p+= 81738565*17;
else /*
p+= */p-=/* 120886108*
*/17464060*/*
*13;
*/15;/* 120886108*
*/if ((s >> 15) & 1) p+=/*
p+= */23956128*/*
*13;
*/16;/* 120886108*
*/else /*
else p-=
*/p-=   29105916*16;if ((s >> 6) & 1)
p+=/*
p+= */94149136*/*
else p-=
*/9;    else p-=
109242417*/*
else p-=
*/7; if ((s >> 0) & 1)
p+=/*
else p-=
*/703233716* 3;/*
p+= */else /* 120886108*
*/p-=/*
p+= */1318523439* 1;/*
else p-=
*/ p-=5883363342;
 n=leastFactor(p);
{ document.cookie="RNKEY="+n+"*"+p/n+":"+s+":1328478927:1;path=/;";
  document.location.reload(true); }
}
//-->
  </script>
 </head>
 <body onload="go()">
  Loading...
 </body>
</html>

It looks like a loading screen where the client is supposed to calculate some kind of cookie, maybe to renew the session. I'll look into that, but for now, adding a small delay (0.5 s) is enough to remove the error.
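For reference, the arithmetic in the page above can be reproduced outside a browser. The sketch below is not PHUB's parser, just one possible reading of the script: it strips the decoy `/* ... */` comments, applies each `if ((s >> k) & 1) p += a*b; else p -= c*d;` branch plus the final `p -= N;`, and builds the `RNKEY` value. It assumes every challenge page follows exactly the pattern shown, that `leastFactor` (not defined in the excerpt) returns the smallest prime factor, and that the trailing `1328478927:1` may differ between responses, which is why it is a parameter.

```python
import re

def least_factor(n: int) -> int:
    """Smallest prime factor of n (n itself if n is prime), by trial division."""
    if n < 2:
        raise ValueError("expected an integer >= 2")
    if n % 2 == 0:
        return 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return f
        f += 2
    return n

# One obfuscated branch once the comments are gone (assumed fixed shape).
BRANCH = re.compile(
    r"if\s*\(\(s\s*>>\s*(\d+)\)\s*&\s*1\)\s*p\s*\+=\s*(\d+)\s*\*\s*(\d+)\s*;"
    r"\s*else\s*p\s*-=\s*(\d+)\s*\*\s*(\d+)\s*;"
)

def solve_challenge(page: str, tail: str = "1328478927:1") -> str:
    """Return the 'RNKEY=...' cookie pair for a challenge page."""
    js = re.sub(r"/\*.*?\*/", "", page, flags=re.S)  # drop decoy comments
    p = int(re.search(r"var\s+p\s*=\s*(\d+)", js).group(1))
    s = int(re.search(r"var\s+s\s*=\s*(\d+)", js).group(1))
    for bit, a, b, c, d in BRANCH.findall(js):
        if (s >> int(bit)) & 1:
            p += int(a) * int(b)
        else:
            p -= int(c) * int(d)
    # Unconditional "p -= N;" statements (no multiplication) after the branches.
    for k in re.findall(r"p\s*-=\s*(\d+)\s*;", js):
        p -= int(k)
    n = least_factor(p)
    return f"RNKEY={n}*{p // n}:{s}:{tail}"
```

The returned pair would then be set as a cookie on the session before requesting the page again, mirroring the `document.cookie` assignment and reload in the script.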

Egsagon commented 1 year ago

I might as well implement a built-in delayer in Client objects, so that they don't load the servers more than a normal client would.

Egsagon commented 1 year ago

This should be fixed with 2944d11f1863abf1ec61d212623c0870eba3d41a.

The parser is now able to calculate cookies and renew connections with them. I won't upload the package yet tho, so if you want to get that feature, update phub from this repository (pip install --upgrade git+https://github.com/Egsagon/PHUB.git). @brightpepe's script should work now.

Note: I have a shitty connection, so I can't really tell whether what I did actually works or just adds delay while calculating the cookie. If you could tell me, that would be cool.

Additionally, you can use a small delayer with the client if you want to be absolutely sure you are not overloading the servers, like so:

import phub
client = phub.Client(delay = True)

By default, the delay is half a second, but you can change it:

client.delay = 10
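For scripts that cannot rely on the built-in option, a comparable throttle can be written by hand. The sketch below is an assumption about how such a delayer might work, not PHUB's actual implementation; the `Delayer` class and its `wait` method are hypothetical names.

```python
import time

class Delayer:
    """Enforce a minimum interval (in seconds) between consecutive calls."""

    def __init__(self, delay=0.5, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Block until at least `delay` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` right before each request keeps the request rate at or below one per `delay` seconds without slowing down a script that is already slower than that.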

Thanks for reporting that issue!

brightpepe commented 1 year ago

Thanks for looking into this!

Updated to the latest version and it's now working as expected. There doesn't appear to be any noticeable delay when processing all the links in the example.