Ge0rg3 / requests-ip-rotator

A Python library to utilize AWS API Gateway's large IP pool as a proxy to generate pseudo-infinite IPs for web scraping and brute forcing.
https://pypi.org/project/requests-ip-rotator/
GNU General Public License v3.0
1.36k stars, 140 forks

403: forbidden #36

Closed by jherrerogb98 2 years ago

jherrerogb98 commented 2 years ago

I am experiencing a strange situation:

I know it has to be something related to headers, because if I don't send any headers I always get a 403. But even with headers, I can't get a 200 response for the second link. I don't know if something like this has happened to any of you. If someone knows what the issue is, I would really appreciate it if you let me know. Thank you!

Ge0rg3 commented 2 years ago

Hi, have you tried including all headers? Such as:

    # s: a requests.Session (presumably with the ApiGateway adapter mounted)
    s.get("https://www.zillow.com/homedetails/14851-SW-150th-St-Miami-FL-33196/44327576_zpid/", headers={
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "accept-language": "en-GB,en-US;q=0.9,en;q=0.8",
        "cache-control": "no-cache",
        "pragma": "no-cache",
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "cross-site",
        "sec-fetch-user": "?1",
        "sec-gpc": "1",
        "upgrade-insecure-requests": "1"
    }, allow_redirects=False)

I notice that with this, it redirects me to a captcha, so it looks like you'll have to deal with that too 😄 Hope this helps!
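
One way to tell a plain 403 from a captcha redirect is to look at the status code and the `Location` header before following anything. The sketch below is only an illustration and is not part of requests-ip-rotator: `classify_response` is a hypothetical helper, and the "captcha" substring check is an assumption about what the redirect URL looks like.

```python
def classify_response(status_code, headers):
    """Roughly label a response from its status code and headers.

    Hypothetical helper. The 'captcha' substring test is an assumption
    about the redirect target, not documented behaviour of any site.
    """
    # Header names are case-insensitive, so normalise them before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}
    location = lowered.get("location", "")

    if status_code in (301, 302, 303, 307, 308):
        if "captcha" in location.lower():
            return "captcha-redirect"
        return "redirect"
    if status_code == 403:
        return "forbidden"
    if 200 <= status_code < 300:
        return "ok"
    return "other"
```

With a `requests` response `r` fetched using `allow_redirects=False`, this would be called as `classify_response(r.status_code, r.headers)`.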

jherrerogb98 commented 2 years ago

Thanks for answering so fast! Without the redirect my response is blank, although when I paste the same link into Chrome it seems to work. I copied my Chrome headers and unfortunately still couldn't make it work.

Ge0rg3 commented 2 years ago

Interesting -- sounds like a cookie's being set?

jherrerogb98 commented 2 years ago

Maybe, yes. When I make a request in Postman and leave the cookie in, I am automatically redirected to a captcha. I think the same happens with the AWS requests.
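
To check whether a cookie is involved, it can help to list what a response tries to set. This is a minimal sketch using only the standard library. `cookie_names` is a hypothetical helper, and it assumes it is given the value of a single `Set-Cookie` header.

```python
from http.cookies import SimpleCookie


def cookie_names(set_cookie_value):
    """Return the sorted cookie names found in one Set-Cookie header value."""
    jar = SimpleCookie()
    # Attributes such as Path or HttpOnly attach to the cookie they follow,
    # so only real cookie names end up as keys.
    jar.load(set_cookie_value)
    return sorted(jar.keys())
```

With `requests`, the raw value is available as `r.headers.get("set-cookie", "")`, although several `Set-Cookie` headers get joined into one string there, so `r.cookies` is probably the more reliable place to look. Calling `s.cookies.clear()` on a `requests.Session` drops stored cookies between attempts.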

Ge0rg3 commented 2 years ago

Cool, so sounds like the site itself is blocking -- please let me know if you need any more guidance on this.