Ge0rg3 / requests-ip-rotator

A Python library to utilize AWS API Gateway's large IP pool as a proxy to generate pseudo-infinite IPs for web scraping and brute forcing.
https://pypi.org/project/requests-ip-rotator/
GNU General Public License v3.0
1.36k stars, 140 forks

Failed to delete API #65

Closed 1nathanliang closed 10 months ago

1nathanliang commented 10 months ago

Hi @Ge0rg3, first, thank you so much for making this great package! I'm reaching out because I'm concerned that I haven't properly deleted all open endpoints from several test runs that errored out. While I do see several Deleted X endpoints with for site... messages, I also see several Failed to delete API <api identifier> messages, and I'm worried that the endpoints weren't closed properly, even though I only used the auto-close method (i.e., with ...) you've specified in the documentation. Would running gateway.shutdown() in a new Jupyter code cell resolve the issue? When I do this, it says that no gateways have been deleted. I set up my credentials with aws configure to begin with, but I don't know how to verify whether those endpoints have been deleted. Any help would be much appreciated!

Ge0rg3 commented 10 months ago

Hey @1nathanliang, apologies for the typing here - writing from my phone!

Firstly, I believe that if the APIs exist but aren't being used, they shouldn't accrue any charges so don't worry about billing.

Secondly, yes, if you run .start() and .shutdown() again, it will delete the existing ones. It's crucial that you use the same credentials again; if you're unsure, you can always pass your access key ID and secret directly into the ApiGateway() object on init instead of relying on environment credentials.
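A minimal sketch of that cleanup run with explicit credentials. The access_key_id and access_key_secret keyword names are as I recall them from the project README, so check them against your installed version; load_credentials and cleanup_endpoints are hypothetical helper names introduced here for illustration.

```python
import os


def load_credentials(env=os.environ):
    """Read the standard AWS credential variables from a mapping.

    Raises KeyError if either variable is missing, so a wrong or empty
    environment is noticed before any gateway call is made.
    """
    return env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"]


def cleanup_endpoints(site, key_id, secret):
    """Start and immediately shut down gateways for `site` (hypothetical helper).

    Per the comment above, start() picks up the existing endpoints for the
    site and shutdown() then deletes them, provided the same credentials
    are used as when they were created.
    """
    # Imported lazily so this module loads even without the package installed.
    from requests_ip_rotator import ApiGateway

    # Assumption: keyword names as documented in the project README.
    gateway = ApiGateway(site, access_key_id=key_id, access_key_secret=secret)
    gateway.start()
    gateway.shutdown()


# Example call (makes real AWS requests, so it is left commented out):
# key_id, secret = load_credentials()
# cleanup_endpoints("https://www.google.com", key_id, secret)
```

Passing the keys explicitly removes any doubt about which profile a Jupyter kernel picked up from the environment.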

If you see 'failed to delete', there may be a permission issue. Are you sure your account credentials have full ApiGateway access?
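On the question of how to verify whether any endpoints are left: one option is to list the REST APIs per region with boto3 and filter on the name the library prints in its log (the ' - IP Rotate API' suffix seen later in this thread). This is a sketch, not part of the library; list_leftover_apis and is_rotator_api are hypothetical names, the regions in the example call are placeholders, and the listing call itself needs read permission on API Gateway.

```python
# Suffix taken from the log line in this thread:
# "Using 10 endpoints with name 'https://www.google.com/ - IP Rotate API'".
ROTATOR_SUFFIX = " - IP Rotate API"


def is_rotator_api(name):
    """Return True if an API name looks like one created by the library."""
    return name.endswith(ROTATOR_SUFFIX)


def list_leftover_apis(regions):
    """Map each region to the (id, name) pairs of rotator APIs still present.

    Regions with nothing left over are omitted from the result.
    """
    # Imported lazily so this module loads even without boto3 installed.
    import boto3

    leftovers = {}
    for region in regions:
        client = boto3.client("apigateway", region_name=region)
        paginator = client.get_paginator("get_rest_apis")
        found = [
            (item["id"], item["name"])
            for page in paginator.paginate()
            for item in page["items"]
            if is_rotator_api(item.get("name", ""))
        ]
        if found:
            leftovers[region] = found
    return leftovers


# Example call (makes real AWS requests, so it is left commented out;
# the region list is a placeholder, use the regions your gateway ran in):
# print(list_leftover_apis(["us-east-1", "eu-west-1"]))
```

An empty dictionary means no matching APIs were found in the regions checked; it says nothing about regions you did not pass in.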

1nathanliang commented 10 months ago

No need to apologize at all! Thank you so much for the lightning quick response!!

Got it--I did also notice in the docs that you said that it'll use the existing ones unless you force it to use new APIs, so I'm fairly certain they're deleted. FWIW, I do have [AmazonAPIGatewayInvokeFullAccess] turned on for my IAM user, so not sure what's going on there lol. For context (and apologies for digressing if this ought to be a separate post), I'm trying to replicate much of what was mentioned here (https://github.com/Ge0rg3/requests-ip-rotator/issues/4#issue-1003223984) except I'm using ray for parallelization, and I wonder whether that has something to do with it?

The above aside, I'm also just getting sent to Google's 404 despite valid URLs when I run the following--do you know why this might be happening?:

from requests_ip_rotator import ApiGateway
import requests
from bs4 import BeautifulSoup

gateway = ApiGateway("https://www.google.com")
gateway.start()
session = requests.Session()
session.mount("https://www.google.com", gateway)

soup = BeautifulSoup(session.get("https://www.google.com/search?q=test").text, 'html.parser')
print(soup)

Output:
Starting API gateways in 10 regions.
Using 10 endpoints with name 'https://www.google.com/ - IP Rotate API' (10 new).
https://www.google.com/search?q=test
<!DOCTYPE html>

<html lang="en">
<meta charset="utf-8"/>
<meta content="initial-scale=1, minimum-scale=1, width=device-width" name="viewport"/>
<title>Error 404 (Not Found)!!1</title>
...

Ge0rg3 commented 10 months ago

Hey, I'll check Google later, but AmazonAPIGatewayInvokeFullAccess is only for calling the APIs and doesn't cover deletion too; there's a separate policy needed for this.
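For illustration, here is a sketch of what a custom IAM policy covering management calls (including deletion) could look like, built as a Python dict and printed as JSON. The action list and resource ARNs are my assumption about what the library needs, not something confirmed in this thread; AWS also ships a managed policy, AmazonAPIGatewayAdministrator, which grants full management access. build_rotator_policy is a hypothetical name.

```python
import json


def build_rotator_policy():
    """Return an IAM policy document allowing API Gateway management calls.

    Assumption: GET/POST/PUT/DELETE on REST APIs is sufficient; adjust the
    action list if your version of the library reports other denied calls.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "apigateway:GET",     # list existing APIs
                    "apigateway:POST",    # create APIs and deployments
                    "apigateway:PUT",     # configure methods and integrations
                    "apigateway:DELETE",  # delete APIs on shutdown
                ],
                "Resource": [
                    "arn:aws:apigateway:*::/restapis",
                    "arn:aws:apigateway:*::/restapis/*",
                ],
            }
        ],
    }


# Print the document so it can be pasted into the IAM console.
print(json.dumps(build_rotator_policy(), indent=2))
```

The DELETE action is the part that an invoke-only policy lacks, which would match the 'Failed to delete API' messages.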

Ge0rg3 commented 10 months ago

Hey @1nathanliang, just ran this and checked the response URL: https://consent.google.com/ml?continue=https://www.google.com/search%3Fq%3Dtest&gl=FR&m=0&pc=srp&uxe=none&cm=2&hl=fr&src=1

Looks like the site is redirecting back to Google for cookie consent, which is why you're seeing a 404. If you run the code again with allow_redirects=False, it's clearer.
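A small sketch of that check: send the request with allow_redirects=False and look at the status code and Location header instead of the final page. redirect_target and check_redirect are hypothetical helper names; session is assumed to be the requests.Session with the gateway mounted, as in the snippet above.

```python
def redirect_target(status_code, headers):
    """Return the Location header for a 3xx response, otherwise None."""
    if 300 <= status_code < 400:
        return headers.get("Location")
    return None


def check_redirect(session, url):
    """Fetch `url` without following redirects and report where it points.

    Returns a (status_code, location) tuple; location is None when the
    response is not a redirect.
    """
    response = session.get(url, allow_redirects=False)
    return response.status_code, redirect_target(response.status_code, response.headers)


# Example call (needs a started gateway and a mounted session):
# print(check_redirect(session, "https://www.google.com/search?q=test"))
```

If the location comes back as a consent.google.com URL, the 404 comes from following that redirect rather than from the search URL itself.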

This is outside of the scope of the project issue and probs just requires you to set some headers when sending the request. If you want some advice on this feel free to email me privately, my email is on https://georgeom.net 😊