Ge0rg3 / requests-ip-rotator

A Python library to utilize AWS API Gateway's large IP pool as a proxy to generate pseudo-infinite IPs for web scraping and brute forcing.
https://pypi.org/project/requests-ip-rotator/
GNU General Public License v3.0
1.35k stars 140 forks

REST vs HTTP Api #16

Open simplexx opened 2 years ago

simplexx commented 2 years ago

Hey there, thanks a lot for this great lib! I noticed that it uses the REST API. Would it theoretically be possible to use the HTTP API, which is only 1/3 the price?

Ge0rg3 commented 2 years ago

Hey, I've actually never heard of this - sounds very interesting! Will look into it more over the next month, and hopefully get some implementation done if it seems possible. 👍

A quick Google search shows that it seems to come down to functionality -- HTTP APIs currently don't offer all of the same functionality as REST APIs, so we will need to see whether all of the API Gateway-related calls are available.

simplexx commented 2 years ago

I played around with this a bit yesterday. I created an API in my account, using the target site as the integration: https://target-site.com/{proxy}. Then I also created a route: /{proxy+}

After this, my API URL redirected to the target site with the specified path, for example:

https://xxxxxx.execute-api.eu-west-1.amazonaws.com/testpath

redirected to

https://target-site.com/testpath

I also cloned this project and changed some code so that an HTTP API is used instead of the REST API. Not that many changes are required for this. However, I was not able to get it fully working yet, and I'm not sure I will, as I do not have much experience with Python and none with using Amazon services programmatically.

The new HTTP API uses apigatewayv2.

I think most of the code can be reused, but the methods are different (names and return values).

I have not yet checked whether the X-Forwarded-For header can be faked.
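For readers following along, the manual setup described in this comment (an integration pointing at https://target-site.com/{proxy} plus a /{proxy+} route) could be scripted roughly as below. This is a minimal sketch, not the code from the pull request: the function name, the default API name, the "$default" auto-deploy stage, and the "ANY" method are assumptions. The `client` argument is expected to behave like `boto3.client("apigatewayv2")`.

```python
def create_http_proxy_api(client, target_url, name="ip-rotator-http-sketch"):
    """Create an HTTP API that forwards every path to target_url.

    `client` should behave like boto3.client("apigatewayv2").
    The function name and the default `name` are hypothetical.
    Returns the endpoint URL of the new API.
    """
    target = target_url.rstrip("/")

    # HTTP APIs are created through apigatewayv2 with ProtocolType="HTTP".
    api = client.create_api(Name=name, ProtocolType="HTTP")
    api_id = api["ApiId"]

    # Integration: pass requests through to the target site, keeping the path.
    integration = client.create_integration(
        ApiId=api_id,
        IntegrationType="HTTP_PROXY",
        IntegrationMethod="ANY",
        IntegrationUri=f"{target}/{{proxy}}",
        PayloadFormatVersion="1.0",
    )

    # Route: the greedy {proxy+} variable matches any path.
    client.create_route(
        ApiId=api_id,
        RouteKey="ANY /{proxy+}",
        Target=f"integrations/{integration['IntegrationId']}",
    )

    # Assumption: use the auto-deployed "$default" stage so the endpoint
    # works without a stage prefix in the URL.
    client.create_stage(ApiId=api_id, StageName="$default", AutoDeploy=True)

    return api["ApiEndpoint"]


# Usage (requires boto3 and AWS credentials, creates billable resources):
#   import boto3
#   client = boto3.client("apigatewayv2", region_name="eu-west-1")
#   print(create_http_proxy_api(client, "https://target-site.com"))
```

Taking the client as a parameter keeps the sketch easy to try against a stub before touching a real account.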

simplexx commented 2 years ago

I will play around some more and see if I can find out more things. It would certainly be awesome if this new API could be used; saving 2/3 of the price is a big deal imo!

Ge0rg3 commented 2 years ago

Sounds very promising, thanks for looking into these 😄 Please could you move your changes over to a fork so I could take a look at the changes you made?

simplexx commented 2 years ago

I will, please give me a few days to play with it some more, and then I'll push what I have.

simplexx commented 2 years ago

I opened a pull request with a version that uses the HTTP API. It needs more testing.

simplexx commented 2 years ago

Currently running the script in production to save costs, and it works fine. However, some things still need testing:
- Different sites
- Multiple regions (I use just one per instance)
- Forwarded header
- ?

simplexx commented 2 years ago

One of my instances just got blacklisted, so I checked the headers and it does not work as it should:

"HTTP_X_MY_X_FORWARDED_FOR":"85.64.xxx.xxx"
"HTTP_FORWARDED":"by=35.179.xxx.xxx;for=46.127.xxx.xxx;host=xxxxxtd42.execute-api.eu-west-2.amazonaws.com;proto=https",
"HTTP_VIA":"HTTP\/1.1 AmazonAPIGateway"

The true IP is leaked unfortunately.
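A check like the one described above can be automated by parsing the Forwarded header that the target site receives and looking at its `for=` entries. The sketch below is an illustration only: the function names are hypothetical, and the parser is a simplified reading of the RFC 7239 syntax that splits on commas and semicolons without handling those characters inside quoted values.

```python
def parse_forwarded(value):
    """Parse a Forwarded header value into a list of dicts, one per hop.

    Simplified: does not handle ',' or ';' inside quoted strings.
    """
    hops = []
    for element in value.split(","):
        pairs = {}
        for part in element.split(";"):
            key, sep, val = part.strip().partition("=")
            if not sep:
                continue  # skip fragments without a key=value shape
            # Parameter names are case-insensitive; values may be quoted.
            pairs[key.strip().lower()] = val.strip().strip('"')
        if pairs:
            hops.append(pairs)
    return hops


def forwarded_for_values(value):
    """Return every for= entry found in a Forwarded header value."""
    return [hop["for"] for hop in parse_forwarded(value) if "for" in hop]


def leaks_ip(value, real_ip):
    """True if real_ip appears as a for= entry in the Forwarded header."""
    return real_ip in forwarded_for_values(value)


# The masked header value from the comment above:
header = (
    "by=35.179.xxx.xxx;for=46.127.xxx.xxx;"
    "host=xxxxxtd42.execute-api.eu-west-2.amazonaws.com;proto=https"
)
print(forwarded_for_values(header))  # ['46.127.xxx.xxx']
```

Running `leaks_ip` against an echo endpoint you control, with your own address as `real_ip`, is one way to test a gateway configuration before using it in production.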

simplexx commented 2 years ago

Looks like we cannot spoof the HTTP_FORWARDED header because it is restricted, which makes the whole HTTP API unusable.

herissondev commented 2 years ago

Hi @simplexx, did you find a workaround for this? I would love to reduce my costs by 2/3 ^^

simplexx commented 2 years ago

Hi @aime-risson It's not possible unfortunately :(

Ge0rg3 commented 2 years ago

Ahh too bad, thanks for looking into this.

herissondev commented 2 years ago

What I have done is use Lambdas as proxies. This ended up being much cheaper and much more efficient, as Lambdas change IPs on every run too.

Ge0rg3 commented 2 years ago

Hi @aime-risson, did you not find that the lambda start times decreased the speed too much? Thanks!

codemonies commented 1 year ago

> What I have done is use Lambdas as proxies. This ended up being much cheaper and much more efficient, as Lambdas change IPs on every run too.

How are you changing IPs on every request? It seems to be a static IP address for me, that only changes when I re-deploy the code.

HyperRays commented 9 months ago

It seems that if you chain two HTTP APIs together, the IP is not leaked.

Ge0rg3 commented 9 months ago

@HyperRays very nice find... We could definitely make this optional, as it'll be 2x the cost, but it could be really useful. Any chance you're up for a PR?
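To put the "2x cost" in context, here is the arithmetic using only the ratios quoted in this thread. It is a rough sketch: it assumes the HTTP API costs 1/3 of the REST API per request (as stated in the opening comment) and that each request through a chained setup is billed once per hop. Actual AWS pricing varies by region and volume tier, and data transfer is ignored.

```python
from fractions import Fraction

# Costs are relative to one REST API request (REST = 1).
REST_COST = Fraction(1)
HTTP_COST = Fraction(1, 3)  # assumption taken from the thread: "1/3 the price"


def relative_cost(hops, per_hop=HTTP_COST):
    """Relative cost of one request passing through `hops` chained HTTP APIs."""
    return hops * per_hop


single = relative_cost(1)   # one HTTP API
chained = relative_cost(2)  # two HTTP APIs chained together

print(f"single HTTP API:  {single} of the REST cost")
print(f"chained HTTP API: {chained} of the REST cost")
print(f"saving vs REST:   {REST_COST - chained}")
```

Under these assumptions the chained setup still comes out at about 2/3 of the REST price, a saving of roughly 1/3 rather than the 2/3 hoped for originally.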

HyperRays commented 9 months ago

Not yet, but I'd be more than happy to give it a go

Ge0rg3 commented 8 months ago

Thanks @HyperRays, let me know how you get on! 🤞