claffin / cloudproxy

Hide your scraper's IP behind the cloud. Provision proxy servers across different cloud providers to improve your scraping success.
https://cloudproxy.io/
MIT License

Maybe add a feature for multiple DigitalOcean accounts? #21

Closed ghost closed 3 years ago

ghost commented 3 years ago

I'm loving this; it's perfect for what I need. Maybe a feature to consider in the future would be support for multiple DigitalOcean accounts, as there's a droplet limit of 10 for new accounts?

Thanks for releasing this!

claffin commented 3 years ago

Thanks for the feedback and glad you find it useful.

DO will raise the limit to 100 without much hassle, I've found, and I'm not necessarily keen to introduce this as it probably circumvents DO's TOS.

An alternative would be to run multiple instances of CloudProxy with Docker, one per account. It's not terribly resource-efficient, and you'd have to implement something within your application to hit the multiple APIs, along the lines of the sketch below.
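
A minimal sketch of that application-side merging, assuming each CloudProxy container is published on its own host port and that the list endpoint returns JSON with an `ips` array as in the README examples; the ports and URLs here are placeholders:

```python
import requests

# Hypothetical setup: one CloudProxy container per DigitalOcean account,
# each published on its own host port (e.g. -p 8000:8000 and -p 8001:8000).
CLOUDPROXY_APIS = [
    "http://localhost:8000",
    "http://localhost:8001",
]


def get_all_proxies():
    """Merge the proxy lists from every CloudProxy instance into one pool."""
    proxies = []
    for api in CLOUDPROXY_APIS:
        try:
            response = requests.get(f"{api}/", timeout=5)
            response.raise_for_status()
            # Assumes a response shaped like {"ips": ["http://user:pass@1.2.3.4:8899", ...]}
            proxies.extend(response.json().get("ips", []))
        except requests.RequestException:
            # Skip an instance that is still starting up or unreachable.
            continue
    return proxies


if __name__ == "__main__":
    print(get_all_proxies())
```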

I will leave the issue open for now in case there is a wider interest.

ghost commented 3 years ago

Thank you, I didn't know it was that simple on a new account.

I run a homelab and self-host, so I haven't used DO before.

May I ask what you told them you were using it for to get your account upgraded?

Do you also have an Ethereum address? I'll buy you a beer.

claffin commented 3 years ago

I can't exactly remember; I've had the account for a while. I think DO just require a payment or two to be made on the account before they'll upgrade the limit. You can top up your account with credit or settle your bill early, then request the limit upgrade. Though no guarantees.

My Ethereum address is 0x1C9cd536F189711f335Be1AAaf4E94a6F46DA17D. Many thanks!

ghost commented 3 years ago

I'm sorry, but AWS now doesn't seem to work. The VMs spin up but don't appear in the proxy list or in the UI even after 15 minutes, with the console saying it's waiting for allocation.

claffin commented 3 years ago

Thanks for reporting. I've found the issue and released a fix. Please use the image cloudproxy:0.3.3-beta or cloudproxy:latest now.

ghost commented 3 years ago

Hi,

I now seem to get this error:

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 80, in _worker
    work_item.run()
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.8/site-packages/apscheduler/executors/base.py", line 125, in run_job
    retval = job.func(*job.args, **job.kwargs)
  File "/app/cloudproxy/providers/manager.py", line 15, in aws_manager
    ip_list = aws_start()
  File "/app/cloudproxy/providers/aws/main.py", line 92, in aws_start
    aws_check_delete()
  File "/app/cloudproxy/providers/aws/main.py", line 77, in aws_check_delete
    if instance["Instances"][0]["PublicIpAddress"] in delete_queue:
KeyError: 'PublicIpAddress'
```

claffin commented 3 years ago

If you just ignore that error, it should still deploy the AWS instances successfully. It's a known issue: AWS creates the instance but doesn't allocate the IP instantly, so when CloudProxy reads the response from AWS it cannot find the IP, which causes the KeyError. I will fix it in a future release, but in the interim it should still continue to work; once AWS allocates the IP, it stops raising the exception. One way the check could be guarded is sketched below.
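
Not the fix that shipped, just a minimal sketch of how a check like the one in `aws_check_delete` could skip the missing key; the function name and signature here are hypothetical, and the reservation dict shape follows the traceback above:

```python
def remove_deleted_instances(reservations, delete_queue):
    """Hypothetical guard: skip instances AWS hasn't assigned a public IP to yet."""
    for instance in reservations:
        # "PublicIpAddress" is absent until AWS finishes allocating the IP,
        # which is what raised the KeyError above.
        public_ip = instance["Instances"][0].get("PublicIpAddress")
        if public_ip is None:
            continue
        if public_ip in delete_queue:
            # ... terminate the instance here (omitted in this sketch) ...
            pass
```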

Let me know if it doesn't.

ghost commented 3 years ago

Hello,

I'm sorry, but for Amazon it just says "Pending: AWS allocating" and does not work, even though the VM shows as allocated in my AWS dashboard and is fully up and running (just the VM and Docker inside it; I didn't check the proxy).

Also, just as a little side note, the `-e DIGITALOCEAN_MIN_SCALING=0 -e DIGITALOCEAN_MAX_SCALING=0` flags don't work; it always starts at 2. Not important, though.

claffin commented 3 years ago

Are you sure you're running the latest Docker image? Try with the tag cloudproxy:0.3.3-beta if you haven't. If that doesn't help, set the scaling to 0 first, check all the VMs have been destroyed, then scale up again.

Also, check that the username or password you've set doesn't include any characters which may cause an issue, such as semicolons. You're best off using alphanumeric characters for usernames and passwords to avoid any issues.

If none of those work, could you try running this command (assuming you're using Linux or Mac with curl installed): `curl --proxy "http://USERNAME:PASSWORD@PROXYIP:8899" "https://api.ipify.org/"`

Obviously, replace USERNAME and PASSWORD with your username and password, and PROXYIP with the IP of one of your AWS VMs. Give your AWS VM 6 to 8 minutes to deploy before running it. If the response is the IP of the AWS VM, then we know the proxy itself is working fine, which helps narrow down the issue.
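
The same check from Python, in case it's easier to run from your application; the placeholders mirror the curl command above and `requests` is assumed to be installed:

```python
import requests

# Placeholders, as in the curl example: replace with your CloudProxy
# username/password and the public IP of one of the AWS VMs.
PROXY = "http://USERNAME:PASSWORD@PROXYIP:8899"

response = requests.get(
    "https://api.ipify.org/",
    proxies={"http": PROXY, "https": PROXY},
    timeout=10,
)
# If the proxy is healthy, this prints the AWS VM's public IP.
print(response.text)
```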

ghost commented 3 years ago

Yeah, thanks, it ended up being that. I could've sworn I did a pull.

Thanks!

claffin commented 3 years ago

Great, glad it's resolved!