claffin / cloudproxy

Hide your scraper's IP behind the cloud. Provision proxy servers across different cloud providers to improve your scraping success.
https://cloudproxy.io/
MIT License
1.4k stars, 79 forks

401 crashes after running #41

Closed chalitbkb closed 3 years ago

chalitbkb commented 3 years ago

test call

Item not found

How can I solve this problem?

@claffin

claffin commented 3 years ago

Just to clarify, is the issue you are now facing that the GUI is not working, while the proxies are now deploying fine and operating as expected?

If that is the case:

  • Please could you share the docker run command you are using.
  • Open the UI page and, assuming you're using Chrome, open the Chrome console and share its output.

With that info, it should be possible to narrow down the issue.

chalitbkb commented 3 years ago

docker run -e USERNAME=Cloudproxy -e PASSWORD=Cloudproxy -e AWS_ENABLED=True -e AWS_ACCESS_KEY_ID=xxxxxxxxxxxx -e AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxx -it -p 8000:8000 laffin/cloudproxy:0.6.0-beta

claffin commented 3 years ago

I notice that in the docker run command you're exposing cloudproxy on port 8000, but in your browser I can see you're accessing it on :8080. I'm not sure this would cause the problem, though.

Could you expand one of the "Uncaught (in promise) SyntaxError: ..." errors (the ones with VM on the right) and share it, please?

Looks like the frontend is having an issue calling the backend API.

chalitbkb commented 3 years ago

I solved it, thanks. I know why it happened.

But I have a further question: why does the proxy I use keep changing? It causes me to lose my internet connection. How can I solve it?

This is what I use. Can I specify a static IP:PORT?

const proxyChain = require('proxy-chain');

(async () => {
    // Cloudproxy proxy URL, including its username and password
    const oldProxyUrl = 'http://Cloudproxy:Cloudproxy@3.9.171.170:8899';
    // proxy-chain starts a local proxy that forwards to the URL above
    const newProxyUrl = await proxyChain.anonymizeProxy(oldProxyUrl);
    // ...

    args: [`--proxy-server=${newProxyUrl}`],
    // ...

After about 5 seconds, the IP changes. Or does AWS limit the amount of data transmitted?

I also tried it with the SwitchyOmega extension and encountered the same problem.

claffin commented 3 years ago

I'm not sure I understand exactly.

Cloudproxy constantly tests the connectivity by running a request to https://api.ipify.org/ by proxying via each proxy server. Cloudproxy checks that https://api.ipify.org/ returns the IP of the proxy, thus the proxy is working. If it times out or returns anything but the expected proxy IP, then the proxy will be terminated and a new one spawned.
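
As a rough illustration of the check described above (not Cloudproxy's actual code), the logic could be sketched like this. The function names `fetch_ip_via_proxy` and `proxy_is_alive` are hypothetical, and the 10-second timeout is an assumption:

```python
import http.client
import urllib.request
from typing import Callable, Optional

IPIFY_URL = "https://api.ipify.org/"


def fetch_ip_via_proxy(proxy_url: str, timeout: float = 10.0) -> Optional[str]:
    """Return the IP that api.ipify.org reports when the request goes via proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(IPIFY_URL, timeout=timeout) as response:
            return response.read().decode().strip()
    except (OSError, http.client.HTTPException):
        # Timeouts, refused connections and malformed replies all count as a failure.
        return None


def proxy_is_alive(
    proxy_ip: str,
    proxy_url: str,
    fetch: Callable[[str], Optional[str]] = fetch_ip_via_proxy,
) -> bool:
    """A proxy counts as alive only if ipify echoes back the proxy's own IP."""
    return fetch(proxy_url) == proxy_ip
```

With this logic, a timeout caused by a blip on the machine running the check is indistinguishable from a dead proxy, which is why an unstable connection can lead to healthy proxies being replaced.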

If you have internet issues or blips, this may cause Cloudproxy to think the proxy is dead, terminate it, and start a new one. It is an issue with the logic. To always get an alive proxy, ideally, you always check the Cloudproxy endpoint to see if it is still alive.
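
A minimal sketch of that idea, asking Cloudproxy for its current proxies right before each use instead of hard-coding one IP. It assumes the API is reachable at `http://localhost:8000/` and returns JSON with an `ips` list; both the URL and the response shape are assumptions to verify against your version, and the function names are hypothetical:

```python
import json
import random
import urllib.request
from typing import Callable, List

# Assumption: Cloudproxy's API is published on port 8000 of the local machine.
CLOUDPROXY_API = "http://localhost:8000/"


def fetch_proxy_list(api_url: str = CLOUDPROXY_API, timeout: float = 5.0) -> List[str]:
    """Ask Cloudproxy which proxies it currently considers alive."""
    with urllib.request.urlopen(api_url, timeout=timeout) as response:
        payload = json.load(response)
    # Assumption: the response looks like {"ips": ["http://user:pass@host:8899", ...]}.
    return list(payload.get("ips", []))


def pick_live_proxy(fetch: Callable[[], List[str]] = fetch_proxy_list) -> str:
    """Pick one proxy from the current list, failing loudly if none are up."""
    proxies = fetch()
    if not proxies:
        raise RuntimeError("Cloudproxy reported no live proxies")
    return random.choice(proxies)
```

Calling `pick_live_proxy()` before each scraping session means a terminated proxy is simply never selected, rather than a fixed IP:PORT silently going stale.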

Please could you share your logs? Difficult to diagnose without.

Assuming your internet issues are causing the proxies to keep changing, I could potentially add an additional check into Cloudproxy which checks the internet connection first before running the above, which may help.

Cloudproxy generally requires a stable internet connection, so the other option may be to run it on a VM/VPS (e.g. an AWS EC2).

chalitbkb commented 3 years ago


2021-07-06 07:12:45.149 | ERROR    | apscheduler.executors.base:run_job:131 - Job "aws_manager (trigger: interval[0:00:20], next run at: 2021-07-06 07:13:00 UTC)" raised an exception
Traceback (most recent call last):

  File "/usr/local/lib/python3.8/threading.py", line 890, in _bootstrap
    self._bootstrap_inner()
    │    └ <function Thread._bootstrap_inner at 0x7f0a42abf670>
    └ <Thread(ThreadPoolExecutor-0_0, started daemon 139681959204608)>
  File "/usr/local/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
    │    └ <function Thread.run at 0x7f0a42abf3a0>
    └ <Thread(ThreadPoolExecutor-0_0, started daemon 139681959204608)>
  File "/usr/local/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
    │    │        │    │        │    └ {}
    │    │        │    │        └ <Thread(ThreadPoolExecutor-0_0, started daemon 139681959204608)>
    │    │        │    └ (<weakref at 0x7f0a3f256b30; to 'ThreadPoolExecutor' at 0x7f0a4122b3d0>, <_queue.SimpleQueue object at 0x7f0a3f28d270>, None,...
    │    │        └ <Thread(ThreadPoolExecutor-0_0, started daemon 139681959204608)>
    │    └ <function _worker at 0x7f0a4122c280>
    └ <Thread(ThreadPoolExecutor-0_0, started daemon 139681959204608)>
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 80, in _worker
    work_item.run()
    │         └ <function _WorkItem.run at 0x7f0a4122c160>
    └ <concurrent.futures.thread._WorkItem object at 0x7f0a3cf6df70>
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
             │    │   │    │       │    └ {}
             │    │   │    │       └ <concurrent.futures.thread._WorkItem object at 0x7f0a3cf6df70>
             │    │   │    └ [<Job (id=c85b7be44d804a9e84c7ee62b9a03a33 name=aws_manager)>, 'default', [datetime.datetime(2021, 7, 6, 7, 12, 40, 663998, t...
             │    │   └ <concurrent.futures.thread._WorkItem object at 0x7f0a3cf6df70>
             │    └ <function run_job at 0x7f0a41231550>
             └ <concurrent.futures.thread._WorkItem object at 0x7f0a3cf6df70>
> File "/usr/local/lib/python3.8/site-packages/apscheduler/executors/base.py", line 125, in run_job
    retval = job.func(*job.args, **job.kwargs)
             │   │     │   │       │   └ <member 'kwargs' of 'Job' objects>
             │   │     │   │       └ <Job (id=c85b7be44d804a9e84c7ee62b9a03a33 name=aws_manager)>
             │   │     │   └ <member 'args' of 'Job' objects>
             │   │     └ <Job (id=c85b7be44d804a9e84c7ee62b9a03a33 name=aws_manager)>
             │   └ <member 'func' of 'Job' objects>
             └ <Job (id=c85b7be44d804a9e84c7ee62b9a03a33 name=aws_manager)>

  File "/app/cloudproxy/providers/manager.py", line 16, in aws_manager
    ip_list = aws_start()
              └ <function aws_start at 0x7f0a40163820>

  File "/app/cloudproxy/providers/aws/main.py", line 119, in aws_start
    aws_deployment(config["providers"]["aws"]["scaling"]["min_scaling"])
    │              └ {'auth': {'username': 'Cloudproxy', 'password': 'Cloudproxy'}, 'age_limit': 0, 'providers': {'digitalocean': {'enabled': Fals...
    └ <function aws_deployment at 0x7f0a41121160>

  File "/app/cloudproxy/providers/aws/main.py", line 37, in aws_deployment
    create_proxy()
    └ <function create_proxy at 0x7f0a401633a0>

  File "/app/cloudproxy/providers/aws/functions.py", line 85, in create_proxy
    instance = ec2.create_instances(
               │   └ <function ResourceFactory._create_action.<locals>.do_action at 0x7f0a4025faf0>
               └ ec2.ServiceResource()

  File "/usr/local/lib/python3.8/site-packages/boto3/resources/factory.py", line 520, in do_action
    response = action(self, *args, **kwargs)
               │      │      │       └ {'ImageId': 'ami-096cb92bb3580c759', 'MinCount': 1, 'MaxCount': 1, 'InstanceType': 't2.micro', 'NetworkInterfaces': [{'Device...
               │      │      └ ()
               │      └ ec2.ServiceResource()
               └ <boto3.resources.action.ServiceAction object at 0x7f0a4021d640>
  File "/usr/local/lib/python3.8/site-packages/boto3/resources/action.py", line 83, in __call__
    response = getattr(parent.meta.client, operation_name)(*args, **params)
                       │      │    │       │                │       └ {'ImageId': 'ami-096cb92bb3580c759', 'MinCount': 1, 'MaxCount': 1, 'InstanceType': 't2.micro', 'NetworkInterfaces': [{'Device...
                       │      │    │       │                └ ()
                       │      │    │       └ 'run_instances'
                       │      │    └ <botocore.client.EC2 object at 0x7f0a402634c0>
                       │      └ ResourceMeta('ec2', identifiers=[])
                       └ ec2.ServiceResource()
  File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
           │    │              │               └ {'ImageId': 'ami-096cb92bb3580c759', 'MinCount': 1, 'MaxCount': 1, 'InstanceType': 't2.micro', 'NetworkInterfaces': [{'Device...
           │    │              └ 'RunInstances'
           │    └ <function BaseClient._make_api_call at 0x7f0a40a9da60>
           └ <botocore.client.EC2 object at 0x7f0a402634c0>
  File "/usr/local/lib/python3.8/site-packages/botocore/client.py", line 705, in _make_api_call
    raise error_class(parsed_response, operation_name)
          │           │                └ 'RunInstances'
          │           └ {'Error': {'Code': 'VcpuLimitExceeded', 'Message': 'You have requested more vCPU capacity than your current vCPU limit of 8 a...
          └ <class 'botocore.exceptions.ClientError'>

botocore.exceptions.ClientError: An error occurred (VcpuLimitExceeded) when calling the RunInstances operation: You have requested more vCPU capacity than your current vCPU limit of 8 allows for the instance bucket that the specified instance type belongs to. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.
2021-07-06 07:12:47.218 | INFO     | uvicorn.protocols.http.h11_impl:send:461 - 172.17.0.1:52996 - "GET /providers HTTP/1.1" 200
2021-07-06 07:12:47.219 | INFO     | uvicorn.protocols.http.h11_impl:send:461 - 172.17.0.1:53000 - "GET /destroy HTTP/1.1" 200

claffin commented 3 years ago

This issue is with your AWS account, and the error message tells you what it is: you've hit the vCPU limit for your account. Each proxy consumes 1 vCPU. You can request an increase at http://aws.amazon.com/contact-us/ec2-request. You can read more here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-on-demand-instances.html#vcpu-limits-request-increase
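
To make the arithmetic concrete, here is a small sketch of how many more proxies fit under a vCPU limit. The function name `proxies_that_fit` is hypothetical, and the default of 1 vCPU per proxy reflects the t2.micro instances shown in the log:

```python
def proxies_that_fit(vcpu_limit: int, vcpus_in_use: int, vcpus_per_proxy: int = 1) -> int:
    """Return how many additional proxies can start before the vCPU limit is hit."""
    if vcpus_per_proxy <= 0:
        raise ValueError("vcpus_per_proxy must be positive")
    remaining = vcpu_limit - vcpus_in_use
    # Never report a negative number when the account is already over the limit.
    return max(0, remaining // vcpus_per_proxy)
```

With the limit of 8 from the error message, `proxies_that_fit(8, 8)` is 0, so every further `RunInstances` call fails until instances are terminated or the limit is raised.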

chalitbkb commented 3 years ago

What is the approximate maximum vCPU limit that can be requested and is there any additional cost?

If I want to use proxies in bulk, do you have any suggestions for a better alternative? (I need a large number of proxies.)

claffin commented 3 years ago

With AWS you only pay for what you use, so increasing the limit should have no cost. There is only a cost when you are actually running the proxies.

Cloudproxy supports DigitalOcean and Hetzner which are both cheaper than AWS (on average). Using multiple providers at once will allow you to provision more proxies.

claffin commented 3 years ago

Issue seems unrelated to cloudproxy and resolved, will close.