claffin / cloudproxy

Hide your scraper's IP behind the cloud. Provision proxy servers across different cloud providers to improve your scraping success.
https://cloudproxy.io/
MIT License
1.4k stars, 79 forks

Ghost proxies when destroying using spot in AWS #42

Closed xanrag closed 3 years ago

xanrag commented 3 years ago

Expected Behavior

The proxies are destroyed and remain gone.

Actual Behavior

The proxies are destroyed, but some unknown time later they are restarted without the cloudproxy tag, since they are started by something other than cloudproxy. They are fully functional proxies, just missing the tag.

Steps to Reproduce the Problem

  1. Start cloudproxy
  2. Increase servers to 30, wait.
  3. Decrease servers to 5, wait.

Specifications

Solution

My guess is that when you destroy the instances you also have to remove the spot request somehow, but I don't quite understand why.

claffin commented 3 years ago

Sounds like we're missing some logic around the handling of the spot requests.

From some brief research on spot instances, it sounds like cloudproxy needs to be aware of the spot request state. If a request is open, cloudproxy needs to treat it as a pending instance. I suspect cloudproxy may not be aware of this and keeps firing requests (maybe?).
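A minimal sketch of the "treat open requests as pending" idea, not cloudproxy's actual code. It assumes the input has the shape of a boto3 `describe_spot_instance_requests` response; the helper names `pending_spot_requests` and `instances_to_launch` are hypothetical.

```python
def pending_spot_requests(describe_response):
    """Return the IDs of spot requests that are still open (not yet fulfilled).

    `describe_response` is assumed to look like the dict returned by
    boto3's EC2 client method describe_spot_instance_requests().
    """
    return [
        req["SpotInstanceRequestId"]
        for req in describe_response.get("SpotInstanceRequests", [])
        if req.get("State") == "open"
    ]


def instances_to_launch(desired, running, describe_response):
    """How many new spot requests to fire, counting open requests as pending."""
    pending = len(pending_spot_requests(describe_response))
    return max(0, desired - running - pending)


# Illustrative response: one open request, one already fulfilled (active).
sample = {
    "SpotInstanceRequests": [
        {"SpotInstanceRequestId": "sir-open", "State": "open"},
        {"SpotInstanceRequestId": "sir-active", "State": "active"},
    ]
}
to_launch = instances_to_launch(desired=5, running=3, describe_response=sample)
```

With 3 running instances and 1 open request, only 1 more request is needed to reach 5, rather than 2.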

Also, persistent spot requests probably shouldn't be used. Cloudproxy has no awareness of persistent requests: if AWS interrupts an instance, cloudproxy will try to launch a new spot instance, and the persistent request will cause AWS to relaunch the instance as well. This relaunch by the persistent request must be part of the problem?
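For illustration, the request type can be chosen through the `InstanceMarketOptions` parameter of boto3's `run_instances`. This is a sketch under that assumption, not how cloudproxy is known to launch instances; the helper name `spot_market_options` is hypothetical.

```python
def spot_market_options(persistent=False):
    """Build an InstanceMarketOptions dict for boto3's run_instances().

    A one-time request is not re-fulfilled by AWS after the instance
    goes away; a persistent request is.
    """
    return {
        "MarketType": "spot",
        "SpotOptions": {
            "SpotInstanceType": "persistent" if persistent else "one-time",
        },
    }


one_time = spot_market_options()
persistent = spot_market_options(persistent=True)
```

The resulting dict would be passed as `ec2.run_instances(..., InstanceMarketOptions=spot_market_options())` alongside the usual launch parameters.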

Curious, are you seeing this issue for both persistent spot instances and one-time instances, or just persistent?

I will do some testing in the next couple of days.

xanrag commented 3 years ago

I have only been using persistent requests, but as you say that is probably the problem. If you use one-time instances you can't stop and restart the instance to get a new IP quickly.

You'd think that AWS would understand that once all the instances pertaining to a spot request are terminated, the spot request should be cancelled, but it doesn't. My guess is that we just need to send a further spot cancel command, or maybe just the spot cancel, now that I think of it. (testing) Yeah, cancelling a spot request with an active instance terminates the instance directly.
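A rough sketch of what cancelling the spot request on destroy could look like, assuming `ec2` is a boto3 EC2 client (or anything exposing the same three methods). The function name `destroy_spot_proxies` is hypothetical and this is not cloudproxy's code. To stay on the safe side, the sketch does not rely on the cancel alone removing the instance and also terminates explicitly.

```python
def destroy_spot_proxies(ec2, instance_ids):
    """Cancel the spot requests behind the given instances, then terminate them.

    Returns the sorted list of spot request IDs that were cancelled.
    """
    if not instance_ids:
        return []

    # Look up which spot request, if any, launched each instance.
    response = ec2.describe_instances(InstanceIds=instance_ids)
    request_ids = sorted({
        inst["SpotInstanceRequestId"]
        for reservation in response.get("Reservations", [])
        for inst in reservation.get("Instances", [])
        if inst.get("SpotInstanceRequestId")
    })

    # Cancel first, so a persistent request cannot relaunch the instance.
    if request_ids:
        ec2.cancel_spot_instance_requests(SpotInstanceRequestIds=request_ids)

    ec2.terminate_instances(InstanceIds=instance_ids)
    return request_ids
```

Usage would be along the lines of `destroy_spot_proxies(boto3.client("ec2"), ["i-..."])`, with the instance IDs coming from wherever cloudproxy tracks its proxies.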

And yes, it probably doesn't handle AWS interruptions correctly, but I was running with 30 instances so it didn't really matter if a few fell off since cloudproxy just sent new spot requests.