botify-labs / simpleflow

Python library for dataflow programming.
https://botify-labs.github.com/simpleflow/
MIT License
68 stars 24 forks source link

Simpleflow workers: proxy mode #439

Open ybastide opened 1 month ago

ybastide commented 1 month ago

For @jblernout 🙂


Here's the original request:

In Simpleflow, each worker performs a poll for activity.

When N workers run in parallel on the same instance, up to N connections are made to SWF to wait for an activity. If too many instances run in parallel, the maximum number of connections to SWF may be reached, preventing the workers from getting activities.

As a workaround, the number of workers has been limited in each instance. This number could be higher, which would improve the efficiency of each instance, reducing costs.

A proxy could be introduced at the instance level to fix the issue. The proxy would poll for activity on SWF, and the workers would poll for activity on the proxy.

The proxy would implement a simple HTTP server with a single thread (no parallel requests), receive POST requests with the poll details, forward them to SWF, and return SWF responses as an HTTP response.

The worker HTTP client timeout should be set to the same timeout as the SWF client in the proxy (+ some ms for the HTTP request handling).

The worker CLI would be modified to receive the proxy settings:

simpleflow worker.start --domain TestDomain --task-list quickstart --proxy=http://127.0.0.1:4242

A new CLI could be introduced to start the proxy:

simpleflow proxy.start --port=4242

The proxy would have to be started before the workers.


In terms of security, the included proxy is HTTP, meaning that it only sees the CONNECT swf.us-east-1.amazonaws.com:443 HTTP/1.0 (for instance) in cleartext; the actual connection is encrypted. That's because an MITM version needs more setup to pass the CA CERT to boto 🙂.

In terms of code, I mostly

I'm defaulting to ::1 for localhost in this PR, but it can be changed.

ewjoachim commented 1 month ago

Can you spare a few words what you're doing here ? This seems extremely low level compared to the rest of this lib, and I'd like to double check in terms of security before we ship this, if possible.

ybastide commented 1 month ago

Can you spare a few words what you're doing here ? This seems extremely low level compared to the rest of this lib, and I'd like to double check in terms of security before we ship this, if possible.

Sure, thanks for the reminder! Updating the description to add context.

jblernout commented 1 month ago

But if it's a simple HTTP proxy that doesn't do MITM, does that mean it doesn't necessarily reduce the number of connections to SWF? And if, at some point in the future, you want to put a global proxy that decides which worker can poll, without MITM, you can't control which worker does what ?

ybastide commented 1 month ago

No. of simultaneous connections: doesn't the singlethreadness take care of this? global proxy: wouldn't the source address be enough?

ewjoachim commented 1 month ago

No. of simultaneous connections: doesn't the singlethreadness take care of this?

In that case why are we doing a proxy and not just a file-based mutex ? Also, is that going to impact performance ? (how much of a worker's time is spent waiting for the SWF API ? If it's significant, then serializing the API calls might make the worker slower, especially if the aim is to have more workers per container.

I think a file-based mutex is probably enough if we can accept the performance cost. If we want to significantly reduce the number of connections (and potentially speed up simpleflow by avoiding connection time), we should 1/ ensure that we reuse connections between calls of the same process (we probably already do that, but it might be worth double-checking) and 2/ it might be interesting to imagine a kind of proxy like you do but with connection sharing (or a pool). I could imagine multiple ways of doing that, from very low level to high level. It might be worth discussing (with me or any other interested folk)

ybastide commented 1 month ago

In that case why are we doing a proxy and not just a file-based mutex?

Because it looked like the easiest solution 🙂

[Snip many worthwhile points] It might be worth discussing (with me or any other interested folk)

Let's discuss this when you're back; thanks!

ewjoachim commented 3 weeks ago

(I'm back, btw)