bfirsh / funker

Functions as Docker containers
Apache License 2.0

Scale functions #4

Open bfirsh opened 7 years ago

bfirsh commented 7 years ago

The problem

The readme currently says that Funker "scales effortlessly", which is a bit of an exaggeration, in that it doesn't. Yet.

A running instance of a function can handle a single function call. While handling it, it refuses any other connections, and it shuts down once the call has finished.

To be able to do better than serial processing, we need to create more than one replica of the service.
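In Swarm terms that's roughly this (a sketch with the Docker SDK for Python; "my-function" is a placeholder service name):

```python
# Sketch: bump a function's Swarm service to several replicas so calls
# can be handled in parallel. "my-function" is a placeholder name.
import docker

client = docker.from_env()
service = client.services.get("my-function")
service.scale(4)  # four replicas -> up to four concurrent calls
```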

Potential solutions

Some ideas have been thrown around, but a starting point could simply be to detect how many functions are idle and, if that number is getting low, boot up some more. If there are too many, scale down. This might not work if functions are very quick and take a while to restart, but it's probably worth a try.
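Something like this naive loop, as a sketch with the Docker SDK for Python. The idle check is a placeholder, since Funker doesn't currently expose busy/idle state, and the thresholds and service name are made up:

```python
# Naive autoscaler sketch: keep the number of idle replicas of a function
# service between MIN_IDLE and MAX_IDLE. The idle check is a placeholder.
import time

import docker

client = docker.from_env()
MIN_IDLE = 2   # boot more replicas when fewer than this are idle
MAX_IDLE = 5   # scale down when more than this sit idle

def count_idle(service):
    # Placeholder: treat every running task as idle. A real version would
    # probe each replica, or have replicas report whether they're busy.
    tasks = service.tasks(filters={"desired-state": "running"})
    return sum(1 for t in tasks if t["Status"]["State"] == "running")

def autoscale(name):
    service = client.services.get(name)
    replicas = service.attrs["Spec"]["Mode"]["Replicated"]["Replicas"]
    idle = count_idle(service)
    if idle < MIN_IDLE:
        service.scale(replicas + (MIN_IDLE - idle))
    elif idle > MAX_IDLE and replicas > 1:
        service.scale(max(1, replicas - (idle - MAX_IDLE)))

while True:
    autoscale("my-function")  # placeholder service name
    time.sleep(5)
```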

It would theoretically be possible to scale a function down to nothing and have it cold boot on calling if the caller could somehow indicate that it needed running. Perhaps with a custom DNS server? Some intermediary service?
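One possible shape for the intermediary, sketched with the Docker SDK for Python. The port, timeout, and service name are placeholders, not anything Funker defines:

```python
# Sketch of a caller-side shim for scale-to-zero: make sure at least one
# replica exists, then dial it. Port 9999 and "my-function" are placeholders.
import socket
import time

import docker

client = docker.from_env()

def call_cold(service_name, port=9999, timeout=30):
    service = client.services.get(service_name)
    replicas = service.attrs["Spec"]["Mode"]["Replicated"]["Replicas"]
    if replicas == 0:
        service.scale(1)  # cold boot one replica on demand
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            # Swarm's DNS resolves the service name once a task is running
            return socket.create_connection((service_name, port), timeout=2)
        except OSError:
            time.sleep(0.5)  # still starting; retry until the deadline
    raise RuntimeError(f"{service_name} did not come up in time")
```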

For all this stuff, I would prefer to err on the side of simplicity and fewer running components, since the whole point is that we're leaning on Docker's service infrastructure to make this work.

/cc @justincormack

deitch commented 7 years ago

Ideally, this should:

  1. Receive request
  2. Launch a function container to process it
  3. Process
  4. Terminate container upon completion of single request

In practice, you do have cold-boot problems. These are normally due not to container startup time but to app infrastructure start time (Python initialization, Node.js initialization, etc.).

To the best of my knowledge, Amazon solves them by having some pre-warmed containers ready to run.

I do think, though, that if we can get it really running and scaling on-demand, we can solve the cold boot problem separately. Every request would look like this:

  1. Receive request
  2. Look for idle container matching type
    • If found, send request to container
    • If not, launch container, then send
  3. Upon completion, terminate container

Follow that process, and you don't worry about pre-warming; it is a separate service.
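As a sketch of that per-request flow with the Docker SDK for Python. The funker.* labels and the send_request transport are assumptions, not anything Funker defines today:

```python
# Sketch of the request flow above: reuse an idle container for the
# function if one exists, otherwise launch one, and remove it afterwards.
import docker

client = docker.from_env()

def send_request(container, payload):
    """Placeholder for the actual call (e.g. JSON over a TCP socket)."""
    raise NotImplementedError

def handle(function_image, payload):
    # 1. Look for an idle container already running this function type
    idle = client.containers.list(
        filters={"label": [f"funker.function={function_image}",
                           "funker.state=idle"]})
    if idle:
        container = idle[0]
    else:
        # 2. None idle: launch one, then send the request to it
        container = client.containers.run(
            function_image, detach=True,
            labels={"funker.function": function_image,
                    "funker.state": "idle"})
    try:
        return send_request(container, payload)
    finally:
        # 3. Terminate the container upon completion of the single request
        container.remove(force=True)
```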

However, you still will need some orchestration infrastructure to manage those containers that is not currently in Docker. I could do it with k8s services, not sure about swarm. Maybe if I run traefik on top of it? Still need to think that one through.

As for tracking which ones are running? Consul all the way. Every container with a function self-registers in Consul, which in turn is checked by your "request router" or traefik or whatever. There are some particular challenges to getting it right, particularly around node death and container monitoring, but these are doable if you do Consul right.
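For the self-registration half, a sketch with python-consul. The service name, port, and Consul address are placeholders:

```python
# Sketch: a function container registers itself in Consul on startup and
# deregisters before exiting, so the request router only sees live copies.
import socket

import consul

c = consul.Consul(host="consul", port=8500)  # placeholder Consul address
hostname = socket.gethostname()
service_id = f"my-function-{hostname}"

c.agent.service.register(
    name="my-function",              # placeholder function name
    service_id=service_id,
    address=hostname,
    port=9999,                       # placeholder listening port
    check=consul.Check.tcp(hostname, 9999, "10s"),  # catches dead containers
)

# ... accept and process calls here ...

# Deregister before shutting down so in-flight routing can drain
c.agent.service.deregister(service_id)
```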

topiaruss commented 7 years ago

I use Apache Storm, underneath the Python-based streamparse. Funker looked immediately attractive to me: I could imagine it replacing the fairly complex and heavyweight Storm with something much more dynamic. In Storm I can have a graph of nodes invoked by one or more events, in turn invoking other nodes through new events. Recursion is very useful.

Can you imagine the first (root?) Funker call starting a router node? The router would instantiate new instances, act as a dynamic registry of known images, direct traffic to container instances, then possibly step out of the way, directing return values to the original caller. Or, if the router stays in the circuit, it would be well placed to gather stats. Each new root would create its own router for the lifetime of the original task, and maintain its own registry and stats.

Using Consul might be attractive, but ideally Consul would not be an external dependency. As an optimisation, a completed task/node could reinitialise and wait around for a few seconds after completion in case it's needed again. If Consul were used, a node could also deregister itself several seconds before terminating in order to drain any in-flight traffic directed towards it.

deitch commented 7 years ago

I think I still would keep the original router at a different layer. Users shouldn't need to think about routers, only about event:function maps. If you really want this to work, make it dead simple:

"Dear user, please create really small containers that do functions and events that kick them off, and that is it".

The more I think about it, though, the more I realize that having the container call Consul to register/deregister is lovely for ContainerPilot, but when you want to keep it dead simple, you cannot have the container require it. That is why Lambda only gives you the function, not the whole container.

I think Consul (or similar) is the right way to keep track of available containers - they really have solved the big problems - but the registration has to happen outside the container. I think registrator would do that pretty well.
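For example, something along these lines with the Docker SDK for Python, following registrator's documented sidecar setup (the Consul address is a placeholder):

```python
# Sketch: run gliderlabs/registrator alongside the functions so Consul
# registration happens outside the function containers themselves.
import docker

client = docker.from_env()
client.containers.run(
    "gliderlabs/registrator:latest",
    command="consul://consul:8500",   # placeholder Consul address
    name="registrator",
    network_mode="host",
    volumes={"/var/run/docker.sock": {"bind": "/tmp/docker.sock",
                                      "mode": "rw"}},
    detach=True,
)
```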

justincormack commented 7 years ago

@topiaruss reinitialising is definitely an option. I'm somewhat concerned that the container might get polluted with left-over state, but it is pretty easy to implement: just a process that restarts the function when it stops. If you are careful and have a read-only rootfs in the container, though, it is easier to clean up anything fairly safely.
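A sketch of that restart loop in Python. The handler path is a placeholder, and it assumes the container is started with a read-only rootfs (e.g. docker run --read-only):

```python
# Sketch: PID 1 supervisor that re-runs the function process whenever it
# exits, so each call gets a fresh interpreter. /app/handler.py is a
# placeholder; pair this with a read-only rootfs to limit leftover state.
import subprocess
import time

while True:
    subprocess.run(["python", "/app/handler.py"])
    time.sleep(0.1)  # brief back-off before waiting for the next call
```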