dthain commented 1 year ago

(Note this is relative to PR #3436 that hasn't been merged yet.)

As written the serverless resource management model is a bit confused.

I believe that this is the current state of things, but I may be wrong:

1 - The manager will dispatch no more than one instance of a LibraryTask to a worker. This will cause the worker to have the "feature" indicating the library name.
2 - The manager will dispatch FunctionTasks to a worker with a matching feature name, until the resources declared by those tasks fill up the resources of the worker.

We need to work up a clear statement of how resources are declared and consumed in serverless mode, and then implement things to match that statement. Here are the sorts of questions that need to be answered:

Does a LibraryTask consume resources from the worker in the usual way, or is it considered "de minimis"?
Does a FunctionTask consume resources from the worker or from the library to which it is matched?
Can a LibraryTask run multiple FunctionTasks concurrently? (If so, it must have an asynchronous interface to the worker)

btovar commented 1 year ago

I'd say that both library and functions consume resources from the worker, as having the extra step of now managing resources for the library sounds ugly.

However, either set of resources (though perhaps not both at the same time) may be zero. E.g., we are more likely to assign memory to the library, and cores to the functions.

Having one function per library seems easier. For concurrency, it is easier to fork the library than to manage many functions inside a library (which is harder for users to do).

dthain commented 1 year ago

At this point I'm more concerned about getting the abstraction "right" rather than making it easy to implement. We need a clear and simple statement of how things fit together.

For example, when we just consider plain tasks, the abstraction is simple:

Each worker provides certain resources (cores, memory, disk)
Each task consumes a set of resources.
TaskVine will pack tasks into workers until no more fit.
If a task exceeds its assigned resources, the worker will send it back for rescheduling.
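The plain-task abstraction above can be sketched in a few lines of Python. This is only an illustrative model, not TaskVine's scheduler: the `Resources` and `Worker` classes and the `pack` function are hypothetical names, first-fit placement is an assumed policy, and the "send it back for rescheduling" step is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Resources:
    cores: int = 0
    memory: int = 0  # MB (unit chosen for illustration)
    disk: int = 0    # MB (unit chosen for illustration)

    def fits_in(self, other):
        # True if every requested amount is available in `other`.
        return (self.cores <= other.cores
                and self.memory <= other.memory
                and self.disk <= other.disk)

    def minus(self, other):
        return Resources(self.cores - other.cores,
                         self.memory - other.memory,
                         self.disk - other.disk)


@dataclass
class Worker:
    name: str
    free: Resources
    running: list = field(default_factory=list)


def pack(tasks, workers):
    """Place each task on the first worker with room; return names left waiting."""
    waiting = []
    for task_name, need in tasks:
        for w in workers:
            if need.fits_in(w.free):
                w.free = w.free.minus(need)
                w.running.append(task_name)
                break
        else:
            # No worker had room for this task.
            waiting.append(task_name)
    return waiting


workers = [Worker("w1", Resources(cores=4, memory=8000, disk=10000))]
tasks = [
    ("a", Resources(2, 2000, 1000)),
    ("b", Resources(2, 2000, 1000)),
    ("c", Resources(1, 1000, 1000)),
]
waiting = pack(tasks, workers)
```

Here tasks `a` and `b` use up all four cores, so `c` stays in the waiting list even though memory and disk remain.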

How do we extend that description to Libraries and Functions?

dthain commented 1 year ago

#3436 has been merged, so this now describes the master branch.

dthain commented 1 year ago

FYI, the worker now tracks the match between function calls and libraries (see vine_process.h), so as to respect invocation limits:

Each function process has a pointer to the library it was matched to.
Each library process has a count of the number of functions currently running.

The current limit is hard coded at one to match the implementation.
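As a rough picture of that bookkeeping, here is a Python model of the two fields described above. The actual code is C in vine_process.h and is not reproduced here; every class, field, and function name below is hypothetical.

```python
class LibraryProcess:
    def __init__(self, name, max_functions=1):
        self.name = name
        # The thread says the limit is currently hard-coded at one.
        self.max_functions = max_functions
        self.functions_running = 0


class FunctionProcess:
    def __init__(self, name):
        self.name = name
        self.library = None  # set when the function is matched to a library


def start_function(func, libraries, library_name):
    """Match func to a library with the given name that is under its limit."""
    for lib in libraries:
        if lib.name == library_name and lib.functions_running < lib.max_functions:
            func.library = lib
            lib.functions_running += 1
            return True
    return False


def finish_function(func):
    """Release the library's count when the function exits."""
    func.library.functions_running -= 1
    func.library = None


libs = [LibraryProcess("mylib")]
f1, f2 = FunctionProcess("f1"), FunctionProcess("f2")

first = start_function(f1, libs, "mylib")    # takes the only invocation
second = start_function(f2, libs, "mylib")   # refused: limit of one reached
finish_function(f1)
retry = start_function(f2, libs, "mylib")    # succeeds once f1 has finished
```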

dthain commented 1 year ago

So after a few conversations, I think we are getting confused between how it happens to work today and how it ought to work. So, I'm going to stipulate a model to get us moving forward. Here is what we are going to do:

1 - Library Tasks are allocated resources in the same way as normal tasks. So, they can be explicitly labelled, consume the entire worker (if nothing is stated), or be estimated via categories.
2 - Each Library Task states the maximum number of "function slots" that it can execute concurrently.
3 - Each Function call task consumes one slot in the corresponding library.
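A minimal sketch of the stipulated model, assuming resources are plain dictionaries. The `allocate` function and the `Library` class are hypothetical names for illustration only, and estimation via categories is left out.

```python
def allocate(worker_free, requested=None):
    """Grant resources to a library task the way a normal task would get them.

    With no explicit request the library takes the whole worker.
    Returns (grant, remaining), or None if the request does not fit.
    """
    grant = dict(worker_free) if requested is None else dict(requested)
    if any(amount > worker_free.get(key, 0) for key, amount in grant.items()):
        return None
    remaining = {key: worker_free[key] - grant.get(key, 0) for key in worker_free}
    return grant, remaining


class Library:
    def __init__(self, name, grant, function_slots):
        self.name = name
        self.grant = grant                    # resources held by the library task
        self.function_slots = function_slots  # maximum concurrent function calls
        self.slots_in_use = 0

    def try_call(self):
        """A function call consumes one slot and no worker resources."""
        if self.slots_in_use >= self.function_slots:
            return False
        self.slots_in_use += 1
        return True

    def call_done(self):
        self.slots_in_use -= 1


worker = {"cores": 8, "memory": 16000, "disk": 50000}

# Explicitly labelled library: takes half the cores and memory.
grant, remaining = allocate(worker, {"cores": 4, "memory": 8000, "disk": 1000})
lib = Library("mylib", grant, function_slots=2)
calls = [lib.try_call() for _ in range(3)]

# Unlabelled library: consumes the entire worker.
whole_grant, whole_remaining = allocate(worker)
```

With two slots declared, the third concurrent call is refused, and the worker's remaining resources are unaffected by any of the function calls.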

dthain commented 1 year ago

A couple of reasons for doing it this way:

Under this model, all resource allocation belongs to the library task. So if you want to change how things work, you only change the library task definition, and not the function task definition.
The implementation is easy b/c we just keep a counter associated with each running library task.
It accommodates several models of concurrency easily. If the library is simply single-threaded, then you get a one-to-one mapping. If the library is multi-process, then it simply declares more slots.

dthain commented 1 year ago

#3479 implements proper counting of function calls against the matching libraries.

Next step is to remove resource consumption by function calls, so all they do is line up with the libraries.

Is this as simple as just doing t.set_cores(0) etc for all resources on function call tasks?
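What that would look like, sketched against a stand-in class so it runs without TaskVine installed. `set_cores` is the call named above; `set_memory` and `set_disk` are assumed by analogy with the cores, memory, and disk resources listed earlier in the thread. `StandInTask` and `zero_resources` are hypothetical, and whether the real scheduler treats an explicit zero differently from an unspecified resource is not settled by this sketch.

```python
class StandInTask:
    """Hypothetical stand-in for a function call task; not the TaskVine class."""

    def __init__(self):
        self.resources = {}

    def set_cores(self, n):
        self.resources["cores"] = n

    def set_memory(self, n):
        self.resources["memory"] = n

    def set_disk(self, n):
        self.resources["disk"] = n


def zero_resources(task):
    """Declare zero for every resource so the task only lines up with a library."""
    for setter in (task.set_cores, task.set_memory, task.set_disk):
        setter(0)


t = StandInTask()
zero_resources(t)
```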

dthain commented 1 year ago

#3486 sets function call resources to zero.

cooperative-computing-lab / cctools

Vine: Serverless Resource Management #3440
