cooperative-computing-lab / cctools

The Cooperative Computing Tools (cctools) enable large scale distributed computations to harness hundreds to thousands of machines from clusters, clouds, and grids.
http://ccl.cse.nd.edu

Vine: Serverless Resource Management #3440

Closed dthain closed 1 year ago

dthain commented 1 year ago

(Note this is relative to PR #3436 that hasn't been merged yet.)

As written the serverless resource management model is a bit confused.

I believe that this is the current state of things, but I may be wrong.

1 - The manager will dispatch no more than one instance of a LibraryTask to a worker. This will cause the worker to have the "feature" indicating the library name.

2 - The manager will dispatch FunctionTasks to a worker with a matching feature name, until the resources declared by those tasks fill up the resources of the worker.
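To make the two rules above concrete, here is a minimal sketch of the dispatch logic as described. It is a toy model, not the manager's code: the Worker class and both function names are hypothetical, only cores are tracked, "one instance" is assumed to mean one instance per library name, and the library's own resource use is left out because the description above does not say how it is counted.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical stand-in for a worker as seen by the manager."""
    cores: int                                   # total cores on the worker
    cores_in_use: int = 0                        # cores claimed by running function tasks
    features: set = field(default_factory=set)   # names of libraries already installed


def dispatch_library(worker: Worker, library_name: str) -> bool:
    """Rule 1: send at most one instance of a library to a worker."""
    if library_name in worker.features:
        return False
    # Installing the library gives the worker the matching "feature".
    worker.features.add(library_name)
    return True


def dispatch_function(worker: Worker, library_name: str, cores: int) -> bool:
    """Rule 2: require a matching feature and enough unclaimed cores."""
    if library_name not in worker.features:
        return False
    if worker.cores_in_use + cores > worker.cores:
        return False
    worker.cores_in_use += cores
    return True
```

In this model the only thing that limits concurrency is the sum of the resources declared by the function tasks, which is the part the rest of this thread questions.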

We need to work up a clear statement of how resources are declared and consumed in serverless mode, and then implement things to match that statement. Here are the sorts of questions that need to be answered:

btovar commented 1 year ago

I'd say that both library and functions consume resources from the worker, as having the extra step of now managing resources for the library sounds ugly.

However, either set of resources (but maybe not both at the same time) may be zero. E.g., we are more likely to assign memory to the library, and cores to the functions.

Having one function per library seems easier. For concurrency, it is easier to fork the library than to manage many functions inside a library (which is harder for users to do).

dthain commented 1 year ago

At this point I'm more concerned about getting the abstraction "right" rather than making it easy to implement. We need a clear and simple statement of how things fit together.

For example, when we just consider plain tasks, the abstraction is simple:

How do we extend that description to Libraries and Functions?

dthain commented 1 year ago

PR #3436 has been merged, so this now describes the master branch.

dthain commented 1 year ago

FYI, the worker now tracks the match between function calls and libraries, so as to respect invocation limits in vine_process.h.

The current limit is hard coded at one to match the implementation.

dthain commented 1 year ago

After a few conversations, I think we are getting confused between how it happens to work today and how it ought to work. So, I'm going to stipulate a model to get us moving forward. Here is what we are going to do:

1 - Library Tasks are allocated resources in the same way as normal tasks. So, they can be explicitly labelled, or consume the entire worker (if nothing is stated), or be estimated via categories.

2 - Each Library Task states the maximum number of "function slots" that it can execute concurrently.

3 - Each Function call task consumes one slot in the corresponding library.
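As an illustration of the stipulated model, here is a small sketch of the accounting it implies. Everything in it is hypothetical (the Worker and Library classes, the function names, and tracking only cores); estimation via categories is not modelled. The point is only that libraries draw on worker resources while function calls draw on library slots.

```python
class Worker:
    """Hypothetical worker that tracks only cores."""
    def __init__(self, cores):
        self.cores = cores
        self.cores_free = cores


class Library:
    """Hypothetical library task: a resource request plus a slot count."""
    def __init__(self, name, function_slots, cores=None):
        self.name = name
        self.function_slots = function_slots   # max concurrent function calls
        self.cores = cores                     # None means "nothing stated"
        self.slots_in_use = 0


def place_library(worker, library):
    """Point 1: allocate like a normal task; unlabelled takes the whole worker."""
    needed = worker.cores if library.cores is None else library.cores
    if needed > worker.cores_free:
        return False
    worker.cores_free -= needed
    return True


def start_function_call(library):
    """Points 2 and 3: each call takes one slot, up to the declared maximum."""
    if library.slots_in_use >= library.function_slots:
        return False
    library.slots_in_use += 1
    return True


def finish_function_call(library):
    """Release the slot held by a completed call."""
    library.slots_in_use -= 1
```

For example, a library placed with cores=4 and function_slots=2 on an 8-core worker leaves 4 cores free for other tasks and admits two concurrent calls; a third call has to wait until one finishes.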

dthain commented 1 year ago

A couple of reasons for doing it this way:

dthain commented 1 year ago

#3479 implements proper counting of function calls against the matching libraries.

Next step is to remove resource consumption by function calls, so all they do is line up with the libraries.

Is this as simple as just doing t.set_cores(0) etc for all resources on function call tasks?
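A sketch of what that could look like, under assumptions: the helper name is hypothetical, and set_memory, set_disk and set_gpus are assumed here as the companions to set_cores. Whether a value of 0 is treated as "consumes nothing" rather than "unspecified" is exactly the open question, so this shows only the shape of the change. A stub task is used so the example runs without a manager.

```python
def zero_function_call_resources(task):
    """Set every declared resource on a function call task to zero."""
    task.set_cores(0)
    task.set_memory(0)
    task.set_disk(0)
    task.set_gpus(0)
    return task


class StubTask:
    """Hypothetical stand-in that records the values passed to the setters."""
    def __init__(self):
        self.resources = {}

    def set_cores(self, n):
        self.resources["cores"] = n

    def set_memory(self, n):
        self.resources["memory"] = n

    def set_disk(self, n):
        self.resources["disk"] = n

    def set_gpus(self, n):
        self.resources["gpus"] = n


t = zero_function_call_resources(StubTask())
```

After this call the task declares zero for every resource, so the only thing left to match a function call against is a free slot in its library.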

dthain commented 1 year ago

#3486 sets function call resources to zero.