jcrist opened this issue 8 years ago
Some thoughts:

- We could keep the terms passed to workers= in restrictions all the time, without ever translating them down until we absolutely need to in decide_worker. This would make decide_worker slightly more complex, because it would have to take in some function that takes a list of workers= terms and produces a list of actual addresses (see the sketch after this list). This function exists and is currently in Scheduler.worker_list (which would be expanded). This would have the benefit of restricting the logic to only one place and leaving this decision until the very end. It would also avoid having to add new {key: {environment_names}} state, which is great, because state requires maintenance. There are a few places where worker_list could be used but isn't, for example in broadcast.
- Environment registration will need to be wired in on the scheduler side (see Scheduler.handlers) and in Scheduler.add_worker. It would be nice to make the worker function idempotent so that the scheduler doesn't need to track which environments the worker has seen already and can just send the whole batch down without caring. (Idempotence is a core virtue in this project.)
- Workers can get a handle to themselves via worker_client.py:get_worker, which should probably be moved to worker.py. I don't think we necessarily need this, though, unless we need to support setup/teardown, which I would caution against for a first go-around.
- For naming the check method, perhaps condition or satisfies, but there are probably better things out there as well.
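A minimal sketch of the first idea above: restrictions keep raw workers= terms, and only decide_worker resolves them to addresses. The shape is modeled on Scheduler.worker_list, but the names and signature here are illustrative, not the actual scheduler API:

```python
def worker_list(terms, aliases, environment_workers, all_workers):
    """Resolve raw ``workers=`` terms (addresses, aliases, or
    environment names) into concrete worker addresses."""
    out = set()
    for term in terms:
        if term in environment_workers:    # an environment name
            out |= environment_workers[term]
        elif term in aliases:              # a named alias for a worker
            out.add(aliases[term])
        else:                              # assume a literal address
            out.add(term)
    return out & all_workers


# Restrictions hold raw terms until decide_worker needs addresses
environment_workers = {'gpu': {'10.0.0.1:8786', '10.0.0.2:8786'}}
aliases = {'alice': '10.0.0.3:8786'}
all_workers = {'10.0.0.1:8786', '10.0.0.2:8786', '10.0.0.3:8786'}

candidates = worker_list(['gpu', 'alice'], aliases,
                         environment_workers, all_workers)
```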
Hrm, people like @ogrisel might have a use for setup(). He often deploys software with something like the following:
```python
def install():
    import os
    os.system('pip install x y z')

client.run(install)
```
This would be a good use case for setup: always installing certain libraries when the workers come online.
Since the condition or setup methods may take a while to execute on the workers, would you expect client.register('env', environment) to block, or return a Future?
There are two options:

1. Users don't need to wait on the action to finish. We just send the appropriate message up to the scheduler in a fire-and-forget manner. This is the case when we submit tasks like map or submit. We use the _send_to_scheduler method on the client.
2. Users do want to wait on the action to finish. This is the case for operations like gather. We make two methods in this case: _gather, which is a tornado coroutine (and so technically returns a tornado Future), and gather, which blocks on that coroutine finishing. A sketch of this pattern follows this list.
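A sketch of the second pattern, using the tornado idioms of the time. The _register/register names and the scheduler call are assumptions for illustration; only the coroutine-plus-blocking-wrapper shape is the point:

```python
from tornado import gen


class Client:
    @gen.coroutine
    def _register(self, name, environment):
        # Coroutine: send the message and wait for the scheduler's
        # acknowledgement before returning (hypothetical RPC name)
        response = yield self.scheduler.register_environment(
            name=name, environment=environment)
        raise gen.Return(response)

    def register(self, name, environment):
        # Blocking wrapper: run the coroutine on the event loop and
        # wait for it to finish, as gather does with _gather
        return self.sync(self._register, name, environment)
```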
Did we make any progress with this issue?
No, I don't think so.
This is a concrete proposal for solving #85. For preliminary discussion, see that issue.
What is an environment
An environment is defined as an object of type Environment. A minimal sketch of the current design, assuming only the isinstance and setup hooks described in the rest of this proposal:
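```python
class Environment(object):
    """Base class for defining a worker environment."""

    def isinstance(self, worker):
        """Return True if ``worker`` belongs to this environment."""
        raise NotImplementedError

    def setup(self, worker):
        """Optional hook: initialize any state this environment
        needs on a matching worker."""
        pass
```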
The reason for making it a class is to group the methods together, provide a consistent place to store state, and keep the user-facing signature consistent (even if more optional methods are added to the class). Using the examples from the original issue, example Environment classes might look like the following (the GPU and database details are illustrative placeholders, not the original code):
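```python
import os
import sqlite3


class GPUEnvironment(Environment):
    """Matches workers that have a GPU available."""

    def isinstance(self, worker):
        # Illustrative check only; a real version might inspect
        # worker resources or try importing a GPU library
        return os.path.exists('/dev/nvidia0')


class DatabaseEnvironment(Environment):
    """Opens a database connection for tasks to reuse."""

    def __init__(self, path):
        self.path = path
        self.conn = None

    def isinstance(self, worker):
        return True   # any worker can open this database

    def setup(self, worker):
        # State lives on the instance; tasks can retrieve it later
        # through get_environment (see the end of the proposal)
        self.conn = sqlite3.connect(self.path)
```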
How environments are registered

Environments are registered by passing instances of the object to Executor.register_environment, along with a name. The decision to use instances instead of classes was made to allow environments to be parametrized (if needed) and to contain internal state. An example of registering environments (a sketch, assuming the signature implied above of a name plus an instance):
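```python
from distributed import Executor

e = Executor('127.0.0.1:8786')

e.register_environment('gpu', GPUEnvironment())
e.register_environment('databases', DatabaseEnvironment('/data/app.db'))
```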
Changes to the scheduler/worker state and transitions

When an environment is registered

- The scheduler stores the environment in a mapping of {name: environment_object} called environments.
- The environment is sent to each worker, where its isinstance method is run to check whether the worker is an instance of the environment. If so, the worker stores it in a mapping of {name: environment_object} called environments. The (optional) setup method on the environment is then run, which can initialize any additional needed state. A sketch of this worker-side step follows the list.
- The scheduler updates a mapping of {environment_name: {worker_ids}} to reflect which workers matched. Call this environment_workers.
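A sketch of that worker-side step, with a hypothetical handler name:

```python
def handle_register_environment(worker, name, environment):
    """Run on a worker when the scheduler broadcasts an environment.
    Returns whether this worker matched, so the scheduler can update
    environment_workers."""
    if environment.isinstance(worker):
        worker.environments[name] = environment
        environment.setup(worker)   # optional extra initialization
        return True
    return False
```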
When a worker is added

All registered environments are run against the new worker, updating environments on the worker and environment_workers on the scheduler, just as at registration time.

When a worker is removed

The worker is removed from environment_workers as needed.

When a task is submitted
The workers keyword to submit will be modified to take in environment names as well. These will be sent along, in addition to restrictions and loose_restrictions, to the scheduler's update_graph method as an environments keyword. update_graph will update a mapping of {key: {environment_names}} called environment_restrictions on the state. The reason for not converting this immediately into restrictions is that the workers in the environment may change between task submission time and task run time (new workers may register, for example).
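For instance, using the environment names registered earlier (train_model and data are hypothetical):

```python
# 'gpu' is resolved to concrete workers only when the task is
# actually scheduled, so newly registered GPU workers still count
future = e.submit(train_model, data, workers=['gpu'])
```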
When selecting tasks to run

ensure_occupied will be modified to also take environment_restrictions into account when picking tasks to run and where to run them.
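A sketch of the late resolution this implies inside the scheduler (illustrative names, not the real ensure_occupied internals):

```python
def valid_workers(key, environment_restrictions, environment_workers,
                  workers):
    """Workers allowed to run ``key``, resolved at scheduling time."""
    if key not in environment_restrictions:
        return set(workers)
    allowed = set()
    for name in environment_restrictions[key]:
        allowed |= environment_workers.get(name, set())
    return allowed & set(workers)
```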
Summary of new state

New worker state

- environments: a mapping of {environment_name: environment_object}

New scheduler state

- environments: a mapping of {environment_name: environment_object}
- environment_workers: a mapping of {environment_name: {workers}}
- environment_restrictions: a mapping of {key: {environment_names}}
Accessing environments from inside a task
Tasks can call the distributed.get_environment('name') function to get the environment object from inside the worker; this lets tasks access whatever state the environment may contain internally. The implementation might look like the first sketch below, and the second shows a user pulling data from a stored database connection:
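A sketch of get_environment, assuming a get_worker helper that returns the worker executing the current task (the comment thread above suggests moving such a helper into worker.py):

```python
def get_environment(name):
    """Call from inside a task to fetch a registered environment."""
    from distributed.worker_client import get_worker  # may move to worker.py
    worker = get_worker()              # the worker running this task
    return worker.environments[name]
```

And a usage example, reusing the hypothetical DatabaseEnvironment from earlier:

```python
def query(q):
    # Reuse the connection that DatabaseEnvironment.setup stored
    env = get_environment('databases')
    return env.conn.execute(q).fetchall()

future = e.submit(query, 'SELECT * FROM mytable', workers=['databases'])
```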