dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Add helpers to make scheduler state JSON serializable #4126

Open TomAugspurger opened 4 years ago

TomAugspurger commented 4 years ago

In a recent debugging session, I was trying to inspect a remote scheduler through a bunch of client.run_on_scheduler(lambda dask_scheduler: dask_scheduler.<attr>) commands. Some of these (like .tasks) were harder to inspect, since they contain references to non-serializable objects.
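Until such helpers exist, one way to inspect an attribute like .tasks is to build plain builtins on the scheduler itself, so that only strings, lists and dicts are pickled back to the client. This is a minimal sketch under assumptions: summarize_tasks is a hypothetical name, it relies only on the TaskState attributes state and who_has and on WorkerState.address, and the SimpleNamespace objects are stand-ins so it runs without a cluster.

```python
from types import SimpleNamespace


def summarize_tasks(dask_scheduler):
    """Hypothetical helper: a plain-builtin view of dask_scheduler.tasks.

    Only str/list/dict values are returned, so nothing that references
    TaskState or WorkerState has to be unpickled on the client.
    """
    summary = {}
    for key, ts in dask_scheduler.tasks.items():
        summary[str(key)] = {  # dask keys may be tuples; stringify them
            "state": ts.state,
            # who_has holds WorkerState objects; keep just their addresses.
            # `or ()` is defensive, in case who_has is None for a task.
            "who_has": sorted(ws.address for ws in (ts.who_has or ())),
        }
    return summary


# On a live cluster (client is a distributed.Client):
#     tasks = client.run_on_scheduler(summarize_tasks)
# Offline check with stand-in objects instead of a real scheduler:
worker = SimpleNamespace(address="tcp://10.0.0.1:8786")
fake_scheduler = SimpleNamespace(
    tasks={
        "x": SimpleNamespace(state="memory", who_has=[worker]),
        ("y", 0): SimpleNamespace(state="waiting", who_has=[]),
    }
)
summary = summarize_tasks(fake_scheduler)
print(summary)
```

Returning a summary like this sidesteps unpickling scheduler objects on the client, at the cost of choosing up front which fields to keep; a generic helper would avoid that choice.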

Something like WorkerState.identity() is, I think, what I have in mind. Every object in the scheduler state would get a similar method, and it would apply recursively to the values in its result.

>>> tasks = client.run_on_scheduler(lambda dask_scheduler: dask_scheduler.tasks)
distributed.protocol.core - CRITICAL - Failed to deserialize
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/core.py", line 151, in loads
    value = _deserialize(head, fs, deserializers=deserializers)
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 335, in deserialize
    return loads(header, frames)
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/serialize.py", line 71, in pickle_loads
    return pickle.loads(x, buffers=buffers)
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/protocol/pickle.py", line 75, in loads
    return pickle.loads(x)
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/scheduler.py", line 301, in __hash__
    return hash(self.address)
AttributeError: address
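A rough sketch of the recursive helper proposed above, written as a standalone function rather than as methods on the scheduler classes. Assumptions: to_json_safe, FakeWorker and FakeTask are hypothetical names, and only the identity() method name is borrowed from WorkerState.identity(). Because scheduler state is cyclic (a task's dependencies are tasks whose dependents point back to it), the helper tracks the objects on the current recursion path and falls back to repr() when it meets one again.

```python
import json
from typing import Any


def to_json_safe(obj: Any, _active: frozenset = frozenset()) -> Any:
    """Hypothetical helper: recursively convert obj into JSON-serializable builtins.

    Any object with an identity() method is replaced by the dict that method
    returns, and the conversion recurses into it. _active holds the ids of the
    containers and objects on the current path, so a back-reference becomes
    repr(obj) instead of recursing forever.
    """
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if id(obj) in _active:
        return repr(obj)  # cycle: this object is already being converted
    active = _active | {id(obj)}
    if isinstance(obj, dict):
        # JSON object keys must be strings (dask keys may be tuples)
        return {str(k): to_json_safe(v, active) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set, frozenset)):
        return [to_json_safe(v, active) for v in obj]
    identity = getattr(obj, "identity", None)
    if callable(identity):
        return to_json_safe(identity(), active)
    return repr(obj)  # last resort for anything else (bytes, datetimes, ...)


# Stand-ins for scheduler objects; the real TaskState/WorkerState are richer.
class FakeWorker:
    def __init__(self, address):
        self.address = address
        self.tasks = []

    def identity(self):
        return {"address": self.address, "tasks": self.tasks}


class FakeTask:
    def __init__(self, key, worker):
        self.key = key
        self.who_has = {worker}
        worker.tasks.append(self)  # creates a task -> worker -> task cycle

    def identity(self):
        return {"key": self.key, "who_has": self.who_has}

    def __repr__(self):
        return f"<FakeTask {self.key!r}>"


worker = FakeWorker("tcp://127.0.0.1:1234")
tasks = {"x": FakeTask("x", worker)}
result = to_json_safe(tasks)
print(json.dumps(result))
```

Breaking cycles with repr() keeps the output finite but lossy; emitting a short reference such as a task key or worker address instead would be an alternative design.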

mrocklin commented 4 years ago

Personally I don't have any objection to this, but I also don't often find a need for it. I'm ambivalent.

(Reply sent by email to Tom Augspurger's post of Thu, Sep 24, 2020, 2:17 PM.)

fjetter commented 3 years ago

xref https://github.com/dask/distributed/issues/5068