Description
The dask-scheduler uses dask.sizeof to manage memory for scaling and work distribution. Tasks that return a large payload can stall scheduling or cause workers to exit. Making the scheduler aware of payload sizes would let it handle these cases better.
Expected Behavior
State.__sizeof__() should be defined to return the number of bytes stored in the object.
Reproduction
Define a task that returns a large numpy array. The dask-scheduler reports the managed memory for its result as 48 B.
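import numpy as np
from prefect import task

@task
def memory_task():
    # 2 million integers: a payload far larger than the 48 B the scheduler reports
    x = np.random.randint(9, size=2*10**6)
    return x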
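The under-reporting can be illustrated without a scheduler. The sketch below is an assumption-laden stand-in: `PlainState` is a hypothetical class, not the real State, and a `bytearray` replaces the numpy array so the snippet needs only the standard library. It relies on the fact that `sys.getsizeof`, which dask.sizeof falls back to for types it has no handler for, measures only the wrapper object and not what its attributes reference.

```python
import sys


class PlainState:
    """Hypothetical stand-in for a State object with no __sizeof__ defined."""

    def __init__(self, result=None):
        self.result = result


# 16 MB stand-in for a large task result
payload = bytearray(16 * 10**6)
state = PlainState(result=payload)

# Only the shallow size of the wrapper is counted; the payload is ignored.
small = sys.getsizeof(state)
print(f"payload: {len(payload)} B, reported for state: {small} B")
```

The exact number of bytes reported for the wrapper depends on the Python version, but it stays tiny no matter how large the payload is.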
Proposed resolution: State objects currently do not define __sizeof__, which causes the scheduler to think the payload is 48 B. Ideally, __sizeof__ should be implemented so that it captures the size of the result stored in State.result.
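One possible shape for the proposed resolution is sketched below. It is not the actual State implementation: `SizedState` is a hypothetical class used only for illustration, and the choice to add the payload size to the object's own shallow size is an assumption. It uses `dask.sizeof.sizeof` when dask is installed and otherwise falls back to `sys.getsizeof`, which is also what dask uses by default for unregistered types.

```python
import sys

try:
    from dask.sizeof import sizeof  # dask's size dispatcher, if installed
except ImportError:
    sizeof = sys.getsizeof  # same fallback dask applies to unregistered types


class SizedState:
    """Hypothetical stand-in for State, with __sizeof__ defined."""

    def __init__(self, result=None):
        self.result = result

    def __sizeof__(self):
        # Shallow size of this object plus the size of the stored result.
        return object.__sizeof__(self) + sizeof(self.result)


# 16 MB stand-in for a large task result
payload = bytearray(16 * 10**6)
state = SizedState(result=payload)

# sys.getsizeof calls __sizeof__, so the payload is now included.
reported = sys.getsizeof(state)
print(f"reported for state: {reported} B")
```

Because the default dask.sizeof handler goes through `sys.getsizeof`, an object that defines `__sizeof__` this way would be expected to show up with a realistic managed-memory figure rather than a fixed few dozen bytes.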