darindf opened this issue 6 years ago
First, let me apologize for the long delay here. This is an excellent issue. Thank you for taking the time to write it up well and diagnose the problem.
Yes, I believe that you're right. I think that ideally we would centralize the logic that handles the different cases for `workers=` in the scheduler and use it consistently everywhere it is needed. Currently the best implementation is probably in `Scheduler.update_graph`, which manipulates the `restrictions` parameter, and in `Scheduler.valid_workers`, which interprets those restrictions against the set of current workers.
Refactoring this so that it was usable both in its present form for submitting tasks, and also for scatter, would be quite welcome. I'll also try to address this at some point, but may not get around to it personally for some time.
Perhaps I could look at it. Would the logic that `submit` uses to handle workers be the best version to port?
The logic for this isn't actually in `Client.submit` or `Client.scatter`; instead it's in scheduler.py, in `Scheduler.update_graph` and `Scheduler.valid_workers`. This is because, in cases like `Client.submit`, new workers might arrive between the client asking for something and the scheduler deciding where it should happen. `Scheduler.update_graph` handles worker restrictions when `Client.submit` is called:
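A simplified, standalone sketch of what that restriction handling amounts to (illustrative only, not the actual code in scheduler.py; the `host_restrictions` / `worker_restrictions` attributes are the ones mentioned further down):

```python
# Illustrative sketch only -- not the real scheduler code.
class TaskState:
    def __init__(self, key):
        self.key = key
        self.host_restrictions = set()    # bare hostnames / IPs
        self.worker_restrictions = set()  # full worker addresses


def record_restrictions(ts, workers):
    """Split a workers= value into host vs. worker restrictions."""
    if isinstance(workers, str):
        workers = [workers]
    for w in workers:
        # Heuristic: an explicit port means a full worker address,
        # otherwise treat the entry as a host name / IP.
        if ":" in w.rsplit("://", 1)[-1]:
            ts.worker_restrictions.add(w)
        else:
            ts.host_restrictions.add(w)


ts = TaskState("inc-1")
record_restrictions(ts, ["tcp://127.0.0.1:3454", "127.0.0.2"])
print(ts.worker_restrictions)  # {'tcp://127.0.0.1:3454'}
print(ts.host_restrictions)    # {'127.0.0.2'}
```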
Then, when it comes time to assess which workers fit the constraints, it calls `Scheduler.valid_workers`:
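Conceptually that check does something along these lines (again a toy version, assuming simple lookup tables from address and from host to the current workers):

```python
def valid_workers(worker_restrictions, host_restrictions,
                  workers_by_address, workers_by_host):
    """Return the worker addresses that satisfy the given restrictions."""
    allowed = set(workers_by_address)
    if worker_restrictions:
        allowed &= set(worker_restrictions)
    if host_restrictions:
        allowed &= {
            addr
            for host in host_restrictions
            for addr in workers_by_host.get(host, ())
        }
    return allowed


workers_by_address = {"tcp://127.0.0.1:3454", "tcp://127.0.0.1:3455"}
workers_by_host = {"127.0.0.1": ["tcp://127.0.0.1:3454", "tcp://127.0.0.1:3455"]}
print(valid_workers(set(), {"127.0.0.1"}, workers_by_address, workers_by_host))
# -> both workers on that host
```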
Scatter on the other hand is much simpler, as you've discovered:
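Roughly speaking, scatter ends up matching the given values directly against the known worker addresses, which is why a bare hostname never matches (a sketch of the effect, not the actual implementation):

```python
def pick_scatter_targets(workers, known_addresses):
    # Exact string matching against full addresses -- no hostname
    # resolution or restriction handling as in update_graph.
    if workers is None:
        return set(known_addresses)
    return set(workers) & set(known_addresses)


print(pick_scatter_targets(["127.0.0.1"], {"tcp://127.0.0.1:3454"}))
# -> set(), i.e. no valid workers to scatter to
```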
So ideally we find a way to capture most of the logic in the first two sections of code, but without tying it to the scheduler state, so that scatter can just take the `workers=` keyword it was given (which is more or less equivalent to the `restrictions=` keyword passed to `update_graph`) and get the result immediately, without touching scheduler state like `ts.host_restrictions`, etc.
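One hypothetical shape for such a helper, a pure function that both code paths could call (the names here are made up for illustration):

```python
def resolve_workers(workers, workers_by_address, workers_by_host, worker_names):
    """Turn a workers= value into concrete addresses, using no scheduler state."""
    if workers is None:
        return set(workers_by_address)
    if isinstance(workers, str):
        workers = [workers]
    resolved = set()
    for w in workers:
        if w in workers_by_address:        # full address, e.g. tcp://host:port
            resolved.add(w)
        elif w in workers_by_host:         # bare host / IP -> all its workers
            resolved.update(workers_by_host[w])
        elif w in worker_names:            # --name alias
            resolved.add(worker_names[w])
    return resolved
```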
The challenge here is refactoring this code to get the desired functionality without damaging other use cases.
It appears that the `workers` parameter is not consistent between the client's `submit` and `scatter` methods. The `submit` method can take IP addresses, hostnames (and other formats, like a tcp:// prefix?), while the `scatter` method requires the worker's port to be included. The documentation states this needs to be a (host, port) pair, but I found that the tcp://127.0.0.1:3454 syntax works as well.
Ideally the format for specifying workers would be the same across the client API.
Here is an example of the failure:
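A minimal reproduction along these lines (the scheduler address, worker address, and values below are placeholders, not the original snippet):

```python
from distributed import Client

client = Client("tcp://127.0.0.1:8786")   # placeholder scheduler address

# Restricting by bare hostname/IP works for submit ...
future = client.submit(lambda x: x + 1, 10, workers=["127.0.0.1"])

# ... but the equivalent scatter call fails unless the full worker
# address (or a (host, port) pair) is given instead.
data = client.scatter([1, 2, 3], workers=["127.0.0.1"])
```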
The `client.scatter` line fails with an error.
I have tested this by starting `dask-worker` with `--name mel` and changing to `workers = ['mel']`, and this works, which contradicts the scatter documentation.
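For illustration, that working combination looks roughly like this (addresses are placeholders):

```python
# Worker started with an explicit name, e.g.:
#     dask-worker tcp://127.0.0.1:8786 --name mel
from distributed import Client

client = Client("tcp://127.0.0.1:8786")

# Scattering by the worker's name succeeds, even though the docstring
# only mentions (host, port) tuples.
futures = client.scatter([1, 2, 3], workers=["mel"])
```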