samaust opened this issue 4 years ago
That's correct. Calling dask.config.set only affects the local process, not subprocesses. That might be a reasonable change for us to consider. In the meantime I recommend using a yaml file in your .config/dask/ directory if that is accessible to you.
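For reference, a minimal sketch of such a file (e.g. ~/.config/dask/distributed.yaml; the file name and the memory fractions are only illustrative):

distributed:
  worker:
    memory:
      target: 0.10
      spill: 0.20
      pause: 0.30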
> That's correct. Calling dask.config.set only affects the local process, and not subprocesses.
In that case, it's not a bug; it's the intended behaviour by design. I think this is not obvious to new users, and documenting it would help. I suggest adding that information to the dask documentation in the section I linked to previously, or explaining it in the distributed documentation and linking to it from the dask documentation. I'm not sure what the exact wording should be.
@samaust you should be able to control this with dask.config.set if it is performed before creating the cluster.
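For example, a minimal sketch of that ordering (the memory fraction is only an illustrative value):

import dask
from dask.distributed import Client, LocalCluster

# Set the configuration *before* the cluster is created, so that the
# worker subprocesses are started with the new values.
dask.config.set({"distributed.worker.memory.target": 0.1})

cluster = LocalCluster(n_workers=2)
client = Client(cluster)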
@samaust @mrocklin
FYI this caught me recently as well. I spent this morning trying to figure out why my dask-jobqueue LSFCluster workers were being killed with:
OSError: Timed out during handshake while connecting to tcp://10.36.110.11:38453 after 10 s
After setting:
dask.config.set(
    {'distributed.comm.timeouts.connect': '60s',
     'distributed.comm.timeouts.tcp': '120'}
)
from the scheduler process.
I thought the config.set function was broken. IMHO this should be mentioned in the documentation, or better yet, subprocesses should inherit configuration changes from their parents.
I believe this has been closed with https://github.com/dask/distributed/pull/4378, assuming you're using a Nanny / the default config. Note that you should set the configuration before you start the processes / local cluster. Propagating config changes once the cluster is up is something we do not support at the moment.
import dask
from dask.distributed import Client, LocalCluster

new = {"distributed.worker.memory.target": 0.1,
       "distributed.worker.memory.spill": 0.2,
       "distributed.worker.memory.pause": 0.3}

def get_config(dask_worker):
    return {
        "distributed.worker.memory.target": dask_worker.memory_target_fraction,
        "distributed.worker.memory.spill": dask_worker.memory_spill_fraction,
        "distributed.worker.memory.pause": dask_worker.memory_pause_fraction,
    }

with dask.config.set(new):
    cluster = LocalCluster()
    client = Client(cluster)

results = client.run(get_config)
print(results)
for _, worker_config in results.items():
    assert worker_config == new
> After setting ... from the scheduler process
As Matt pointed out in https://github.com/dask/distributed/issues/3882#issuecomment-642338414, for clusters which span multiple machines we recommend using a yaml configuration file in your .config/dask/ directory if that is accessible to you.
I'm not sure if dask-jobqueue supports a built-in way to forward configuration files to nodes in the cluster. Perhaps @andersy005 knows what the recommended best practices are for dask-jobqueue specifically.
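As a general fallback (not specific to dask-jobqueue), dask also reads configuration from environment variables with a DASK_ prefix, so exporting such a variable in whatever script launches each worker is another way to propagate a setting. A small sketch, assuming the variable is set before dask is imported in the worker process:

import os

# Nested keys are spelled with double underscores:
# DASK_DISTRIBUTED__COMM__TIMEOUTS__CONNECT -> distributed.comm.timeouts.connect
os.environ["DASK_DISTRIBUTED__COMM__TIMEOUTS__CONNECT"] = "60s"

import dask  # dask collects DASK_* environment variables at import time

print(dask.config.get("distributed.comm.timeouts.connect"))  # -> 60s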
Late to the party, but this works for me:
from dask.distributed import Client
import dask.distributed

# print(dask.config.config)
dask.config.set({'distributed.deploy.lost-worker-timeout': '10ms'})
print(dask.config.get('distributed.deploy.lost-worker-timeout'))
client = Client('<local_ip>:<port>')
Output:
10ms
The configuration directly within Python is explained in the documentation here: Configuration - Directly within Python
When using dask.config.set, I expect the worker to use those values. Instead, the worker reads the default values and does not use the values set using dask.config.set. I modified distributed\worker.py as below to print the values received by the worker.
Outputs
Notice the 0.7 value, which is the default.
Passing the configuration by kwargs works; a sketch of that approach follows the output below.
Outputs
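For reference, a sketch of the kwargs approach, assuming the memory_*_fraction keyword arguments that mirror the worker attributes above (the exact call from the original report is not shown here):

from dask.distributed import Client, LocalCluster

# Pass the memory fractions directly as worker keyword arguments instead
# of relying on dask.config.set in the parent process.
cluster = LocalCluster(memory_target_fraction=0.1,
                       memory_spill_fraction=0.2,
                       memory_pause_fraction=0.3)
client = Client(cluster)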
Environment: