hendrikmakait opened this issue 11 months ago
One possible solution for the `SubprocessCluster` would be to write the file to a temporary file (e.g., `/tmp/dask/<pid>`) and read it from there.
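Something like this rough sketch (the helper names and the JSON payload are just illustrative, not existing distributed APIs):

```python
import json
import os
import tempfile


def scheduler_info_path(pid: int) -> str:
    # Hypothetical helper: a per-process path such as /tmp/dask/<pid>.
    directory = os.path.join(tempfile.gettempdir(), "dask")
    os.makedirs(directory, exist_ok=True)
    return os.path.join(directory, str(pid))


def write_scheduler_info(pid: int, address: str) -> None:
    # The subprocess running the scheduler would dump its contact information
    # here instead of relying on a log line captured from stderr.
    with open(scheduler_info_path(pid), "w") as f:
        json.dump({"address": address}, f)


def read_scheduler_info(pid: int) -> str:
    # The SubprocessCluster would read (or poll for) this file to discover
    # the scheduler address, independent of the configured log level.
    with open(scheduler_info_path(pid)) as f:
        return json.load(f)["address"]
```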
TIL: There's the `scheduler_file` mechanism, which we should be able to leverage for this.
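For reference, this is roughly how `scheduler_file` is consumed today (the path here is just an example):

```python
from distributed import Client

# The scheduler dumps its contact information as JSON to the given path
# (e.g. when started via `dask scheduler --scheduler-file /tmp/dask/scheduler.json`),
# and clients/workers read the address from that file instead of parsing logs.
client = Client(scheduler_file="/tmp/dask/scheduler.json")
```

That would sidestep the log level entirely.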
Curious if there is any known workaround for `SSHCluster`?
I am trying to deploy an `SSHCluster` from a JupyterLab environment, and the output of the cell gets very verbose with many workers.
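For context, the deployment looks roughly like this (hostnames and options are placeholders based on the docs example):

```python
from dask.distributed import Client, SSHCluster

# The first host runs the scheduler, the remaining hosts run workers. With many
# workers, each remote process streams its stderr back into the notebook cell,
# which is where the verbosity comes from.
cluster = SSHCluster(
    ["scheduler-host", "worker-1", "worker-2", "worker-3"],
    connect_options={"known_hosts": None},
    worker_options={"nthreads": 2},
    scheduler_options={"port": 0, "dashboard_address": ":8797"},
)
client = Client(cluster)
```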
@jomey, if you are up for the challenge, I suppose you could use #8398 as a blueprint for the changes necessary to the `SSHCluster`. IIRC, the code of the `SSHCluster` looks very similar to what I fixed in that PR. (I'm currently out on PTO, so I won't be able to have a closer look at this for a few weeks.)
Taking the first step and trying to set up my local machine (Ubuntu 22.04 LTS). I have installed and configured a local SSH server that accepts key-less login. Then I set up the environment according to the `test.yaml`.
With that, I cannot get the current SSH tests to pass. All failures come back with the same message:

`RuntimeError: Cluster failed to start: Worker failed to start`

Any insights on how to set up a dev environment for this? (Permission issues?)
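For what it's worth, here's the standalone check I'm using to confirm that key-less SSH to localhost works at all (asyncssh is the library the current `SSHCluster` uses; the script itself is just my own sanity check, not part of the test suite):

```python
import asyncio

import asyncssh


async def check_keyless_ssh() -> None:
    # Connect to the local SSH server without a password prompt; known_hosts=None
    # mirrors what the SSH tests pass through connect_options.
    async with asyncssh.connect("localhost", known_hosts=None) as conn:
        result = await conn.run("echo ok", check=True)
        print(result.stdout.strip())  # should print "ok"


asyncio.run(check_keyless_ssh())
```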
Maybe @jacobtomlinson or @jrbourbeau are able to help?
Found the reason for a few test failures: I had an older (forgotten) `dask.yaml` under my user in `.config/dask/`, which also set the log levels to `critical` (exactly this issue, which I ran into on a different machine). After renaming that entire config folder, the only remaining failures are the tests with `old_` in their names (3 in total). Not sure if that is of concern.
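In case anyone else hits this, a quick way to check whether a user-level config is overriding the log levels (the YAML in the comments is roughly what my stray `dask.yaml` contained):

```python
import dask.config

# My stray ~/.config/dask/dask.yaml contained something along the lines of
#
#   logging:
#     distributed: critical
#
# which is enough to suppress the INFO lines the clusters wait for.
print(dask.config.PATH)                        # user config directory dask reads from
print(dask.config.get("logging", default={}))  # effective logging settings, if any
```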
This makes me wonder: should a new issue be logged to make the tests more resilient against local user configs that are not part of the repository?
Both clusters rely on the scheduler address being logged to `stderr`. The address is logged at `INFO`, so setting the log level higher than that will cause the clusters to hang indefinitely.

Related: #8392
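A minimal sketch of the pattern both deployments follow (not the exact code in distributed, just the shape of the problem):

```python
import asyncio


async def wait_for_scheduler_address(stderr: asyncio.StreamReader) -> str:
    # Read the remote/subprocess stderr line by line until the
    # "Scheduler at: tcp://..." INFO line shows up. If the log level is set
    # above INFO, that line never arrives and this coroutine never returns,
    # which is the indefinite hang described above.
    while True:
        raw = await stderr.readline()
        if not raw:
            raise RuntimeError("Cluster failed to start: scheduler process exited")
        line = raw.decode()
        if "Scheduler at:" in line:
            return line.split("Scheduler at:")[-1].strip()
```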