dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

dask.distributed Client freezes after recent software upgrade. #1867

Closed. jakirkham closed this issue 6 years ago.

jakirkham commented 6 years ago

@ebo commented on Mon Mar 12 2018

After a recent forced upgrade, everything that uses dask.distributed now freezes and refuses to work. I am able to replicate this with the following code from pangeo/notebooks/newmann_ensemble_meteorology.ipynb:


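    from dask.distributed import Client

    # Connect to an already-running scheduler described by a scheduler file
    client = Client(scheduler_file='/glade/scratch/jhamman/scheduler.json')
    client  # in a notebook, this displays a summary of the cluster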
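One thing worth ruling out when `Client(scheduler_file=...)` hangs is that the file points at a scheduler that is no longer running (here the file lives in another user's scratch directory). Below is a minimal stdlib-only diagnostic sketch. It assumes the scheduler file is the JSON written by `dask-scheduler --scheduler-file` and that it contains an `"address"` entry such as `tcp://host:8786`; the helper names `scheduler_address` and `scheduler_reachable` are hypothetical, not part of distributed.

```python
import json
import socket
from urllib.parse import urlparse


def scheduler_address(scheduler_file: str) -> str:
    """Read the scheduler address from a Dask scheduler file (JSON)."""
    with open(scheduler_file) as f:
        info = json.load(f)
    # Assumption: the file has an "address" entry like "tcp://10.0.0.5:8786".
    return info["address"]


def scheduler_reachable(address: str, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to the scheduler can be opened."""
    parsed = urlparse(address)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError(f"cannot parse host/port from {address!r}")
    try:
        # create_connection gives up after `timeout` seconds instead of waiting indefinitely.
        with socket.create_connection((parsed.hostname, parsed.port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

For example, `scheduler_reachable(scheduler_address('/glade/scratch/jhamman/scheduler.json'))` returning `False` would suggest the Client is waiting on a scheduler that isn't there. Passing `timeout=` to `Client` should similarly make it raise rather than sit quietly.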

I notice that I get messages like:

Adapting to protocol v5.1 for kernel 48f87baf-4f25-418a-9eb1-cd7ed4d7fc91

Following suggestions found online, I have tried installing ipykernel, jupyter_client, and ipyparallel, and changing the notebook port from 8888 to 8889 and to 9012. I have also tried re-initializing jupyter_notebook_config.py, rebooting the machine, and other things I cannot remember. Has there been a recent change (since 4.4, or maybe I had 5.0 installed; I am still trying to figure that out)? Anyway, this morning everything worked fine until I installed pangeo and xarray. Now dask.distributed is broken.

Any suggestions?

I should also note that after everything went sideways, I moved my old install aside and did a full clean install from the most recent Anaconda release. So I am currently working from a clean install of the pangeo environment.yml.

jakirkham commented 6 years ago

Copied the issue over here, as it sounds like a Distributed issue rather than a Dask issue. Hope that is OK.

mrocklin commented 6 years ago

The protocol message is from Jupyter. I don't know what's happening. It could be any one of a hundred things about a software environment.

So, no help from me without more information.

jakirkham commented 6 years ago

Tornado 5.0 made some rather big changes under the hood. In particular, on Python 3 its IOLoop is now a wrapper around the asyncio event loop, so a lot of libraries have been adjusting to this. I have personally seen effects of this in my own code, even where asyncio is not used. So this is a possibility to be aware of.
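One practical consequence: with Tornado 5 on Python 3, a Jupyter kernel's own loop is an asyncio loop that is typically already running while a cell executes, so code that tries to start a loop of its own can collide with it. A minimal stdlib sketch (Python 3.7+) of detecting that situation follows; `event_loop_running` and `run_blocking` are hypothetical helpers, not Tornado or distributed APIs.

```python
import asyncio


def event_loop_running() -> bool:
    """Return True if an asyncio event loop is running in the current thread."""
    try:
        asyncio.get_running_loop()  # raises RuntimeError when no loop is running
    except RuntimeError:
        return False
    return True


def run_blocking(coro):
    """Run a coroutine to completion, refusing if a loop is already running."""
    if event_loop_running():
        coro.close()  # avoid a "coroutine was never awaited" warning
        # asyncio.run() would fail here too, since a loop is already running.
        raise RuntimeError("an event loop is already running in this thread; "
                           "await the coroutine instead")
    return asyncio.run(coro)
```

In a plain script `event_loop_running()` is `False`; inside a coroutine (or, we would expect, a notebook cell on a Tornado 5 based kernel) it is `True`.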

AFAIK, based on personal experience, recent versions of distributed work reasonably well with Tornado 5. The same goes for ipykernel and jupyter_client. However, given when this was posted, you may have been affected by this jupyter_client issue ( https://github.com/jupyter/jupyter_client/pull/352 ), so simply upgrading may help. There seems to have been some effort to handle Tornado 5 in ipyparallel, but I haven't used it recently, so I don't know whether it would have issues.

Something else to keep in mind as you explore this further: conda keeps a revision history of each environment, which is helpful for seeing what changed when and for reverting to an earlier configuration. This should help you find a working install.
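The revision history mentioned above is exposed through `conda list --revisions`, and `conda install --revision N` rolls an environment back to revision N. A small sketch that builds those commands and runs the read-only listing only if `conda` is on `PATH`; the function names are hypothetical wrappers, not part of conda.

```python
import shutil
import subprocess
from typing import List, Optional


def list_revisions_cmd() -> List[str]:
    """Command that prints the numbered revision history of the active environment."""
    return ["conda", "list", "--revisions"]


def revert_cmd(revision: int) -> List[str]:
    """Command that rolls the active environment back to an earlier revision."""
    if revision < 0:
        raise ValueError("conda revision numbers start at 0")
    return ["conda", "install", "--revision", str(revision)]


def run_if_available(cmd: List[str]) -> Optional[str]:
    """Run a command and return its stdout, or None if the executable is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

`print(run_if_available(list_revisions_cmd()))` shows which packages changed in each revision (e.g. a tornado 4.x to 5.0 bump). The revert command is better run by hand in a terminal, since conda asks for confirmation before changing the environment.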

Also, including information about the minimal set of packages required to reproduce the issue, along with an example, would be extremely informative for others here.
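A minimal sketch (Python 3.8+, stdlib `importlib.metadata`) for collecting that kind of version information as plain text to paste into an issue. The package list is an assumption drawn from the packages discussed in this thread, and names are looked up as installed distribution names.

```python
import platform
import sys
from importlib import metadata
from typing import Dict, Iterable, Optional

# Assumed default list: packages mentioned in this thread.
DEFAULT_PACKAGES = ("dask", "distributed", "tornado", "ipykernel",
                    "jupyter_client", "notebook", "xarray")


def package_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def environment_report(names: Iterable[str] = DEFAULT_PACKAGES) -> str:
    """Format Python, OS, and package versions as plain text."""
    lines = [f"python: {sys.version.split()[0]}",
             f"platform: {platform.platform()}"]
    for name, version in package_versions(names).items():
        lines.append(f"{name}: {version or 'not installed'}")
    return "\n".join(lines)
```

`print(environment_report())` produces one `name: version` line per package. Once a Client can connect, distributed's `Client.get_versions` can also compare versions across the client, scheduler, and workers.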

ebo commented 6 years ago

I think that this may have been a red herring. Once I updated everything, rebooted, and hacked the example to remove the scheduler file, it seemed to work. I think it was a combination of live-updating my Linux box without a reboot and the particular example. Once I get a couple of deliverables out the door, I will work on a modified example that works out of the box and try to replicate the problem again (with detailed instructions).