Initial suspicion is that the versions of distributed are incompatible; I would remove the temporary dask environment (normally in the knit source directory, and also in .knitDeps on HDFS) and also update dask/distributed in the environment from which you are launching knit.
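A minimal sketch of that cleanup, assuming default locations (the tmp_conda path and HDFS cache location here are assumptions; adjust to your install):

import shutil
import subprocess

# Remove knit's temporary conda environment (normally built under the
# knit source directory; exact path is an assumption).
shutil.rmtree('/path/to/knit/tmp_conda', ignore_errors=True)

# Remove the cached, zipped environment from HDFS (.knitDeps in your
# HDFS home directory).
subprocess.run(['hdfs', 'dfs', '-rm', '-r', '-f', '.knitDeps'])

# Then update dask/distributed in the environment you launch knit from.
subprocess.run(['conda', 'update', '-y', 'dask', 'distributed'])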
What is distributed.__version__ on the worker nodes?
(I'm also unsure of what's happening here)
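One way to check, assuming the workers stay up long enough to connect (the scheduler address below is a placeholder):

from distributed import Client

client = Client('scheduler-host:8786')  # placeholder address

def version():
    import distributed
    return distributed.__version__

# Runs on every worker; returns a dict of worker address -> version
print(client.run(version))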
I have version 1.19.1 installed on the edge node. This is the create command that I am re-running right now:

/nas/isg_prodops_work/autowork/anaconda3/bin/conda create -p /home/jlord/.conda/envs/dask/lib/python3.6/site-packages/knit-0.2.2-py3.6.egg/knit/tmp_conda/envs/dask-35d2a1ee201208ae9fca6905fa88ea9e54557b58 -y -q dask>=0.14 distributed>=1.16
I can open it up to see the exact version when it finishes.
Yeah, that should be fine
Looks like 1.18.1 in that conda env. Same error after rebuilding.
OK, thanks for checking
Hrm, can you verify that your client and scheduler are running the same version?
client.get_versions(check=True)
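For context, a minimal sketch of running that check, assuming a client connected to the running cluster:

from distributed import Client

client = Client(cluster)  # cluster: the DaskYARNCluster you started
# With check=True this raises on mismatched client/scheduler/worker
# versions instead of just returning the version dict.
client.get_versions(check=True)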
Note that you can pass channels to conda via the programmatic interface or, if you run from the command line as above, with -c conda-forge
Should I pass the channel to the DaskYARNCluster? I installed everything using conda-forge, but it is building the environment automatically from a different channel.
Yes, DaskYARNCluster(channels=['conda-forge']) - you should have the installs from the same channels as far as possible.
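A minimal sketch of the programmatic form (the import path is an assumption; adjust to however you import knit):

from knit.dask_yarn import DaskYARNCluster  # import path is an assumption

# Build the temporary environment from the same channel you installed
# from, so package versions match between the edge node and the workers.
cluster = DaskYARNCluster(channels=['conda-forge'])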
That did it! I just needed to use the same channel.
As another note: you can provide an absolute path to a conda environment, or give a conda environment name that already exists, which may be easier in such situations.
What is the argument name for that path?
DaskYARNCluster(env='/my/conda/path')
(where that directory contains /bin, /lib, etc).
That can be either a .zip or a directory; a directory will be zipped for you (/my/conda/path => /my/conda/path.zip).
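Putting that together, a minimal sketch (the import path and the environment path are assumptions):

from knit.dask_yarn import DaskYARNCluster  # import path is an assumption

# Directory form: knit zips it for you (/my/conda/path => /my/conda/path.zip)
cluster = DaskYARNCluster(env='/my/conda/path')

# Or point directly at a pre-built zip:
# cluster = DaskYARNCluster(env='/my/conda/path.zip')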
Excellent. I think we should definitely write up a troubleshooting guide at some point.
I believe the key error is here, and it appears that everything is fine even though the distributed/worker.py module thinks the response is unexpected. Below is more of the container log. It continues restarting until it is killed; the final error when it is killed is at the bottom.
Final error: