dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Distributed Client Times Out Restarting IPv6 Workers #2741

Closed higherorderfunctor closed 5 years ago

higherorderfunctor commented 5 years ago

The worker will successfully reboot, but the client reports a timeout condition. It is non-impacting as work continues after the timeout. It appears an exception occurs on the scheduler because the IPv6 address is not wrapped in square brackets.
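To illustrate the bracket convention in question: an IPv6 literal has to be wrapped in square brackets before a port is appended, because otherwise the colons in the address cannot be told apart from the port separator. The following is a minimal sketch using only the standard library; `format_host_port_sketch` is a hypothetical helper written for illustration and is not part of distributed.

```python
import ipaddress


def format_host_port_sketch(host, port):
    """Join host and port, bracketing IPv6 literals (hypothetical helper)."""
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Not an IP literal (for example a hostname), so no brackets are needed.
        is_ipv6 = False
    return f"[{host}]:{port}" if is_ipv6 else f"{host}:{port}"


print(format_host_port_sketch("::1", 38519))       # [::1]:38519
print(format_host_port_sketch("127.0.0.1", 8786))  # 127.0.0.1:8786
print(format_host_port_sketch("localhost", 8786))  # localhost:8786
```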

Minimal viable example:

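# Start these in separate shells first:
#   dask-scheduler
#   dask-worker '[::1]:8786'
from dask.distributed import Client

# Connect to the scheduler started above.
client = Client('localhost:8786')

# Restart all workers; on distributed 1.28.1 this logs the timeout shown below.
client.restart()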

Client output: distributed.client - ERROR - Restart timed out after 20.000000 seconds

Scheduler output:

Traceback (most recent call last):
  File "/home/higherorderfunctor/ENV/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/higherorderfunctor/ENV/lib/python3.7/site-packages/distributed/scheduler.py", line 2615, in restart
    resps = yield gen.with_timeout(timedelta(seconds=timeout), resps)
  File "/home/higherorderfunctor/ENV/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/higherorderfunctor/ENV/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/higherorderfunctor/ENV/lib/python3.7/site-packages/distributed/utils.py", line 218, in All
    result = yield tasks.next()
  File "/home/higherorderfunctor/ENV/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/higherorderfunctor/ENV/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/higherorderfunctor/ENV/lib/python3.7/site-packages/distributed/core.py", line 671, in send_recv_from_rpc
    comm = yield self.live_comm()
  File "/home/higherorderfunctor/ENV/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/higherorderfunctor/ENV/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/higherorderfunctor/ENV/lib/python3.7/site-packages/distributed/core.py", line 639, in live_comm
    connection_args=self.connection_args,
  File "/home/higherorderfunctor/ENV/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/higherorderfunctor/ENV/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/higherorderfunctor/ENV/lib/python3.7/site-packages/distributed/comm/core.py", line 218, in connect
    quiet_exceptions=EnvironmentError,
  File "/home/higherorderfunctor/ENV/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/higherorderfunctor/ENV/lib/python3.7/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/home/higherorderfunctor/ENV/lib/python3.7/site-packages/distributed/comm/tcp.py", line 354, in connect
    ip, port = parse_host_port(address)
  File "/home/higherorderfunctor/ENV/lib/python3.7/site-packages/distributed/comm/addressing.py", line 94, in parse_host_port
    return host, int(port)
ValueError: invalid literal for int() with base 10: ':1:38519'

System Info:

```
NAME="Red Hat Enterprise Linux Server"
VERSION="7.4 (Maipo)"
ID="rhel"
ID_LIKE="fedora"
VARIANT="Server"
VARIANT_ID="server"
VERSION_ID="7.4"
PRETTY_NAME="Red Hat Enterprise Linux Server 7.4 (Maipo)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:redhat:enterprise_linux:7.4:GA:server"
HOME_URL="https://www.redhat.com/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
REDHAT_BUGZILLA_PRODUCT_VERSION=7.4
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
REDHAT_SUPPORT_PRODUCT_VERSION="7.4"
```

Packages:

```
alabaster==0.7.12
astroid==2.2.5
attrs==19.1.0
Babel==2.7.0
backcall==0.1.0
bandit==1.6.0
bleach==3.1.0
bokeh==1.2.0
certifi==2019.3.9
chardet==3.0.4
Click==7.0
cloudpickle==1.1.1
colorlog==4.0.2
coreapi==2.3.3
coreschema==0.0.4
coverage==4.5.3
cycler==0.10.0
dask==1.2.2
decorator==4.4.0
defusedxml==0.6.0
dill==0.2.9
distributed==1.28.1
Django==2.2.1
django-cors-headers==3.0.2
django-debug-toolbar==1.11
django-filter==2.1.0
django-model-utils==3.1.2
django-netfields==0.10.0
django-postgres-extra==1.21a17
django-stubs==0.12.1
djangorestframework==3.9.4
djangorestframework-stubs==0.4.2
docutils==0.14
entrypoints==0.3
flake8==3.7.7
flake8-docstrings==1.3.0
flake8-isort==2.7.0
flake8-polyfill==1.0.2
fn==0.4.3
future==0.17.1
gitdb2==2.0.5
GitPython==2.1.11
HeapDict==1.0.0
idna==2.8
imagesize==1.1.0
ipykernel==5.1.1
ipython==7.5.0
ipython-genutils==0.2.0
isort==4.3.20
itypes==1.1.0
jedi==0.13.3
Jinja2==2.10.1
jsonschema==3.0.1
jupyter-client==5.2.4
jupyter-core==4.4.0
jupyterlab==0.35.6
jupyterlab-server==0.3.0
kiwisolver==1.1.0
lazy-object-proxy==1.4.1
Logbook==1.4.3
MarkupSafe==1.1.1
matplotlib==3.1.0
mccabe==0.6.1
mistune==0.8.4
mock==3.0.5
msgpack==0.6.1
multiprocess==0.70.7
mypy==0.701
mypy-extensions==0.4.1
mysqlclient==1.4.2.post1
nbconvert==5.5.0
nbformat==4.4.0
netaddr==0.7.19
networkx==2.3
notebook==5.7.8
numexpr==2.6.9
numpy==1.16.4
packaging==19.0
pandas==0.24.2
pandocfilters==1.4.2
parso==0.4.0
pathos==0.2.3
pbr==5.2.0
pexpect==4.7.0
pickleshare==0.7.5
Pillow==6.0.0
pipdeptree==0.13.2
ply==3.11
pox==0.2.5
ppft==1.6.4.9
prometheus-client==0.6.0
prompt-toolkit==2.0.9
psutil==5.6.2
psycopg2-binary==2.8.2
ptyprocess==0.6.0
pyasn1==0.4.5
pycodestyle==2.5.0
pycrypto==2.6.1
pycryptodomex==3.8.1
pydocstyle==3.0.0
pyflakes==2.1.1
PyFunctional==1.2.0
Pygments==2.4.2
pygraphviz==1.5
pylint==2.3.1
pylint-django==2.0.9
pylint-plugin-utils==0.5
pyparsing==2.4.0
pyrsistent==0.15.2
pysmi==0.3.4
pysnmp==4.4.9
python-dateutil==2.8.0
pytz==2019.1
PyYAML==5.1
pyzmq==18.0.1
requests==2.22.0
scipy==1.3.0
Send2Trash==1.5.0
six==1.12.0
smmap2==2.0.5
snmpsim==0.4.7
snowballstemmer==1.2.1
sortedcontainers==2.1.0
Sphinx==2.0.1
sphinx-rtd-theme==0.4.3
sphinxcontrib-applehelp==1.0.1
sphinxcontrib-devhelp==1.0.1
sphinxcontrib-htmlhelp==1.0.2
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.2
sphinxcontrib-serializinghtml==1.1.3
sphinxcontrib-websupport==1.1.2
sqlparse==0.3.0
stevedore==1.30.1
streamz==0.5.1
tables==3.5.1
tabulate==0.8.3
tblib==1.4.0
terminado==0.8.2
testfixtures==6.8.2
testpath==0.4.2
toolz==0.9.0
tornado==6.0.2
traitlets==4.3.2
typed-ast==1.3.5
typing-extensions==3.7.2
uritemplate==3.0.0
urllib3==1.25.3
wcwidth==0.1.7
webencodings==0.5.1
wrapt==1.11.1
zict==0.1.4
```

jakirkham commented 5 years ago

@mrocklin, would you have a chance to look at this?

mrocklin commented 5 years ago

I just gave this a shot on my machine on master and didn't receive this error.

I didn't fully replicate the environment though.

@higherorderfunctor can you try on master? Alternatively, are you comfortable enough with the code to do some debugging and investigate what's happening within the function that raises the error? My hope is that the code within that function is fairly accessible.

higherorderfunctor commented 5 years ago

I could not reproduce on master. I'm guessing that the incoming address parameter was lacking the brackets, so the `if address.startswith("["):` branch was not getting executed. As far as I can tell, this is resolved in master. Thank you for looking into it.
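The guess above can be sketched as follows. This is a simplified stand-in, not the actual code in distributed's `addressing.py`: `parse_host_port_sketch` is a hypothetical function that assumes the parser handles a bracketed prefix in one branch and otherwise splits at the first colon. Under that assumption, an unbracketed `::1:38035` yields an empty host and leaves `:1:38035` as the port string, which matches the `ValueError` in the traceback.

```python
def parse_host_port_sketch(address):
    """Simplified host:port parser (hypothetical, for illustration only)."""
    if address.startswith("["):
        # Bracketed IPv6 literal such as "[::1]:38035".
        host, sep, tail = address[1:].partition("]")
        if not sep or not tail.startswith(":"):
            raise ValueError(f"invalid address {address!r}")
        port = tail[1:]
    else:
        # Hostname or IPv4 address: split at the first colon.
        host, sep, port = address.partition(":")
        if not sep:
            raise ValueError(f"missing port in {address!r}")
    return host, int(port)


print(parse_host_port_sketch("[::1]:38035"))  # ('::1', 38035)

try:
    parse_host_port_sketch("::1:38035")  # brackets missing
except ValueError as exc:
    print(exc)  # invalid literal for int() with base 10: ':1:38035'
```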

Master:

#  ~ mkdir dask-test
#  ~ cd dask-test
#  dask-test virtualenv -p python3.7 ENV
Already using interpreter /usr/local/bin/python3.7
Using base prefix '/usr/local'
New python executable in /home/higherorderfunctor/dask-test/ENV/bin/python3.7
Also creating executable in /home/higherorderfunctor/dask-test/ENV/bin/python
Installing setuptools, pip, wheel...
done.
#  dask-test source ENV/bin/activate
(ENV) #  dask-test pip install git+https://github.com/dask/distributed.git
Collecting git+https://github.com/dask/distributed.git
  Cloning https://github.com/dask/distributed.git to /tmp/pip-req-build-zg409dky
  Running command git clone -q https://github.com/dask/distributed.git /tmp/pip-req-build-zg409dky
Collecting click>=6.6 (from distributed==1.28.1+46.g861536c)
  Using cached https://files.pythonhosted.org/packages/fa/37/45185cb5abbc30d7257104c434fe0b07e5a195a6847506c074527aa599ec/Click-7.0-py2.py3-none-any.whl
Collecting cloudpickle>=0.2.2 (from distributed==1.28.1+46.g861536c)
  Using cached https://files.pythonhosted.org/packages/24/fb/4f92f8c0f40a0d728b4f3d5ec5ff84353e705d8ff5e3e447620ea98b06bd/cloudpickle-1.1.1-py2.py3-none-any.whl
Collecting dask>=0.18.0 (from distributed==1.28.1+46.g861536c)
  Using cached https://files.pythonhosted.org/packages/58/80/6059587dcf80821fbccdc1f06b6d3fd29b64c9b38fbf1ea42889460de3de/dask-1.2.2-py2.py3-none-any.whl
Collecting msgpack (from distributed==1.28.1+46.g861536c)
  Using cached https://files.pythonhosted.org/packages/a8/7b/630049fc4af9e68a625738612edc264ce7cb586c5001a2d4d2209a4f61c1/msgpack-0.6.1-cp37-cp37m-manylinux1_x86_64.whl
Collecting psutil>=5.0 (from distributed==1.28.1+46.g861536c)
Collecting six (from distributed==1.28.1+46.g861536c)
  Using cached https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl
Collecting sortedcontainers!=2.0.0,!=2.0.1 (from distributed==1.28.1+46.g861536c)
  Using cached https://files.pythonhosted.org/packages/13/f3/cf85f7c3a2dbd1a515d51e1f1676d971abe41bba6f4ab5443240d9a78e5b/sortedcontainers-2.1.0-py2.py3-none-any.whl
Collecting tblib (from distributed==1.28.1+46.g861536c)
  Using cached https://files.pythonhosted.org/packages/64/b5/ebb1af4d843047ccd7292b92f5e5f8643153e8b95d14508d9fe3b35f7004/tblib-1.4.0-py2.py3-none-any.whl
Collecting toolz>=0.7.4 (from distributed==1.28.1+46.g861536c)
Collecting tornado>=5 (from distributed==1.28.1+46.g861536c)
Collecting zict>=0.1.3 (from distributed==1.28.1+46.g861536c)
  Using cached https://files.pythonhosted.org/packages/bd/45/a2e6f95a850cd407d785f2f8624b02e72baf6ab910aea4bdcac7e18b4871/zict-0.1.4-py2.py3-none-any.whl
Collecting pyyaml (from distributed==1.28.1+46.g861536c)
Collecting heapdict (from zict>=0.1.3->distributed==1.28.1+46.g861536c)
Building wheels for collected packages: distributed
  Building wheel for distributed (setup.py) ... done
  Stored in directory: /tmp/pip-ephem-wheel-cache-p8o9bdhf/wheels/29/07/22/1ee64efb7cc30749c0db6c7462ac20b6708d7be38db8321d76
Successfully built distributed
Installing collected packages: click, cloudpickle, dask, msgpack, psutil, six, sortedcontainers, tblib, toolz, tornado, heapdict, zict, pyyaml, distributed
Successfully installed click-7.0 cloudpickle-1.1.1 dask-1.2.2 distributed-1.28.1+46.g861536c heapdict-1.0.0 msgpack-0.6.1 psutil-5.6.2 pyyaml-5.1 six-1.12.0 sortedcontainers-2.1.0 tblib-1.4.0 toolz-0.9.0 tornado-6.0.2 zict-0.1.4
(ENV) #  dask-test dask-scheduler &
[1] 1534
(ENV) #  dask-test distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at: tcp://192.168.24.20:8786
distributed.scheduler - INFO - Local Directory:    /tmp/scheduler-ijobc8rf
distributed.scheduler - INFO - -----------------------------------------------

(ENV) #  dask-test dask-worker '[::1]:8786' &
[2] 1747
(ENV) #  dask-test distributed.nanny - INFO -         Start Nanny at: 'tcp://[::1]:42831'
distributed.worker - INFO -       Start worker at:          tcp://[::1]:44386
distributed.worker - INFO -          Listening to:            tcp://::1:44386
distributed.worker - INFO - Waiting to connect to:           tcp://[::1]:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                         24
distributed.worker - INFO -                Memory:                   16.38 GB
distributed.worker - INFO -       Local Directory: /home/higherorderfunctor/dask-test/worker-f1y7r9yn
distributed.worker - INFO - -------------------------------------------------
distributed.scheduler - INFO - Register tcp://[::1]:44386
distributed.scheduler - INFO - Starting worker compute stream, tcp://[::1]:44386
distributed.core - INFO - Starting established connection
distributed.worker - INFO -         Registered to:           tcp://[::1]:8786
distributed.worker - INFO - -------------------------------------------------
distributed.core - INFO - Starting established connection
(ENV) #  dask-test python -c 'from dask.distributed import Client;client = Client("localhost:8786");client.restart()'
distributed.scheduler - INFO - Receive client connection: Client-ca623d74-8643-11e9-88a1-3ca82ae8f558
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Send lost future signal to clients
distributed.scheduler - INFO - Remove worker tcp://[::1]:44386
distributed.core - INFO - Removing comms to tcp://[::1]:44386
distributed.scheduler - INFO - Lost all workers
distributed.scheduler - INFO - Clear task state
distributed.worker - INFO - Stopping worker at tcp://[::1]:44386
distributed.worker - INFO -       Start worker at:          tcp://[::1]:37098
distributed.worker - INFO -          Listening to:            tcp://::1:37098
distributed.worker - INFO - Waiting to connect to:           tcp://[::1]:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                         24
distributed.worker - INFO -                Memory:                   16.38 GB
distributed.worker - INFO -       Local Directory: /home/higherorderfunctor/dask-test/worker-qkvs9eeh
distributed.worker - INFO - -------------------------------------------------
distributed.scheduler - INFO - Register tcp://[::1]:37098
distributed.scheduler - INFO - Starting worker compute stream, tcp://[::1]:37098
distributed.core - INFO - Starting established connection
distributed.worker - INFO -         Registered to:           tcp://[::1]:8786
distributed.worker - INFO - -------------------------------------------------
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO - Remove client Client-ca623d74-8643-11e9-88a1-3ca82ae8f558
distributed.scheduler - INFO - Remove client Client-ca623d74-8643-11e9-88a1-3ca82ae8f558
distributed.scheduler - INFO - Close client connection: Client-ca623d74-8643-11e9-88a1-3ca82ae8f558

distributed-1.28.1 via pip:

#  ~ mkdir dask-test2
#  ~ cd dask-test2
#  dask-test2 virtualenv -p python3.7 ENV
Already using interpreter /usr/local/bin/python3.7
Using base prefix '/usr/local'
New python executable in /home/higherorderfunctor/dask-test2/ENV/bin/python3.7
Also creating executable in /home/higherorderfunctor/dask-test2/ENV/bin/python
Installing setuptools, pip, wheel...
done.
#  dask-test2 source ENV/bin/activate
(ENV) #  dask-test2 pip install distributed
Collecting distributed
  Using cached https://files.pythonhosted.org/packages/16/00/c2883c4e234b493b81521e5f825cad89004210ca928f3ca161a1dafd8598/distributed-1.28.1-py2.py3-none-any.whl
Collecting tornado>=5 (from distributed)
Collecting dask>=0.18.0 (from distributed)
  Using cached https://files.pythonhosted.org/packages/58/80/6059587dcf80821fbccdc1f06b6d3fd29b64c9b38fbf1ea42889460de3de/dask-1.2.2-py2.py3-none-any.whl
Collecting sortedcontainers!=2.0.0,!=2.0.1 (from distributed)
  Using cached https://files.pythonhosted.org/packages/13/f3/cf85f7c3a2dbd1a515d51e1f1676d971abe41bba6f4ab5443240d9a78e5b/sortedcontainers-2.1.0-py2.py3-none-any.whl
Collecting pyyaml (from distributed)
Collecting tblib (from distributed)
  Using cached https://files.pythonhosted.org/packages/64/b5/ebb1af4d843047ccd7292b92f5e5f8643153e8b95d14508d9fe3b35f7004/tblib-1.4.0-py2.py3-none-any.whl
Collecting toolz>=0.7.4 (from distributed)
Collecting click>=6.6 (from distributed)
  Using cached https://files.pythonhosted.org/packages/fa/37/45185cb5abbc30d7257104c434fe0b07e5a195a6847506c074527aa599ec/Click-7.0-py2.py3-none-any.whl
Collecting six (from distributed)
  Using cached https://files.pythonhosted.org/packages/73/fb/00a976f728d0d1fecfe898238ce23f502a721c0ac0ecfedb80e0d88c64e9/six-1.12.0-py2.py3-none-any.whl
Collecting zict>=0.1.3 (from distributed)
  Using cached https://files.pythonhosted.org/packages/bd/45/a2e6f95a850cd407d785f2f8624b02e72baf6ab910aea4bdcac7e18b4871/zict-0.1.4-py2.py3-none-any.whl
Collecting psutil>=5.0 (from distributed)
Collecting cloudpickle>=0.2.2 (from distributed)
  Using cached https://files.pythonhosted.org/packages/24/fb/4f92f8c0f40a0d728b4f3d5ec5ff84353e705d8ff5e3e447620ea98b06bd/cloudpickle-1.1.1-py2.py3-none-any.whl
Collecting msgpack (from distributed)
  Using cached https://files.pythonhosted.org/packages/a8/7b/630049fc4af9e68a625738612edc264ce7cb586c5001a2d4d2209a4f61c1/msgpack-0.6.1-cp37-cp37m-manylinux1_x86_64.whl
Collecting heapdict (from zict>=0.1.3->distributed)
Installing collected packages: tornado, dask, sortedcontainers, pyyaml, tblib, toolz, click, six, heapdict, zict, psutil, cloudpickle, msgpack, distributed
Successfully installed click-7.0 cloudpickle-1.1.1 dask-1.2.2 distributed-1.28.1 heapdict-1.0.0 msgpack-0.6.1 psutil-5.6.2 pyyaml-5.1 six-1.12.0 sortedcontainers-2.1.0 tblib-1.4.0 toolz-0.9.0 tornado-6.0.2 zict-0.1.4
(ENV) #  dask-test2 dask-scheduler &
[1] 3557
(ENV) #  dask-test2 distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Web dashboard not loaded.  Unable to import bokeh
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at: tcp://192.168.24.20:8786
distributed.scheduler - INFO - Local Directory:    /tmp/scheduler-c8ltmsz2
distributed.scheduler - INFO - -----------------------------------------------

(ENV) #  dask-test2 dask-worker '[::1]:8786' &
[2] 3603
(ENV) #  dask-test2 distributed.nanny - INFO -         Start Nanny at: 'tcp://[::1]:38035'
distributed.worker - INFO -       Start worker at:          tcp://[::1]:38299
distributed.worker - INFO -          Listening to:          tcp://[::1]:38299
distributed.worker - INFO -              nanny at:                [::1]:38035
distributed.worker - INFO - Waiting to connect to:           tcp://[::1]:8786
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO -               Threads:                         24
distributed.worker - INFO -                Memory:                   16.38 GB
distributed.worker - INFO -       Local Directory: /home/higherorderfunctor/dask-test2/worker-ogohutyo
distributed.worker - INFO - -------------------------------------------------
distributed.scheduler - INFO - Register tcp://[::1]:38299
distributed.scheduler - INFO - Starting worker compute stream, tcp://[::1]:38299
distributed.core - INFO - Starting established connection
distributed.worker - INFO -         Registered to:           tcp://[::1]:8786
distributed.worker - INFO - -------------------------------------------------
distributed.core - INFO - Starting established connection

(ENV) #  dask-test2 python -c 'from dask.distributed import Client;client = Client("localhost:8786");client.restart()'
distributed.scheduler - INFO - Receive client connection: Client-30012a18-8644-11e9-8f07-3ca82ae8f558
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Send lost future signal to clients
distributed.scheduler - INFO - Remove worker tcp://[::1]:38299
distributed.core - INFO - Removing comms to tcp://[::1]:38299
distributed.scheduler - INFO - Lost all workers
distributed.scheduler - INFO - Clear task state
distributed.utils - ERROR - invalid literal for int() with base 10: ':1:38035'
Traceback (most recent call last):
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/distributed/utils.py", line 713, in log_errors
    yield
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/distributed/scheduler.py", line 2615, in restart
    resps = yield gen.with_timeout(timedelta(seconds=timeout), resps)
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/distributed/utils.py", line 218, in All
    result = yield tasks.next()
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/distributed/core.py", line 671, in send_recv_from_rpc
    comm = yield self.live_comm()
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/distributed/core.py", line 639, in live_comm
    connection_args=self.connection_args,
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/distributed/comm/core.py", line 218, in connect
    quiet_exceptions=EnvironmentError,
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/distributed/comm/tcp.py", line 354, in connect
    ip, port = parse_host_port(address)
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/distributed/comm/addressing.py", line 94, in parse_host_port
    return host, int(port)
ValueError: invalid literal for int() with base 10: ':1:38035'
Future exception was never retrieved
future: <Future finished exception=ValueError("invalid literal for int() with base 10: ':1:38035'")>
Traceback (most recent call last):
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/distributed/scheduler.py", line 2615, in restart
    resps = yield gen.with_timeout(timedelta(seconds=timeout), resps)
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/distributed/utils.py", line 218, in All
    result = yield tasks.next()
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/distributed/core.py", line 671, in send_recv_from_rpc
    comm = yield self.live_comm()
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/distributed/core.py", line 639, in live_comm
    connection_args=self.connection_args,
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 736, in run
    yielded = self.gen.throw(*exc_info)  # type: ignore
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/distributed/comm/core.py", line 218, in connect
    quiet_exceptions=EnvironmentError,
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 729, in run
    value = future.result()
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/distributed/comm/tcp.py", line 354, in connect
    ip, port = parse_host_port(address)
  File "/home/higherorderfunctor/dask-test2/ENV/lib/python3.7/site-packages/distributed/comm/addressing.py", line 94, in parse_host_port
    return host, int(port)
ValueError: invalid literal for int() with base 10: ':1:38035'
distributed.worker - INFO - -------------------------------------------------
distributed.scheduler - INFO - Register tcp://[::1]:38299
distributed.scheduler - INFO - Starting worker compute stream, tcp://[::1]:38299
distributed.core - INFO - Starting established connection
distributed.worker - INFO -         Registered to:           tcp://[::1]:8786
distributed.worker - INFO - -------------------------------------------------
distributed.core - INFO - Starting established connection
distributed.client - ERROR - Restart timed out after 20.000000 seconds
distributed.scheduler - INFO - Remove client Client-30012a18-8644-11e9-8f07-3ca82ae8f558
distributed.scheduler - INFO - Remove client Client-30012a18-8644-11e9-8f07-3ca82ae8f558
distributed.scheduler - INFO - Close client connection: Client-30012a18-8644-11e9-8f07-3ca82ae8f558

jakirkham commented 5 years ago

Thanks for checking this, @higherorderfunctor! 😄