Tests fail (unnecessarily) when networking is unavailable #4005

Open mgorny opened 4 years ago

mgorny commented 4 years ago

What happened:

Tests fail immediately when the system does not have Internet access:

=============================================================== test session starts ===============================================================
platform linux -- Python 3.6.11, pytest-6.0.1, py-1.9.0, pluggy-0.13.1 -- /usr/bin/python3.6
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0/.hypothesis/examples')
rootdir: /tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0, configfile: setup.cfg
plugins: mock-3.2.0, pyfakefs-4.1.0, shutil-1.7.0, virtualenv-1.7.0, timeout-1.4.1, freezegun-0.4.2, localserver-0.5.0, forked-1.3.0, xdist-1.34.0, asyncio-0.14.0, hypothesis-5.23.7
collecting ... collected 57 items / 1 error / 56 selected

===================================================================== ERRORS ======================================================================
______________________________________________ ERROR collecting distributed/comm/tests/test_comms.py ______________________________________________
/usr/lib64/python3.6/site-packages/toolz/functoolz.py:456: in memof
    return cache[k]
E   KeyError: ('2001:4860:4860::8888', 80)
        args       = ('2001:4860:4860::8888', 80)
        cache      = {('', 80): ''}
        func       = <function _get_ip at 0x7f5069419c80>
        k          = ('2001:4860:4860::8888', 80)
        key        = <function memoize.<locals>.key at 0x7f5069419d08>
        kwargs     = {'family': <AddressFamily.AF_INET6: 10>}

During handling of the above exception, another exception occurred:
distributed/utils.py:136: in _get_ip
    sock.connect((host, port))
E   OSError: [Errno 101] Network is unreachable
        family     = <AddressFamily.AF_INET6: 10>
        host       = '2001:4860:4860::8888'
        port       = 80
        sock       = <socket.socket [closed] fd=-1, family=AddressFamily.AF_INET6, type=SocketKind.SOCK_DGRAM, proto=0>

During handling of the above exception, another exception occurred:
distributed/comm/tests/test_comms.py:48: in <module>
    EXTERNAL_IP6 = get_ipv6()
        CommClosedError = <class 'distributed.comm.core.CommClosedError'>
        EXTERNAL_IP4 = ''
        Future     = <class '_asyncio.Future'>
        Serialized = <class 'distributed.protocol.serialize.Serialized'>
        __builtins__ = <builtins>
        __cached__ = '/tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0/distributed/comm/tests/__pycache__/test_comms.cpython-36.pyc'
        __doc__    = None
        __file__   = '/tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0/distributed/comm/tests/test_comms.py'
        __loader__ = <_pytest.assertion.rewrite.AssertionRewritingHook object at 0x7f506a420320>
        __name__   = 'distributed.comm.tests.test_comms'
        __package__ = 'distributed.comm.tests'
        __spec__   = ModuleSpec(name='distributed.comm.tests.test_comms', loader=<_pytest.assertion.rewrite.AssertionRewritingHook object at 0x7f506a420320>, origin='/tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0/distributed/comm/tests/test_comms.py')
        asyncio    = <module 'asyncio' from '/usr/lib64/python3.6/asyncio/__init__.py'>
        backends   = {'inproc': <distributed.comm.inproc.InProcBackend object at 0x7f50693eeeb8>,
 'tcp': <distributed.comm.tcp.TCPBackend object at 0x7f5064020128>,
 'tls': <distributed.comm.tcp.TLSBackend object at 0x7f5064020160>,
 'ucx': <distributed.comm.ucx.UCXBackend object at 0x7f5064020ba8>}
        connect    = <function connect at 0x7f50694208c8>
        deserialize = <function deserialize at 0x7f50693cf268>
        distributed = <module 'distributed' from '/tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0/distributed/__init__.py'>
        get_address_host = <function get_address_host at 0x7f5069420598>
        get_backend = <function get_backend at 0x7f5069345620>
        get_cert   = <function get_cert at 0x7f5063916048>
        get_client_ssl_context = <function get_client_ssl_context at 0x7f5063916400>
        get_ip     = <function get_ip at 0x7f5069419e18>
        get_ipv6   = <function get_ipv6 at 0x7f5069419ea0>
        get_local_address_for = <function get_local_address_for at 0x7f5069420620>
        get_server_ssl_context = <function get_server_ssl_context at 0x7f5063916378>
        has_ipv6   = <function memoize.<locals>.memof at 0x7f5063913598>
        inproc     = <module 'distributed.comm.inproc' from '/tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0/distributed/comm/inproc.py'>
        ioloop     = <module 'tornado.ioloop' from '/usr/lib64/python3.6/site-packages/tornado/ioloop.py'>
        listen     = <function listen at 0x7f50694248c8>
        loop       = <function loop at 0x7f50638fbae8>
        os         = <module 'os' from '/usr/lib64/python3.6/os.py'>
        parse_address = <function parse_address at 0x7f5069345488>
        parse_host_port = <function parse_host_port at 0x7f5069420400>
        partial    = <class 'functools.partial'>
        pkg_resources = <module 'pkg_resources' from '/usr/lib64/python3.6/site-packages/pkg_resources/__init__.py'>
        pytest     = <module 'pytest' from '/usr/lib64/python3.6/site-packages/pytest/__init__.py'>
        requires_ipv6 = <function requires_ipv6 at 0x7f5063913620>
        resolve_address = <function resolve_address at 0x7f50694206a8>
        serialize  = <function serialize at 0x7f50693cf1e0>
        sys        = <module 'sys' (built-in)>
        tcp        = <module 'distributed.comm.tcp' from '/tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0/distributed/comm/tcp.py'>
        threading  = <module 'threading' from '/usr/lib64/python3.6/threading.py'>
        time       = <built-in function time>
        to_serialize = <class 'distributed.protocol.serialize.Serialize'>
        types      = <module 'types' from '/usr/lib64/python3.6/types.py'>
        unparse_host_port = <function unparse_host_port at 0x7f5069420488>
        warnings   = <module 'warnings' from '/usr/lib64/python3.6/warnings.py'>
distributed/utils.py:167: in get_ipv6
    return _get_ip(host, port, family=socket.AF_INET6)
        host       = '2001:4860:4860::8888'
        port       = 80
/usr/lib64/python3.6/site-packages/toolz/functoolz.py:460: in memof
    cache[k] = result = func(*args, **kwargs)
        args       = ('2001:4860:4860::8888', 80)
        cache      = {('', 80): ''}
        func       = <function _get_ip at 0x7f5069419c80>
        k          = ('2001:4860:4860::8888', 80)
        key        = <function memoize.<locals>.key at 0x7f5069419d08>
        kwargs     = {'family': <AddressFamily.AF_INET6: 10>}
distributed/utils.py:146: in _get_ip
    socket.gethostname(), port, family, socket.SOCK_DGRAM, socket.IPPROTO_UDP
        family     = <AddressFamily.AF_INET6: 10>
        host       = '2001:4860:4860::8888'
        port       = 80
        sock       = <socket.socket [closed] fd=-1, family=AddressFamily.AF_INET6, type=SocketKind.SOCK_DGRAM, proto=0>
/usr/lib64/python3.6/socket.py:745: in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
E   socket.gaierror: [Errno -5] No address associated with hostname
        addrlist   = []
        family     = <AddressFamily.AF_INET6: 10>
        flags      = 0
        host       = 'localhost'
        port       = 80
        proto      = 17
        type       = <SocketKind.SOCK_DGRAM: 2>
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================================================ 1 error in 1.99s =================================================================

What you expected to happen:

I expected at least a subset of the tests to be usable in network-constrained environments.
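
One way to get there, as a minimal sketch: probe for a usable route once and skip network-dependent tests when none exists. The helper name `has_route` is hypothetical and not part of distributed; it reuses the UDP `connect` trick visible in `_get_ip` in the traceback above, which asks the kernel for a route without sending any packets.

```python
import socket


def has_route(host, port=80, family=socket.AF_INET):
    """Return True if the OS can select a local address for reaching *host*.

    Hypothetical helper for illustration only; not part of distributed.
    """
    try:
        sock = socket.socket(family, socket.SOCK_DGRAM)
    except OSError:
        # The address family itself is unsupported on this system.
        return False
    try:
        # Connecting a UDP socket sends nothing; it only performs a route lookup.
        sock.connect((host, port))
        return True
    except OSError:
        # Covers "Network is unreachable" (errno 101) and resolver errors,
        # since socket.gaierror is a subclass of OSError.
        return False
    finally:
        sock.close()
```

A test module could then guard itself with something like `pytest.mark.skipif(not has_route("2001:4860:4860::8888", family=socket.AF_INET6), reason="no IPv6 route")` rather than failing during collection.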

Anything else we need to know?:

It seems that replacing the has_ipv6() call with an explicit False gets the run past the initial collection error. However, the majority of tests still fail because of network failures, e.g.:

=============================================================== test session starts ===============================================================
platform linux -- Python 3.6.11, pytest-6.0.1, py-1.9.0, pluggy-0.13.1 -- /usr/bin/python3.6
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0/.hypothesis/examples')
rootdir: /tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0, configfile: setup.cfg
plugins: mock-3.2.0, pyfakefs-4.1.0, shutil-1.7.0, virtualenv-1.7.0, timeout-1.4.1, freezegun-0.4.2, localserver-0.5.0, forked-1.3.0, xdist-1.34.0, asyncio-0.14.0, hypothesis-5.23.7
collecting ... collected 1614 items / 4 deselected / 20 skipped / 1590 selected

distributed/cli/tests/test_dask_scheduler.py::test_defaults FAILED                                                                          [  0%]

==================================================================== FAILURES =====================================================================
__________________________________________________________________ test_defaults __________________________________________________________________

args = ('2001:4860:4860::8888', 80), kwargs = {'family': <AddressFamily.AF_INET6: 10>}, k = ('2001:4860:4860::8888', 80)

    def memof(*args, **kwargs):
        k = key(args, kwargs)
>           return cache[k]
E           KeyError: ('2001:4860:4860::8888', 80)

args       = ('2001:4860:4860::8888', 80)
cache      = {('', 80): ''}
func       = <function _get_ip at 0x7f0b63eddc80>
k          = ('2001:4860:4860::8888', 80)
key        = <function memoize.<locals>.key at 0x7f0b63eddd08>
kwargs     = {'family': <AddressFamily.AF_INET6: 10>}

/usr/lib64/python3.6/site-packages/toolz/functoolz.py:456: KeyError

During handling of the above exception, another exception occurred:

host = '2001:4860:4860::8888', port = 80, family = <AddressFamily.AF_INET6: 10>

    def _get_ip(host, port, family):
        # By using a UDP socket, we don't actually try to connect but
        # simply select the local address through which *host* is reachable.
        sock = socket.socket(family, socket.SOCK_DGRAM)
>           sock.connect((host, port))
E           OSError: [Errno 101] Network is unreachable

family     = <AddressFamily.AF_INET6: 10>
host       = '2001:4860:4860::8888'
port       = 80
sock       = <socket.socket [closed] fd=-1, family=AddressFamily.AF_INET6, type=SocketKind.SOCK_DGRAM, proto=0>

distributed/utils.py:136: OSError

During handling of the above exception, another exception occurred:

loop = <tornado.platform.asyncio.AsyncIOLoop object at 0x7f0b5236bda0>

    def test_defaults(loop):
        with popen(["dask-scheduler", "--no-dashboard"]) as proc:

            async def f():
                # Default behaviour is to listen on all addresses
                await assert_can_connect_from_everywhere_4_6(8786, timeout=5.0)

            with Client("" % Scheduler.default_port, loop=loop) as c:
>               c.sync(f)

c          = <Client: not connected>
f          = <function test_defaults.<locals>.f at 0x7f0b538a8d08>
loop       = <tornado.platform.asyncio.AsyncIOLoop object at 0x7f0b5236bda0>
proc       = <subprocess.Popen object at 0x7f0b5236b4e0>

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
distributed/client.py:833: in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
        args       = ()
        asynchronous = None
        callback_timeout = None
        func       = <function test_defaults.<locals>.f at 0x7f0b538a8d08>
        kwargs     = {}
        self       = <Client: not connected>
distributed/utils.py:339: in sync
    raise exc.with_traceback(tb)
        args       = ()
        callback_timeout = None
        e          = <threading.Event object at 0x7f0b52fe4f28>
        error      = [(<class 'socket.gaierror'>,
  gaierror(-5, 'No address associated with hostname'),
  <traceback object at 0x7f0b52ce0b08>)]
        exc        = gaierror(-5, 'No address associated with hostname')
        f          = <function sync.<locals>.f at 0x7f0b53804158>
        func       = <function test_defaults.<locals>.f at 0x7f0b538a8d08>
        kwargs     = {}
        loop       = <tornado.platform.asyncio.AsyncIOLoop object at 0x7f0b5236bda0>
        main_tid   = 139686948337472
        result     = [None]
        tb         = <traceback object at 0x7f0b52ce0b08>
        typ        = <class 'socket.gaierror'>
distributed/utils.py:323: in f
    result[0] = yield future
        args       = ()
        callback_timeout = None
        e          = <threading.Event object at 0x7f0b52fe4f28>
        error      = [(<class 'socket.gaierror'>,
  gaierror(-5, 'No address associated with hostname'),
  <traceback object at 0x7f0b52ce0b08>)]
        func       = <function test_defaults.<locals>.f at 0x7f0b538a8d08>
        future     = <coroutine object f at 0x7f0b537e88e0>
        kwargs     = {}
        main_tid   = 139686948337472
        result     = [None]
/usr/lib64/python3.6/site-packages/tornado/gen.py:735: in run
    value = future.result()
        exc_info   = None
        future     = None
        self       = <tornado.gen.Runner object at 0x7f0b5319e6d8>
distributed/cli/tests/test_dask_scheduler.py:33: in f
    await assert_can_connect_from_everywhere_4_6(8786, timeout=5.0)
distributed/utils_test.py:1131: in assert_can_connect_from_everywhere_4_6
    assert_can_connect("%s://[%s]:%d" % (protocol, get_ipv6(), port), **kwargs),
        futures    = [<coroutine object assert_can_connect at 0x7f0b537e8fc0>,
 <coroutine object assert_can_connect at 0x7f0b537e8bf8>]
        kwargs     = {'timeout': 5.0}
        port       = 8786
        protocol   = 'tcp'
distributed/utils.py:167: in get_ipv6
    return _get_ip(host, port, family=socket.AF_INET6)
        host       = '2001:4860:4860::8888'
        port       = 80
/usr/lib64/python3.6/site-packages/toolz/functoolz.py:460: in memof
    cache[k] = result = func(*args, **kwargs)
        args       = ('2001:4860:4860::8888', 80)
        cache      = {('', 80): ''}
        func       = <function _get_ip at 0x7f0b63eddc80>
        k          = ('2001:4860:4860::8888', 80)
        key        = <function memoize.<locals>.key at 0x7f0b63eddd08>
        kwargs     = {'family': <AddressFamily.AF_INET6: 10>}
distributed/utils.py:146: in _get_ip
    socket.gethostname(), port, family, socket.SOCK_DGRAM, socket.IPPROTO_UDP
        family     = <AddressFamily.AF_INET6: 10>
        host       = '2001:4860:4860::8888'
        port       = 80
        sock       = <socket.socket [closed] fd=-1, family=AddressFamily.AF_INET6, type=SocketKind.SOCK_DGRAM, proto=0>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

host = 'localhost', port = 80, family = <AddressFamily.AF_INET6: 10>, type = <SocketKind.SOCK_DGRAM: 2>, proto = 17, flags = 0

    def getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
        """Resolve host and port into list of address info entries.

        Translate the host/port argument into a sequence of 5-tuples that contain
        all the necessary arguments for creating a socket connected to that service.
        host is a domain name, a string representation of an IPv4/v6 address or
        None. port is a string service name such as 'http', a numeric port number or
        None. By passing None as the value of host and port, you can pass NULL to
        the underlying C API.

        The family, type and proto arguments can be optionally specified in order to
        narrow the list of addresses returned. Passing zero as a value for each of
        these arguments selects the full range of results.
        # We override this function since we want to translate the numeric family
        # and socket type values to enum constants.
        addrlist = []
>       for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
E       socket.gaierror: [Errno -5] No address associated with hostname

addrlist   = []
family     = <AddressFamily.AF_INET6: 10>
flags      = 0
host       = 'localhost'
port       = 80
proto      = 17
type       = <SocketKind.SOCK_DGRAM: 2>

/usr/lib64/python3.6/socket.py:745: gaierror
-------------------------------------------------------------- Captured stdout call ---------------------------------------------------------------

Print from stderr

distributed.scheduler - INFO - -----------------------------------------------
distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
distributed.scheduler - INFO - -----------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at:      tcp://
distributed.scheduler - INFO -   dashboard at:                     :8787
distributed.scheduler - INFO - Receive client connection: Client-410542b8-d3c2-11ea-8050-b7f84a3806c1
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Remove client Client-410542b8-d3c2-11ea-8050-b7f84a3806c1
distributed.scheduler - INFO - Remove client Client-410542b8-d3c2-11ea-8050-b7f84a3806c1
distributed.scheduler - INFO - Close client connection: Client-410542b8-d3c2-11ea-8050-b7f84a3806c1
distributed.scheduler - INFO - End scheduler at 'tcp://'
Traceback (most recent call last):
  File "/tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0-python3_6/test/scripts/dask-scheduler", line 11, in <module>
    load_entry_point('distributed==2.22.0', 'console_scripts', 'dask-scheduler')()
  File "/tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0-python3_6/lib/distributed/cli/dask_scheduler.py", line 226, in go
  File "/usr/lib64/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/lib64/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib64/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0-python3_6/lib/distributed/cli/dask_scheduler.py", line 217, in main
  File "/usr/lib64/python3.6/site-packages/tornado/ioloop.py", line 531, in run_sync
    raise TimeoutError("Operation timed out after %s seconds" % timeout)
tornado.util.TimeoutError: Operation timed out after None seconds

Print from stdout

================================================================ warnings summary =================================================================
  /tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0/distributed/utils.py:143: RuntimeWarning: Couldn't detect a suitable IP address for reaching '2001:4860:4860::8888', defaulting to hostname: [Errno 101] Network is unreachable

-- Docs: https://docs.pytest.org/en/stable/warnings.html
============================================================== slowest 10 durations ===============================================================
1.97s call     distributed/cli/tests/test_dask_scheduler.py::test_defaults
0.00s teardown distributed/cli/tests/test_dask_scheduler.py::test_defaults
0.00s setup    distributed/cli/tests/test_dask_scheduler.py::test_defaults
============================================================= short test summary info =============================================================
SKIPPED [1] distributed/comm/tests/test_ucx.py:4: could not import 'ucp': No module named 'ucp'
SKIPPED [1] distributed/comm/tests/test_ucx_config.py:16: could not import 'ucp': No module named 'ucp'
SKIPPED [1] distributed/dashboard/tests/test_components.py:5: could not import 'bokeh': No module named 'bokeh'
SKIPPED [1] distributed/dashboard/tests/test_scheduler_bokeh.py:10: could not import 'bokeh': No module named 'bokeh'
SKIPPED [1] distributed/dashboard/tests/test_worker_bokeh.py:8: could not import 'bokeh': No module named 'bokeh'
SKIPPED [1] distributed/deploy/tests/test_ssh.py:3: could not import 'asyncssh': No module named 'asyncssh'
SKIPPED [1] distributed/diagnostics/tests/test_progress_stream.py:3: could not import 'bokeh': No module named 'bokeh'
SKIPPED [1] distributed/diagnostics/tests/test_widgets.py:3: could not import 'ipywidgets': No module named 'ipywidgets'
SKIPPED [1] distributed/http/scheduler/tests/test_scheduler_http.py:6: could not import 'bokeh': No module named 'bokeh'
SKIPPED [1] distributed/protocol/tests/test_arrow.py:4: could not import 'pyarrow': No module named 'pyarrow'
SKIPPED [1] distributed/protocol/tests/test_cupy.py:6: could not import 'cupy': No module named 'cupy'
SKIPPED [1] distributed/protocol/tests/test_h5py.py:6: could not import 'h5py': No module named 'h5py'
SKIPPED [1] distributed/protocol/tests/test_keras.py:5: could not import 'keras': No module named 'keras'
SKIPPED [1] distributed/protocol/tests/test_netcdf4.py:3: could not import 'netCDF4': No module named 'netCDF4'
SKIPPED [1] distributed/protocol/tests/test_numba.py:5: could not import 'numba.cuda': No module named 'numba'
SKIPPED [1] distributed/protocol/tests/test_rmm.py:5: could not import 'numba.cuda': No module named 'numba'
SKIPPED [1] distributed/protocol/tests/test_sklearn.py:3: could not import 'sklearn': No module named 'sklearn'
SKIPPED [1] distributed/protocol/tests/test_sparse.py:5: could not import 'sparse': No module named 'sparse'
SKIPPED [1] distributed/protocol/tests/test_torch.py:5: could not import 'torch': No module named 'torch'
SKIPPED [1] distributed/tests/test_gpu_metrics.py:4: could not import 'pynvml': No module named 'pynvml'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============================================= 1 failed, 20 skipped, 4 deselected, 1 warning in 12.25s =============================================
sys:1: RuntimeWarning: coroutine 'assert_can_connect' was never awaited
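
The traceback above shows the chain: the UDP `connect` fails with errno 101, `_get_ip` warns and falls back to `socket.getaddrinfo(socket.gethostname(), ...)`, and that lookup raises `gaierror` because the hostname has no IPv6 address. A more tolerant fallback could look roughly like the sketch below. The function name `get_ip_or_loopback` and the choice to return a loopback address as a last resort are my assumptions, not what distributed does.

```python
import socket
import warnings

# Assumed last-resort addresses; whether loopback is acceptable depends on the caller.
_LOOPBACK = {socket.AF_INET: "127.0.0.1", socket.AF_INET6: "::1"}


def get_ip_or_loopback(host, port, family=socket.AF_INET):
    """Hypothetical variant of the lookup seen in the traceback."""
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            # No packets are sent; this only selects the outgoing local address.
            sock.connect((host, port))
            return sock.getsockname()[0]
    except OSError as exc:
        warnings.warn(
            "Couldn't detect a suitable IP address for reaching %r: %s" % (host, exc),
            RuntimeWarning,
        )
    try:
        # Second attempt: resolve this machine's own hostname.
        infos = socket.getaddrinfo(
            socket.gethostname(), port, family, socket.SOCK_DGRAM, socket.IPPROTO_UDP
        )
        if infos:
            return infos[0][4][0]
    except OSError:
        # gaierror is an OSError subclass; fall through to loopback.
        pass
    return _LOOPBACK[family]
```

With something along these lines, an offline machine would get a warning and a loopback address instead of an exception that aborts the test.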

Even if I skip tests that fail immediately, a lot of tests error out during teardown:

=============================================================== test session starts ===============================================================
platform linux -- Python 3.6.11, pytest-6.0.1, py-1.9.0, pluggy-0.13.1 -- /usr/bin/python3.6
cachedir: .pytest_cache
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase('/tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0/.hypothesis/examples')
rootdir: /tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0, configfile: setup.cfg
plugins: mock-3.2.0, pyfakefs-4.1.0, shutil-1.7.0, virtualenv-1.7.0, timeout-1.4.1, freezegun-0.4.2, localserver-0.5.0, forked-1.3.0, xdist-1.34.0, asyncio-0.14.0, hypothesis-5.23.7
collecting ... collected 1612 items / 4 deselected / 20 skipped / 1588 selected

distributed/cli/tests/test_dask_scheduler.py::test_no_dashboard SKIPPED                                                                     [  0%]
distributed/cli/tests/test_dask_scheduler.py::test_dashboard SKIPPED                                                                        [  0%]
distributed/cli/tests/test_dask_scheduler.py::test_dashboard_non_standard_ports SKIPPED                                                     [  0%]
distributed/cli/tests/test_dask_scheduler.py::test_dashboard_whitelist SKIPPED                                                              [  0%]
distributed/cli/tests/test_dask_scheduler.py::test_interface PASSED                                                                         [  0%]
distributed/cli/tests/test_dask_scheduler.py::test_pid_file SKIPPED                                                                         [  0%]
distributed/cli/tests/test_dask_scheduler.py::test_scheduler_port_zero PASSED                                                               [  0%]
distributed/cli/tests/test_dask_scheduler.py::test_dashboard_port_zero SKIPPED                                                              [  0%]
distributed/cli/tests/test_dask_scheduler.py::test_preload_file PASSED                                                                      [  0%]
distributed/cli/tests/test_dask_scheduler.py::test_preload_module PASSED                                                                    [  0%]
distributed/cli/tests/test_dask_scheduler.py::test_preload_remote_module PASSED                                                             [  0%]
distributed/cli/tests/test_dask_scheduler.py::test_preload_command PASSED                                                                   [  0%]
distributed/cli/tests/test_dask_scheduler.py::test_preload_command_default PASSED                                                           [  0%]
distributed/cli/tests/test_dask_scheduler.py::test_version_option PASSED                                                                    [  0%]
distributed/cli/tests/test_dask_scheduler.py::test_idle_timeout PASSED                                                                      [  0%]
distributed/cli/tests/test_dask_scheduler.py::test_multiple_workers PASSED                                                                  [  0%]
distributed/cli/tests/test_dask_spec.py::test_text PASSED                                                                                   [  1%]
distributed/cli/tests/test_dask_spec.py::test_file PASSED                                                                                   [  1%]
distributed/cli/tests/test_dask_spec.py::test_errors PASSED                                                                                 [  1%]
distributed/cli/tests/test_dask_ssh.py::test_version_option PASSED                                                                          [  1%]
distributed/cli/tests/test_dask_worker.py::test_nanny_worker_ports PASSED                                                                   [  1%]
distributed/cli/tests/test_dask_worker.py::test_nanny_worker_port_range PASSED                                                              [  1%]
distributed/cli/tests/test_dask_worker.py::test_nanny_worker_port_range_too_many_workers_raises PASSED                                      [  1%]
distributed/cli/tests/test_dask_worker.py::test_memory_limit PASSED                                                                         [  1%]
distributed/cli/tests/test_dask_worker.py::test_no_nanny PASSED                                                                             [  1%]
distributed/cli/tests/test_dask_worker.py::test_no_reconnect[--nanny] PASSED                                                                [  1%]
distributed/cli/tests/test_dask_worker.py::test_no_reconnect[--no-nanny] PASSED                                                             [  1%]
distributed/cli/tests/test_dask_worker.py::test_resources PASSED                                                                            [  1%]
distributed/cli/tests/test_dask_worker.py::test_local_directory[--nanny] PASSED                                                             [  1%]
distributed/cli/tests/test_dask_worker.py::test_local_directory[--no-nanny] PASSED                                                          [  1%]
distributed/cli/tests/test_dask_worker.py::test_scheduler_file[--nanny] PASSED                                                              [  1%]
distributed/cli/tests/test_dask_worker.py::test_scheduler_file[--no-nanny] PASSED                                                           [  1%]
distributed/cli/tests/test_dask_worker.py::test_scheduler_address_env PASSED                                                                [  2%]
distributed/cli/tests/test_dask_worker.py::test_nprocs_requires_nanny PASSED                                                                [  2%]
distributed/cli/tests/test_dask_worker.py::test_nprocs_expands_name PASSED                                                                  [  2%]
distributed/cli/tests/test_dask_worker.py::test_contact_listen_address[tcp://] PASSED                                  [  2%]
distributed/cli/tests/test_dask_worker.py::test_contact_listen_address[tcp://] PASSED                               [  2%]
distributed/cli/tests/test_dask_worker.py::test_contact_listen_address[tcp://] PASSED                                [  2%]
distributed/cli/tests/test_dask_worker.py::test_contact_listen_address[tcp://] PASSED                             [  2%]
distributed/cli/tests/test_dask_worker.py::test_respect_host_listen_address[] PASSED                                       [  2%]
distributed/cli/tests/test_dask_worker.py::test_respect_host_listen_address[] PASSED                                    [  2%]
distributed/cli/tests/test_dask_worker.py::test_respect_host_listen_address[] PASSED                                         [  2%]
distributed/cli/tests/test_dask_worker.py::test_respect_host_listen_address[] PASSED                                      [  2%]
distributed/cli/tests/test_dask_worker.py::test_dashboard_non_standard_ports SKIPPED                                                        [  2%]
distributed/cli/tests/test_dask_worker.py::test_version_option PASSED                                                                       [  2%]
distributed/cli/tests/test_dask_worker.py::test_worker_timeout[True] PASSED                                                                 [  2%]
distributed/cli/tests/test_dask_worker.py::test_worker_timeout[False] PASSED                                                                [  2%]
distributed/cli/tests/test_dask_worker.py::test_bokeh_deprecation SKIPPED                                                                   [  2%]
distributed/cli/tests/test_dask_worker.py::test_integer_names PASSED                                                                        [  3%]
distributed/cli/tests/test_dask_worker.py::test_integer_names ERROR                                                                         [  3%]

===================================================================== ERRORS ======================================================================
_____________________________________________________ ERROR at teardown of test_integer_names _____________________________________________________

    def cleanup():
        with clean():
>           yield

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/usr/lib64/python3.6/contextlib.py:88: in __exit__
        self       = <contextlib._GeneratorContextManager object at 0x7fa3df6b70f0>
        traceback  = None
        type       = None
        value      = None
distributed/utils_test.py:1535: in clean
    del thread_state.on_event_loop_thread
        instances  = True
        level      = 0
        loop       = <tornado.platform.asyncio.AsyncIOLoop object at 0x7fa3df778710>
        name       = 'distributed.utils_test'
        null       = <function clean.<locals>.null at 0x7fa3df7e9bf8>
        processes  = True
        threads    = True
        timeout    = 1
/usr/lib64/python3.6/contextlib.py:88: in __exit__
        self       = <contextlib._GeneratorContextManager object at 0x7fa3df6b7860>
        traceback  = None
        type       = None
        value      = None
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    def check_thread_leak():
        active_threads_start = set(threading._active)


        start = time()
        while True:
            bad = [
                for t, v in threading._active.items()
                if t not in active_threads_start
                and "Threaded" not in v.name
                and "watch message" not in v.name
                and "TCP-Executor" not in v.name
            if not bad:
            if time() > start + 5:
                from distributed import profile

                tid = bad[0]
                thread = threading._active[tid]
                call_stacks = profile.call_stack(sys._current_frames()[tid])
>               assert False, (thread, call_stacks)
E               AssertionError: (<Thread(Profile, started daemon 140341756917504)>, ['  File "/usr/lib64/python3.6/threading.py", line 884, in _bootst...-python/distributed-2.22.0/work/distributed-2.22.0/distributed/profile.py", line 269, in _watch
E                 \tsleep(interval)
E                 '])
E               assert False

active_threads_start = {140341765310208,
bad        = [140341756917504]
call_stacks = ['  File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap\n'
 '  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner\n'
 '  File "/usr/lib64/python3.6/threading.py", line 864, in run\n'
 '\tself._target(*self._args, **self._kwargs)\n',
 '  File '
 '"/tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0/distributed/profile.py", '
 'line 269, in _watch\n'
profile    = <module 'distributed.profile' from '/tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0/distributed/profile.py'>
start      = 1596264403.1102982
thread     = <Thread(Profile, stopped daemon 140341756917504)>
tid        = 140341756917504

distributed/utils_test.py:1435: AssertionError
-------------------------------------------------------------- Captured stderr call ---------------------------------------------------------------
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at:     tcp://
distributed.scheduler - INFO -   dashboard at:                     :8787
distributed.scheduler - INFO - Register worker <Worker 'tcp://', name: 123, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://
distributed.core - INFO - Starting established connection
distributed.scheduler - INFO - Scheduler closing...
distributed.scheduler - INFO - Scheduler closing all comms
distributed.scheduler - INFO - Remove worker <Worker 'tcp://', name: 123, memory: 0, processing: 0>
distributed.core - INFO - Removing comms to tcp://
distributed.scheduler - INFO - Lost all workers
================================================================ warnings summary =================================================================
  /tmp/portage/dev-python/distributed-2.22.0/work/distributed-2.22.0/distributed/utils.py:143: RuntimeWarning: Couldn't detect a suitable IP address for reaching '', defaulting to hostname: [Errno 101] Network is unreachable

-- Docs: https://docs.pytest.org/en/stable/warnings.html
============================================================== slowest 10 durations ===============================================================
5.32s call     distributed/cli/tests/test_dask_worker.py::test_no_reconnect[--nanny]
5.00s teardown distributed/cli/tests/test_dask_worker.py::test_integer_names
4.56s call     distributed/cli/tests/test_dask_worker.py::test_nprocs_expands_name
3.93s call     distributed/cli/tests/test_dask_worker.py::test_nanny_worker_port_range
3.85s call     distributed/cli/tests/test_dask_worker.py::test_respect_host_listen_address[]
3.85s call     distributed/cli/tests/test_dask_worker.py::test_no_reconnect[--no-nanny]
3.85s call     distributed/cli/tests/test_dask_worker.py::test_contact_listen_address[tcp://]
3.80s call     distributed/cli/tests/test_dask_worker.py::test_respect_host_listen_address[]
3.79s call     distributed/cli/tests/test_dask_worker.py::test_scheduler_file[--nanny]
3.77s call     distributed/cli/tests/test_dask_worker.py::test_memory_limit
============================================================= short test summary info =============================================================
SKIPPED [1] distributed/comm/tests/test_ucx.py:4: could not import 'ucp': No module named 'ucp'
SKIPPED [1] distributed/comm/tests/test_ucx_config.py:16: could not import 'ucp': No module named 'ucp'
SKIPPED [1] distributed/dashboard/tests/test_components.py:5: could not import 'bokeh': No module named 'bokeh'
SKIPPED [1] distributed/dashboard/tests/test_scheduler_bokeh.py:10: could not import 'bokeh': No module named 'bokeh'
SKIPPED [1] distributed/dashboard/tests/test_worker_bokeh.py:8: could not import 'bokeh': No module named 'bokeh'
SKIPPED [1] distributed/deploy/tests/test_ssh.py:3: could not import 'asyncssh': No module named 'asyncssh'
SKIPPED [1] distributed/diagnostics/tests/test_progress_stream.py:3: could not import 'bokeh': No module named 'bokeh'
SKIPPED [1] distributed/diagnostics/tests/test_widgets.py:3: could not import 'ipywidgets': No module named 'ipywidgets'
SKIPPED [1] distributed/http/scheduler/tests/test_scheduler_http.py:6: could not import 'bokeh': No module named 'bokeh'
SKIPPED [1] distributed/protocol/tests/test_arrow.py:4: could not import 'pyarrow': No module named 'pyarrow'
SKIPPED [1] distributed/protocol/tests/test_cupy.py:6: could not import 'cupy': No module named 'cupy'
SKIPPED [1] distributed/protocol/tests/test_h5py.py:6: could not import 'h5py': No module named 'h5py'
SKIPPED [1] distributed/protocol/tests/test_keras.py:5: could not import 'keras': No module named 'keras'
SKIPPED [1] distributed/protocol/tests/test_netcdf4.py:3: could not import 'netCDF4': No module named 'netCDF4'
SKIPPED [1] distributed/protocol/tests/test_numba.py:5: could not import 'numba.cuda': No module named 'numba'
SKIPPED [1] distributed/protocol/tests/test_rmm.py:5: could not import 'numba.cuda': No module named 'numba'
SKIPPED [1] distributed/protocol/tests/test_sklearn.py:3: could not import 'sklearn': No module named 'sklearn'
SKIPPED [1] distributed/protocol/tests/test_sparse.py:5: could not import 'sparse': No module named 'sparse'
SKIPPED [1] distributed/protocol/tests/test_torch.py:5: could not import 'torch': No module named 'torch'
SKIPPED [1] distributed/tests/test_gpu_metrics.py:4: could not import 'pynvml': No module named 'pynvml'
SKIPPED [1] distributed/cli/tests/test_dask_scheduler.py:58: could not import 'bokeh': No module named 'bokeh'
SKIPPED [1] distributed/cli/tests/test_dask_scheduler.py:66: could not import 'bokeh': No module named 'bokeh'
SKIPPED [1] distributed/cli/tests/test_dask_scheduler.py:100: could not import 'bokeh': No module named 'bokeh'
SKIPPED [1] distributed/cli/tests/test_dask_scheduler.py:125: could not import 'bokeh': No module named 'bokeh'
SKIPPED [1] distributed/cli/tests/test_dask_scheduler.py:188: Intermittent failure on old Python version
SKIPPED [1] distributed/cli/tests/test_dask_scheduler.py:235: could not import 'bokeh': No module named 'bokeh'
SKIPPED [1] distributed/cli/tests/test_dask_worker.py:322: could not import 'bokeh': No module named 'bokeh'
SKIPPED [1] distributed/cli/tests/test_dask_worker.py:377: could not import 'bokeh': No module named 'bokeh'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! stopping after 1 failures !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
================================== 41 passed, 28 skipped, 4 deselected, 1 warning, 1 error in 133.05s (0:02:13) ===================================


mrocklin commented 4 years ago

@jacobtomlinson I'm not sure what your priorities are like these days. But would this be easy for you to resolve?

jacobtomlinson commented 4 years ago

Looking through the log, some of these issues appear to be related to IPv6 rather than to a lack of internet connection.

@mgorny could you share a little more about your network setup, what interfaces you have and what IP addresses any active interfaces have when you ran these tests?

quasiben commented 4 years ago

@mrocklin fixed some issues with importing distributed without a network here: https://github.com/dask/distributed/pull/3991. Perhaps a similar solution, a try/except that falls back to defaults, would also work here.

mgorny commented 4 years ago

> @mgorny could you share a little more about your network setup, what interfaces you have and what IP addresses any active interfaces have when you ran these tests?

I'm running the tests inside a network namespace, with only the lo interface set up, i.e. roughly:

$ sudo unshare -n bash
# ifconfig lo up
# ifconfig lo
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet  netmask
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

The hostname is also reset to localhost.

detrout commented 3 years ago


I was trying to get Distributed 202101 to run on Debian's testing infrastructure and ran into this or a closely related problem. Our test runners have a working IPv6 loopback but have neither a routable IPv6 address nor an IPv6 address for the default hostname.

has_ipv6 checks that ipv6 is enabled on the loopback interface, but then in test_comms.py there's this:

EXTERNAL_IP4 = get_ip()
if has_ipv6():
    with warnings.catch_warnings(record=True):
        EXTERNAL_IP6 = get_ipv6()

get_ipv6() tries to open a datagram socket to "2001:4860:4860::8888", but that seems to fail because there is no route available to reach 2001::

On the Debian test systems, gethostname() returns a name that has no IPv6 address attached to it, so this block also fails with socket.gaierror: [Errno -5] No address associated with hostname:

        addr_info = socket.getaddrinfo(
            socket.gethostname(), port, family, socket.SOCK_DGRAM, socket.IPPROTO_UDP

My first temptation is to put a call to get_ipv6() into has_ipv6() so it checks to see it has a routable ipv6 address.

Though I'm wondering if has_ipv6() should be split into a loopback-only check and a separate check for a routable address.
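As a rough illustration of that split (a sketch only, not the code in distributed), the two checks might look like the following. The function names has_ipv6_loopback and has_routable_ipv6 are hypothetical; the probe address is the one from the traceback above. Connecting a UDP socket sends no packets, so the routable check only asks the kernel whether it has a route.

```python
import socket


def has_ipv6_loopback() -> bool:
    """Return True if an IPv6 socket can bind to the loopback address ::1."""
    if not socket.has_ipv6:
        return False
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock:
            sock.bind(("::1", 0))  # port 0 lets the OS pick a free port
        return True
    except OSError:
        return False


def has_routable_ipv6(host: str = "2001:4860:4860::8888", port: int = 80) -> bool:
    """Return True if the kernel has a route to an external IPv6 address.

    connect() on a datagram socket transmits nothing; it only selects a
    route and source address, so it fails quickly when no route exists.
    """
    if not socket.has_ipv6:
        return False
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) as sock:
            sock.connect((host, port))
        return True
    except OSError:  # e.g. [Errno 101] Network is unreachable
        return False
```

With something like this, tests that only need ::1 could be gated on the loopback check, and tests that need an external address on the routable check.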

My other idea would be to extend the failover code in _get_ip to try gethostname() and, if that doesn't work, fall back to the hostname "localhost" or "ip6-localhost".

Do you have any thoughts on which would be better?
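For comparison, the failover idea could be sketched roughly as below. This is a minimal illustration under assumptions, not the actual _get_ip implementation: the function name get_ip_with_fallback, the order of the candidate names, and the final hard-coded loopback literal are all hypothetical.

```python
import socket


def get_ip_with_fallback(host, port, family=socket.AF_INET):
    """Best-effort local address lookup that does not raise on network errors."""
    # Step 1: ask the kernel which source address it would use to reach host.
    try:
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            sock.connect((host, port))
            return sock.getsockname()[0]
    except OSError:
        pass

    # Step 2: resolve our own hostname, then the usual loopback names.
    for name in (socket.gethostname(), "localhost", "ip6-localhost"):
        try:
            info = socket.getaddrinfo(
                name, port, family, socket.SOCK_DGRAM, socket.IPPROTO_UDP
            )
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
        if info:
            return info[0][4][0]

    # Step 3 (assumption): fall back to a loopback literal as a last resort.
    if family == getattr(socket, "AF_INET6", None):
        return "::1"
    return "127.0.0.1"
```

The trade-off is that this variant always returns some address, so a test comparing against an "external" IP could silently run against loopback instead of being skipped.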