dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Local test failures on MacOS #3356

Open crusaderky opened 4 years ago

crusaderky commented 4 years ago

I can't seem to create a working environment on my MacOS Catalina dev box. CONTRIBUTING.md could use some attention...

export PYTHON=3.7 TESTS=true PACKAGES="scikit-learn python-snappy python-blosc" TORNADO=6
source continuous_integration/travis/install.sh
source continuous_integration/travis/setup-ssh.sh
source continuous_integration/travis/run_tests.sh

Test failures:

1.

Many (all?) tests failed on the check_thread_leak() helper function, e.g. test_scheduler_bokeh.test_simple:

>               assert False, (thread, call_stacks)
E               AssertionError: (<Thread(Profile, started daemon 123145338511360)>, ['  File "/Users/crusaderky/miniconda3/envs/distributed/lib/python...ile "/Users/crusaderky/PycharmProjects/distributed/distributed/profile.py", line 269, in _watch
E                 \tsleep(interval)
E                 '])
E               assert False

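2.

  • test_dask_scheduler.test_defaults
  • test_dask_scheduler.test_hostport

failed with: OSError: Timed out trying to connect to 'tcp://[Guidos-MacBook-Pro.local]:8786' after 5 s: Timed out trying to connect to 'tcp://[Guidos-MacBook-Pro.local]:8786' after 5 s: connect() didn't finish in time

3.

  • test_dask_scheduler.test_no_dashboard
  • test_dask_scheduler.test_dashboard

failed with:

                with pytest.raises(Exception):
>                   requests.get("http://127.0.0.1:8787/status/")
E                   Failed: DID NOT RAISE <class 'Exception'>

4.

  • test_comms.test_default_client_server_ipv6
  • test_comms.test_tcp_client_server_ipv4
  • test_comms.test_tcp_client_server_ipv6
  • test_comms.test_tls_client_server_ipv4

failed with: OSError: Timed out trying to connect to 'tls://127.0.0.1:61915' after 5 s: Timed out trying to connect to 'tls://127.0.0.1:61915' after 5 s: connect() didn't finish in time

What am I missing or doing wrong? Have the tests been run recently on MacOSX?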
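For context on failure 1: the assertion fires when a thread started during a test (here the Profile thread) is still alive once the test finishes. The sketch below illustrates that general idea with the standard library only. It is not the code in distributed; the name `assert_no_thread_leak` and its `timeout` parameter are hypothetical and exist only for this illustration.

```python
import contextlib
import threading
import time


@contextlib.contextmanager
def assert_no_thread_leak(timeout=1.0):
    """Fail if the wrapped block leaves extra threads running.

    Hypothetical helper for illustration; not distributed's check_thread_leak().
    """
    before = set(threading.enumerate())
    yield
    deadline = time.monotonic() + timeout
    while True:
        # Threads that appeared inside the block and are still alive
        leaked = [t for t in threading.enumerate() if t not in before]
        if not leaked:
            return
        if time.monotonic() > deadline:
            raise AssertionError(f"leaked threads: {leaked}")
        # Give short-lived threads a moment to exit before re-checking
        time.sleep(0.01)
```

A background thread that sleeps in a loop, as the traceback suggests for `_watch`, would trip a check like this unless it is stopped before the block exits.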

mrocklin commented 4 years ago

Interesting. I no longer have access to an OSX machine. If you or someone else is able to diagnose and resolve these issues that would be great.

On Wed, Jan 8, 2020 at 8:18 AM crusaderky notifications@github.com wrote:

I can't seem to create a working environment on my MacOS Catalina dev box. CONTRIBUTING.md could use some attention...

export PYTHON=3.7 TESTS=true PACKAGES="scikit-learn python-snappy python-blosc" TORNADO=6
source continuous_integration/travis/install.sh
source continuous_integration/travis/setup-ssh.sh
source continuous_integration/travis/run_tests.sh

Test failures:

1.

Many (all?) tests failed on the check_thread_leak() helper function, e.g. test_scheduler_bokeh.test_simple:

>               assert False, (thread, call_stacks)
E               AssertionError: (<Thread(Profile, started daemon 123145338511360)>, ['  File "/Users/crusaderky/miniconda3/envs/distributed/lib/python...ile "/Users/crusaderky/PycharmProjects/distributed/distributed/profile.py", line 269, in _watch
E                 \tsleep(interval)
E                 '])
E               assert False

2.

  • test_dask_scheduler.test_defaults
  • test_dask_scheduler.test_hostport

failed with: OSError: Timed out trying to connect to 'tcp://[Guidos-MacBook-Pro.local]:8786' after 5 s: Timed out trying to connect to 'tcp://[Guidos-MacBook-Pro.local]:8786' after 5 s: connect() didn't finish in time

3.

  • test_dask_scheduler.test_no_dashboard
  • test_dask_scheduler.test_dashboard

failed with:

                with pytest.raises(Exception):
>                   requests.get("http://127.0.0.1:8787/status/")
E                   Failed: DID NOT RAISE <class 'Exception'>

4.

  • test_comms.test_default_client_server_ipv6
  • test_comms.test_tcp_client_server_ipv4
  • test_comms.test_tcp_client_server_ipv6
  • test_comms.test_tls_client_server_ipv4

failed with: OSError: Timed out trying to connect to 'tls://127.0.0.1:61915' after 5 s: Timed out trying to connect to 'tls://127.0.0.1:61915' after 5 s: connect() didn't finish in time

What am I missing or doing wrong? Have the tests been run recently on MacOSX?


quasiben commented 4 years ago

@crusaderky I think that if you map the hostname to 127.0.0.1 and ::1 in the /etc/hosts file, this should resolve most of the issues for you. I am still looking at why test_dask_scheduler.test_hostport and the tests in 3 fail.

# /etc/hosts
127.0.0.1   Guidos-MacBook-Pro.local
::1             Guidos-MacBook-Pro.local
127.0.0.1   localhost
255.255.255.255 broadcasthost
::1             localhost

quasiben commented 4 years ago

Actually, if you map the hostname to your local IP (something like 192.168.1.XX)

192.168.1.5 Guidos-MacBook-Pro.local

everything should pass
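One way to check whether the hostname entry is the problem is to look at what the machine's own hostname resolves to and how long the lookup takes. This is a minimal sketch using only the standard library; `resolve_hostname` is a hypothetical helper, not part of distributed.

```python
import socket
import time


def resolve_hostname(name=None):
    """Return (addresses, seconds) for resolving `name`.

    Defaults to this machine's hostname. Hypothetical helper for illustration.
    """
    name = name or socket.gethostname()
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(name, None)
        # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP
        addresses = sorted({info[4][0] for info in infos})
    except socket.gaierror:
        addresses = []  # the name does not resolve at all
    return addresses, time.monotonic() - start


if __name__ == "__main__":
    addrs, elapsed = resolve_hostname()
    print(f"{socket.gethostname()} -> {addrs or 'unresolved'} in {elapsed:.2f} s")
```

An empty result, or a lookup that takes several seconds, would be consistent with the 5 s connect timeouts reported above, though it does not prove that name resolution is the only cause.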

jakirkham commented 4 years ago

So I tried to reproduce this. I'm actually getting a lot more failures than the OP listed. I wasn't able to get through the full test suite, as it hung about 40% of the way through. Not entirely sure why this isn't working.

crusaderky commented 4 years ago

I believe that Azure Pipelines (and probably other CI services) offer MacOSX environments. I think it would be wise to set one up (at least on master, if not on PRs).

jrbourbeau commented 4 years ago

I've added an OSX build over in https://github.com/dask/distributed/pull/3358 that will skip PRs and can be manually added to any commit by including "test-osx" in the commit message. Also marked as an "allowed failure" for the time being.

dankerrigan commented 4 years ago

Regarding tests hanging at 40%, I think that might be because the default open-file limit on macOS is quite low at 256. Run ulimit -n to check. Running ulimit -n 65536 before running py.test --verbose distributed should let more of the test suite run. That said, I'm still looking into py.test distributed/deploy/tests/test_old_ssh.py and py.test distributed/deploy/tests/test_ssh.py::test_defer_to_old aborting without any message.
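The same limit can be inspected from Python with the standard `resource` module, which may be handy when the tests are launched from an IDE rather than a shell. This is only a sketch: `open_file_limit` and `raise_open_file_limit` are hypothetical helper names, and the change applies only to the current process and its children, not to the parent shell.

```python
import resource


def open_file_limit():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_open_file_limit(target=65536):
    """Raise the soft limit towards `target`, capped at the hard limit.

    Hypothetical helper for illustration; returns the limits in effect afterwards.
    """
    soft, hard = open_file_limit()
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # The OS refused the request; keep the existing limit
            pass
    return open_file_limit()


if __name__ == "__main__":
    print("before:", open_file_limit())
    print("after: ", raise_open_file_limit())
```

The try/except is there because the operating system may reject a value even when it is below the reported hard limit; in that case the sketch leaves the limit unchanged rather than failing.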