dask / dask-xgboost

BSD 3-Clause "New" or "Revised" License
162 stars 43 forks source link

Remove 'check_new_threads' in gen_cluster() call #50

Closed ksangeek closed 4 years ago

ksangeek commented 5 years ago

Fixes https://github.com/dask/dask-xgboost/issues/49

ksangeek commented 5 years ago

Add fix for https://github.com/dask/dask-xgboost/issues/51 too! Both these changes put together now all the test pass in my environment -

$ pytest test_core.py
========================================= test session starts =========================================
platform linux -- Python 3.6.9, pytest-5.0.1, py-1.8.0, pluggy-0.12.0
rootdir: /home/sangeek/examples/dask-xgb-examples/tests
plugins: xdist-1.28.0, forked-1.0.2, cov-2.7.1
collected 12 items

test_core.py ............                                                                       [100%]

========================================== warnings summary ===========================================
test_core.py::test_synchronous_api
test_core.py::test_basic
test_core.py::test_basic
test_core.py::test_basic
  /opt/anaconda3/envs/test-xgb-90-cpu/lib/python3.6/site-packages/xgboost/core.py:587: FutureWarning: Series.base is deprecated and will be removed in a future version
    if getattr(data, 'base', None) is not None and \

-- Docs: https://docs.pytest.org/en/latest/warnings.html
=============================== 12 passed, 4 warnings in 162.51 seconds ===============================
Task was destroyed but it is pending!
task: <Task pending coro=<Worker.execute() done, defined at /opt/anaconda3/envs/test-xgb-90-cpu/lib/python3.6/site-packages/distributed/worker.py:2373> wait_for=<Future pending cb=[coroutine.<locals>.wrapper.<locals>.<lambda>() at /opt/anaconda3/envs/test-xgb-90-cpu/lib/python3.6/site-packages/tornado/gen.py:226, <TaskWakeupMethWrapper object at 0x7ffd4da3abe8>()]> cb=[IOLoop.add_future.<locals>.<lambda>() at /opt/anaconda3/envs/test-xgb-90-cpu/lib/python3.6/site-packages/tornado/ioloop.py:690]>
Task was destroyed but it is pending!
task: <Task pending coro=<Worker.execute() done, defined at /opt/anaconda3/envs/test-xgb-90-cpu/lib/python3.6/site-packages/distributed/worker.py:2373> wait_for=<Future pending cb=[coroutine.<locals>.wrapper.<locals>.<lambda>() at /opt/anaconda3/envs/test-xgb-90-cpu/lib/python3.6/site-packages/tornado/gen.py:226, <TaskWakeupMethWrapper object at 0x7ffd4da3a678>()]> cb=[IOLoop.add_future.<locals>.<lambda>() at /opt/anaconda3/envs/test-xgb-90-cpu/lib/python3.6/site-packages/tornado/ioloop.py:690]>
Task was destroyed but it is pending!
task: <Task pending coro=<Worker.execute() running at /opt/anaconda3/envs/test-xgb-90-cpu/lib/python3.6/site-packages/distributed/worker.py:2418> wait_for=<Future pending cb=[coroutine.<locals>.wrapper.<locals>.<lambda>() at /opt/anaconda3/envs/test-xgb-90-cpu/lib/python3.6/site-packages/tornado/gen.py:226, <TaskWakeupMethWrapper object at 0x7ffd8956a168>()]> cb=[IOLoop.add_future.<locals>.<lambda>() at /opt/anaconda3/envs/test-xgb-90-cpu/lib/python3.6/site-packages/tornado/ioloop.py:690]>
Task was destroyed but it is pending!
task: <Task pending coro=<Worker.execute() running at /opt/anaconda3/envs/test-xgb-90-cpu/lib/python3.6/site-packages/distributed/worker.py:2418> wait_for=<Future pending cb=[coroutine.<locals>.wrapper.<locals>.<lambda>() at /opt/anaconda3/envs/test-xgb-90-cpu/lib/python3.6/site-packages/tornado/gen.py:226, <TaskWakeupMethWrapper object at 0x7ffd88164138>()]> cb=[IOLoop.add_future.<locals>.<lambda>() at /opt/anaconda3/envs/test-xgb-90-cpu/lib/python3.6/site-packages/tornado/ioloop.py:690]>
TomAugspurger commented 4 years ago

Sorry about the delay here. Pushed a commit to get some more debug info on the CI.

TomAugspurger commented 4 years ago

Seems to be passing. But the tests are taking much longer than I recall.

$ pytest --durations=1 dask_xgboost/tests/test_core.py::test_basic 
============================================================================= slowest 1 test durations =============================================================================
4.32s call     dask_xgboost/tests/test_core.py::test_basic

Will look a bit more before merging. Probably unrelated to this PR (which is very welcome @ksangeek!)

ksangeek commented 4 years ago

Thank you @TomAugspurger!

TomAugspurger commented 4 years ago

I'm going to merge this.

We're about 40x slower on a tiny dataset than just using xgboost (non-distributed) directly. I haven't profiled things yet.