dask / dask-yarn

Deploy dask on YARN clusters
http://yarn.dask.org
BSD 3-Clause "New" or "Revised" License

0.7.0 Still breaks with Distributed 2.2.0 #92

Closed bschreck closed 5 years ago

bschreck commented 5 years ago

Some issues with the nanny. Workers start up and then quickly fail.

from dask.distributed import Client
from dask_yarn import YarnCluster
cluster = YarnCluster(n_workers=1, deploy_mode='local')
client = Client(cluster)

Logs:

Traceback (most recent call last):
  File "/mnt/yarn/usercache/hadoop/appcache/application_1564782225613_0002/container_1564782225613_0002_01_000002/environment/bin/dask-yarn", line 11, in <module>
    sys.exit(main())
  File "/mnt/yarn/usercache/hadoop/appcache/application_1564782225613_0002/container_1564782225613_0002_01_000002/environment/lib/python3.6/site-packages/dask_yarn/cli.py", line 412, in main
    func(**kwargs)
  File "/mnt/yarn/usercache/hadoop/appcache/application_1564782225613_0002/container_1564782225613_0002_01_000002/environment/lib/python3.6/site-packages/dask_yarn/cli.py", line 379, in worker
    loop.run_sync(run)
  File "/mnt/yarn/usercache/hadoop/appcache/application_1564782225613_0002/container_1564782225613_0002_01_000002/environment/lib/python3.6/site-packages/tornado/ioloop.py", line 532, in run_sync
    return future_cell[0].result()
  File "/mnt/yarn/usercache/hadoop/appcache/application_1564782225613_0002/container_1564782225613_0002_01_000002/environment/lib/python3.6/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/mnt/yarn/usercache/hadoop/appcache/application_1564782225613_0002/container_1564782225613_0002_01_000002/environment/lib/python3.6/site-packages/dask_yarn/cli.py", line 374, in run
    yield worker._start(None)
AttributeError: 'Nanny' object has no attribute '_start'

mrocklin commented 5 years ago

That code doesn't seem to be in the latest release. I suspect that you still have an older version lying around.

On Fri, Aug 2, 2019 at 3:16 PM Ben Schreck notifications@github.com wrote:

Version information

Please include version information for the following:

  • Python version (python --version): 3.6.9
  • Dask-Yarn version (dask-yarn --version): 0.7.0
  • Hadoop version, and distribution (e.g. CDH) if applicable: EMR


bschreck commented 5 years ago

Hmm, I guess I never explicitly checked which version of distributed I had running; I just assumed it was 2.2.0 because I had dask-yarn 0.7.0, which depends on distributed>=2.2.0.

When I explicitly set distributed==2.1.0, it worked fine.
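A minimal sketch of checking the installed versions explicitly rather than inferring them from dependency pins. It uses only the standard library; the helper name `installed_versions` is hypothetical. Note that `importlib.metadata` needs Python 3.8 or newer, so on the Python 3.6 environment reported in this issue the `importlib_metadata` backport would be needed instead. It also only reports on the environment it runs in, which may differ from the environment shipped to the YARN containers.

```python
from importlib import metadata  # Python 3.8+; older versions need the importlib_metadata backport


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed in the current environment map to None.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Print the versions relevant to this issue as seen by the current interpreter.
    for name, version in installed_versions(["dask-yarn", "distributed", "dask"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this with the same interpreter that launches the cluster shows what is actually importable there, which is the assumption that turned out to be wrong above.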

jcrist commented 5 years ago

> just assumed it was 2.2.0 because I had dask-yarn 0.7.0, which depends on distributed>=2.2.0.

You definitely don't have dask-yarn 0.7.0, because the code in your traceback is from an older version and no longer exists in the latest release. For example, your traceback points at line 374 in cli.py, which in the latest release is https://github.com/dask/dask-yarn/blob/a6aca36284e6ae78d951fc12536678a8cb784334/dask_yarn/cli.py#L374.

I suspect you have an older version of dask-yarn around, perhaps in a packaged environment.
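One way to test that suspicion is to look inside the packaged environment itself. The sketch below is illustrative only: it assumes the environment is a tar archive (for example one produced by conda-pack or venv-pack) and that packages were installed with standard `.dist-info` or `.egg-info` metadata directories. The function name `packaged_versions` and the archive path in the usage note are hypothetical.

```python
import re
import tarfile


def packaged_versions(archive_path, package):
    """Return the set of versions of `package` recorded in a packed environment.

    Scans archive member names for `<name>-<version>.dist-info` or
    `<name>-<version>[-pyX.Y].egg-info` entries. It relies on that naming
    convention and does not read any file contents.
    """
    # Metadata directories may spell the name with '-', '_' or '.'.
    parts = [re.escape(p) for p in re.split(r"[-_.]+", package)]
    pattern = re.compile(
        r"(?:^|/)" + "[-_.]".join(parts)
        + r"-([^-/]+?)(?:-py[^/]*)?\.(?:dist-info|egg-info)(?:/|$)",
        re.IGNORECASE,
    )
    versions = set()
    # Mode "r" lets tarfile detect the compression (gzip, bz2, xz, or none).
    with tarfile.open(archive_path, "r") as tar:
        for member in tar:
            match = pattern.search(member.name)
            if match:
                versions.add(match.group(1))
    return versions
```

For example, `packaged_versions("environment.tar.gz", "dask-yarn")` returns the dask-yarn version or versions baked into that archive. A result that differs from the locally installed version would explain a traceback pointing at code that no longer exists in the latest release.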

bschreck commented 5 years ago

You must have been right. However, my new error after creating a fresh cluster is all websocket related. I replied in the more relevant thread: https://github.com/dask/dask-yarn/issues/80#issuecomment-518452242

jcrist commented 5 years ago

Glad to hear it, closing.