dask / dask-yarn

Deploy dask on YARN clusters
http://yarn.dask.org
BSD 3-Clause "New" or "Revised" License

KeyError: 'ncores' #77

Closed AlJohri closed 5 years ago

AlJohri commented 5 years ago

I was unable to copy and paste the stacktrace but here's a screenshot:

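[image: Screen Shot 2019-07-02 at 8 36 40 PM] https://user-images.githubusercontent.com/2790092/60555136-2d318f80-9d09-11e9-8c26-f3bc2bb21a83.png
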
TomAugspurger commented 5 years ago

I believe this was fixed in dask-yarn 0.6.1, which is compatible with distributed 2.0 and newer.


AlJohri commented 5 years ago

@TomAugspurger That's the version I'm on, and I see the same issue.

from dask_yarn import YarnCluster
from dask.distributed import Client

# Create a cluster in local deploy mode, to have access to the dashboard
cluster = YarnCluster(deploy_mode='local', worker_vcores=1, worker_memory="256MiB")

# Connect to the cluster
client = Client(cluster)
client.get_versions()
{'scheduler': {'host': (('python', '3.7.3.final.0'),
   ('python-bits', 64),
   ('OS', 'Linux'),
   ('OS-release', '4.14.104-78.84.amzn1.x86_64'),
   ('machine', 'x86_64'),
   ('processor', 'x86_64'),
   ('byteorder', 'little'),
   ('LC_ALL', 'None'),
   ('LANG', 'en_US.UTF-8'),
   ('LOCALE', 'en_US.UTF-8')),
  'packages': {'required': (('dask', '2.0.0'),
    ('distributed', '2.0.1'),
    ('msgpack', '0.6.1'),
    ('cloudpickle', '1.2.1'),
    ('tornado', '6.0.3'),
    ('toolz', '0.9.0')),
   'optional': (('numpy', '1.16.4'),
    ('pandas', '0.24.2'),
    ('bokeh', '1.2.0'),
    ('lz4', '2.1.10'),
    ('dask_ml', '1.0.0'),
    ('blosc', None))}},
 'workers': {},
 'client': {'host': [('python', '3.7.3.final.0'),
   ('python-bits', 64),
   ('OS', 'Linux'),
   ('OS-release', '4.14.104-78.84.amzn1.x86_64'),
   ('machine', 'x86_64'),
   ('processor', 'x86_64'),
   ('byteorder', 'little'),
   ('LC_ALL', 'None'),
   ('LANG', 'en_US.UTF-8'),
   ('LOCALE', 'en_US.UTF-8')],
  'packages': {'required': [('dask', '2.0.0'),
    ('distributed', '2.0.1'),
    ('msgpack', '0.6.1'),
    ('cloudpickle', '1.2.1'),
    ('tornado', '6.0.3'),
    ('toolz', '0.9.0')],
   'optional': [('numpy', '1.16.4'),
    ('pandas', '0.24.2'),
    ('bokeh', '1.2.0'),
    ('lz4', '2.1.10'),
    ('dask_ml', '1.0.0'),
    ('blosc', None)]}}}
cluster.scale(1)
tornado.application - ERROR - Exception in callback <function YarnCluster._widget.<locals>.update at 0x7f6fc8c92950>
Traceback (most recent call last):
  File "/opt/python3/lib/python3.7/site-packages/tornado/ioloop.py", line 907, in _run
    return self.callback()
  File "/opt/python3/lib/python3.7/site-packages/dask_yarn/core.py", line 648, in update
    status.value = self._widget_status()
  File "/opt/python3/lib/python3.7/site-packages/dask_yarn/core.py", line 585, in _widget_status
    cores = sum(w['ncores'] for w in workers.values())
  File "/opt/python3/lib/python3.7/site-packages/dask_yarn/core.py", line 585, in <genexpr>
    cores = sum(w['ncores'] for w in workers.values())
KeyError: 'ncores'
tornado.application - ERROR - Exception in callback <function YarnCluster._widget.<locals>.update at 0x7f6fc8c92950>
Traceback (most recent call last):
  File "/opt/python3/lib/python3.7/site-packages/tornado/ioloop.py", line 907, in _run
    return self.callback()
  File "/opt/python3/lib/python3.7/site-packages/dask_yarn/core.py", line 648, in update
    status.value = self._widget_status()
  File "/opt/python3/lib/python3.7/site-packages/dask_yarn/core.py", line 585, in _widget_status
    cores = sum(w['ncores'] for w in workers.values())
  File "/opt/python3/lib/python3.7/site-packages/dask_yarn/core.py", line 585, in <genexpr>
    cores = sum(w['ncores'] for w in workers.values())
KeyError: 'ncores'

And if I'm quick enough:

client.scheduler_info()['workers']
{'tcp://10.73.56.214:35797': {'type': 'Worker',
  'id': 'tcp://10.73.56.214:35797',
  'host': '10.73.56.214',
  'resources': {},
  'local_directory': '/mnt/yarn/usercache/hadoop/appcache/application_1562112156023_0010/container_1562112156023_0010_01_000004/dask-worker-space/worker-8jdl0grw',
  'name': 'tcp://10.73.56.214:35797',
  'nthreads': 1,
  'memory_limit': 268435456,
  'last_seen': 1562122681.500993,
  'services': {},
  'metrics': {'cpu': 2.0,
   'memory': 40427520,
   'time': 1562122680.9977098,
   'read_bytes': 5693.9747349663585,
   'write_bytes': 6715.667924125264,
   'num_fds': 22,
   'executing': 0,
   'in_memory': 0,
   'ready': 0,
   'in_flight': 0,
   'bandwidth': 100000000},
  'nanny': 'tcp://10.73.56.214:39437'}}
cat /opt/python3/lib/python3.7/site-packages/dask_yarn/_version.py  | grep '"version"'
 "version": "0.6.1"
cat /opt/python3/lib/python3.7/site-packages/dask_yarn/core.py | grep ncores
        cores = sum(w['ncores'] for w in workers.values())
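
Note that the worker entry above reports 'nthreads' and has no 'ncores' key, which is what the generator expression in _widget_status trips over. Here is a minimal sketch of a lookup that tolerates either key. It assumes the only difference is the key name, and `total_cores` is a hypothetical helper for illustration, not part of dask-yarn:

```python
def total_cores(workers):
    """Sum worker thread counts, accepting either key name.

    `workers` is a dict shaped like client.scheduler_info()['workers'].
    This is a hypothetical helper, not dask-yarn's own code.
    """
    total = 0
    for info in workers.values():
        if "nthreads" in info:
            # Key seen in the scheduler_info() output above (distributed 2.0.1).
            total += info["nthreads"]
        else:
            # Fall back to the key the widget code expects; 0 if neither exists.
            total += info.get("ncores", 0)
    return total


# Trimmed-down copy of the worker entry shown above.
workers = {
    "tcp://10.73.56.214:35797": {"nthreads": 1, "memory_limit": 268435456},
}
print(total_cores(workers))
```

Something along these lines in place of the w['ncores'] lookup would probably keep the widget callback from raising, though the proper fix belongs in dask_yarn/core.py itself.
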
TomAugspurger commented 5 years ago

Thanks for checking. I must have been thinking of a different library.
