coiled / dask-mongo

BSD 3-Clause "New" or "Revised" License
19 stars 9 forks source link

Coiled runtime that includes dask-mongo? #24

Closed tonycombocurve closed 2 years ago

tonycombocurve commented 2 years ago

I am seeing the following exception when using the dask-mongo package in my code. I can use the pymongo client directly to query the collection so I know my arguments for read_mongo are correct. When I call take(1) on the bag returned I get the following exception:

Traceback (most recent call last):
  File "C:\Users\antho\repos\py-compute-poc\py-compute-poc\main.py", line 48, in <module>
    do_work()
  File "C:\Users\antho\repos\py-compute-poc\py-compute-poc\main.py", line 37, in do_work
    b.take(1)
  File "C:\Users\antho\repos\py-compute-poc\venv\lib\site-packages\dask\bag\core.py", line 1456, in take       
    return tuple(b.compute())
  File "C:\Users\antho\repos\py-compute-poc\venv\lib\site-packages\dask\base.py", line 315, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "C:\Users\antho\repos\py-compute-poc\venv\lib\site-packages\dask\base.py", line 600, in compute
    results = schedule(dsk, keys, **kwargs)
  File "C:\Users\antho\repos\py-compute-poc\venv\lib\site-packages\distributed\client.py", line 3036, in get   
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "C:\Users\antho\repos\py-compute-poc\venv\lib\site-packages\distributed\client.py", line 2210, in gather    return self.sync(
  File "C:\Users\antho\repos\py-compute-poc\venv\lib\site-packages\distributed\utils.py", line 338, in sync    
    return sync(
  File "C:\Users\antho\repos\py-compute-poc\venv\lib\site-packages\distributed\utils.py", line 405, in sync    
    raise exc.with_traceback(tb)
  File "C:\Users\antho\repos\py-compute-poc\venv\lib\site-packages\distributed\utils.py", line 378, in f       
    result = yield future
  File "C:\Users\antho\repos\py-compute-poc\venv\lib\site-packages\tornado\gen.py", line 762, in run
    value = future.result()
  File "C:\Users\antho\repos\py-compute-poc\venv\lib\site-packages\distributed\client.py", line 2073, in _gather
    raise exception.with_traceback(traceback)
distributed.scheduler.KilledWorker: ("('read_mongo-take-dee43a0b2ec2cd6605c74cebf61e8dcc', 0)", <WorkerState 'tls://10.0.15.194:38225', name: ghg-production-worker-de2008fcdc, status: closed, memory: 0, processing: 1>) 

I am wondering if my workers need a software environment that includes dask-mongo. Any ideas as to what might be happening?

Thanks in advance for your help!

-Tony

tonycombocurve commented 2 years ago

I created a software environment that includes dask-mongo and that eliminated the exception I listed above.

phobson commented 2 years ago

Hey @tonycombocurve -- glad you got it figured out. As you saw, your intuition was correct and your workers (via package_sync or a software environment, need to have the same libraries installed as your local/client environment.

Let us know if you have any other questions.