NCAR / cesm-lens-aws

Examples of analysis of CESM LENS data publicly available on Amazon S3 (us-west-2 region) using xarray and dask
https://doi.org/10.26024/wt24-5j82
BSD 3-Clause "New" or "Revised" License

Use pangeo-notebook meta package #41

Closed andersy005 closed 4 years ago

andersy005 commented 4 years ago

First stab at using Pangeo-notebook conda meta package.

andersy005 commented 4 years ago

@jhamman & @scottyhq

I am trying to use the pangeo-notebook meta package approach here. However, the notebook kernel appears to be inaccessible when the notebook is launched:

jovyan@jupyter-andersy005-2dcesm-2dlens-2daws-2dhnb0jm7i:~$ conda env list
# conda environments:
#
base                     /srv/conda
notebook              *  /srv/conda/envs/notebook
[Screenshot: Screen Shot 2020-03-26 at 7 33 26 AM]

Am I missing something?

andersy005 commented 4 years ago

Never mind. I just found out that the issue was due to missing dependencies on my end.

andersy005 commented 4 years ago

I am now running into a different issue. KubeCluster() instantiation with

from dask_kubernetes import KubeCluster
cluster = KubeCluster()
cluster.adapt(minimum=2, maximum=100, wait_count=60)

used to work. However, with the meta package approach, I am getting the following error:

```python
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in
      1 # Create cluster
      2 from dask_kubernetes import KubeCluster
----> 3 cluster = KubeCluster()
      4 cluster.adapt(minimum=2, maximum=100, wait_count=60)

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask_kubernetes/core.py in __init__(self, pod_template, name, namespace, n_workers, host, port, env, auth, idle_timeout, deploy_mode, interface, protocol, dashboard_address, security, scheduler_service_wait_timeout, scheduler_pod_template, **kwargs)
    414         self.auth = auth
    415         self.kwargs = kwargs
--> 416         super().__init__(**self.kwargs)
    417
    418     def _get_pod_template(self, pod_template, pod_type):

/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/deploy/spec.py in __init__(self, workers, scheduler, worker, asynchronous, loop, security, silence_logs, name)
    254         if not self.asynchronous:
    255             self._loop_runner.start()
--> 256             self.sync(self._start)
    257             self.sync(self._correct_state)
    258

/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/deploy/cluster.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    159             return future
    160         else:
--> 161             return sync(self.loop, func, *args, **kwargs)
    162
    163     async def _get_logs(self, scheduler=True, workers=True):

/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    346     if error[0]:
    347         typ, exc, tb = error[0]
--> 348         raise exc.with_traceback(tb)
    349     else:
    350         return result[0]

/srv/conda/envs/notebook/lib/python3.7/site-packages/distributed/utils.py in f()
    330             if callback_timeout is not None:
    331                 future = asyncio.wait_for(future, callback_timeout)
--> 332             result[0] = yield future
    333         except Exception as exc:
    334             error[0] = sys.exc_info()

/srv/conda/envs/notebook/lib/python3.7/site-packages/tornado/gen.py in run(self)
    733
    734                     try:
--> 735                         value = future.result()
    736                     except Exception:
    737                         exc_info = sys.exc_info()

/srv/conda/envs/notebook/lib/python3.7/site-packages/dask_kubernetes/core.py in _start(self)
    499                     "docstring for ways to specify workers"
    500                 )
--> 501             raise ValueError(msg)
    502
    503         base_pod_template = self.pod_template

ValueError: Worker pod specification not provided. See KubeCluster docstring for ways to specify workers
```

With the pangeo-notebook meta package approach, should I be instantiating KubeCluster() differently? Cc @jhamman
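For context on the error above: the message says that no worker pod specification was found. One way to rule out the environment as the cause is to hand `KubeCluster` a pod template explicitly. The following is only a sketch, assuming the classic `dask_kubernetes` API of this era (where `KubeCluster.from_dict` accepts a pod manifest as a dictionary). The image name, label, and resource values are illustrative placeholders, not what this repository uses, and `worker_pod_template` / `make_cluster` are hypothetical helper names.

```python
def worker_pod_template(image="daskdev/dask:latest", nthreads=2, memory_gb=4):
    """Build a minimal Kubernetes Pod manifest for a dask worker.

    All values are illustrative assumptions, not this repository's settings.
    """
    resources = {"cpu": str(nthreads), "memory": f"{memory_gb}G"}
    return {
        "kind": "Pod",
        "metadata": {"labels": {"app": "dask-worker"}},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "dask",
                    "image": image,
                    # Command line passed to the worker container.
                    "args": [
                        "dask-worker",
                        "--nthreads", str(nthreads),
                        "--memory-limit", f"{memory_gb}GB",
                        "--death-timeout", "60",
                    ],
                    "resources": {
                        "limits": dict(resources),
                        "requests": dict(resources),
                    },
                }
            ],
        },
    }


def make_cluster(template=None):
    """Create an adaptive cluster from an explicit worker pod template.

    dask_kubernetes is imported lazily so the template helper can be used
    (and inspected) without a Kubernetes environment.
    """
    from dask_kubernetes import KubeCluster

    cluster = KubeCluster.from_dict(template or worker_pod_template())
    cluster.adapt(minimum=2, maximum=100)
    return cluster
```

Inside a running hub session, `cluster = make_cluster()` would replace the bare `KubeCluster()` call. If that works while `KubeCluster()` does not, the missing piece is the default worker template rather than the cluster itself.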

scottyhq commented 4 years ago

@andersy005 - the metapackage does not install dask_config.yaml, which you still need. Things worked for you previously because we've been including that file in pangeo base images. So

1) you could continue using a base image that has it (see https://github.com/pangeo-data/pangeo-stacks-dev/blob/master/base-image/dask_config.yml). I'm going to try to finalize this pangeo-stacks rework today - here is a binder config that works using the metapackage https://github.com/scottyhq/pangeodev-binder/tree/master/binder.

2) Otherwise you can copy your own dask_config.yaml into the image as is done here https://github.com/pangeo-data/pangeo-cloud-federation/tree/staging/deployments/icesat2/image/binder
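To check whether a given image actually ships such a config file, a small standard-library helper can list the YAML files in the directories dask searches. This is a sketch under the assumption that dask looks in `DASK_ROOT_CONFIG` (default `/etc/dask`), `<sys.prefix>/etc/dask`, `~/.config/dask`, and the path in `DASK_CONFIG` if set. The exact search list can differ between dask versions, and the function names here are hypothetical.

```python
import os
import sys
from pathlib import Path


def dask_config_dirs(environ=None, prefix=None, home=None):
    """Return the locations dask is assumed to search for YAML config."""
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    home = Path.home() if home is None else Path(home)
    dirs = [
        Path(environ.get("DASK_ROOT_CONFIG", "/etc/dask")),
        Path(prefix) / "etc" / "dask",
        home / ".config" / "dask",
    ]
    if "DASK_CONFIG" in environ:
        dirs.append(Path(environ["DASK_CONFIG"]))
    return dirs


def find_config_files(paths):
    """Collect *.yaml / *.yml files found at the given locations."""
    found = []
    for p in paths:
        if p.is_file():
            # DASK_CONFIG may point at a single file instead of a directory.
            found.append(p)
        elif p.is_dir():
            found.extend(
                sorted(f for f in p.iterdir() if f.suffix in {".yaml", ".yml"})
            )
    return found


def report():
    """Print each search location and any config files found in it."""
    for d in dask_config_dirs():
        files = find_config_files([d])
        names = ", ".join(f.name for f in files) or "(none)"
        print(f"{d}: {names}")
```

Calling `report()` in a notebook on the image shows where a `dask_config.yaml` would be picked up. An empty listing everywhere is consistent with the "Worker pod specification not provided" error when the metapackage is used without a base image that includes the file.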

andersy005 commented 4 years ago

Thank you for the clarification, @scottyhq! I previously misunderstood how the meta-package was intended to work. I am going to use the base image option.

andersy005 commented 4 years ago

Superseded by #51