2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org
BSD 3-Clause "New" or "Revised" License

[New Hub] Jack Eddy Symposium #1329

Closed damianavila closed 2 years ago

damianavila commented 2 years ago

Hub Description

The request is for the launch or use of a research hub with Dask. Daniel Marsh, one of the co-organizers, wishes to share tutorial notebooks. He plans to use "intake-esm to access all the CMIP6 climate runs which are hosted in zarr format on aws. We can pull down all the time series data and regress it against the solar forcing used to derive the solar response."

Community Representative(s)

@colliand, can you give us the contact information for the community representatives? Looking at the lead issue, it seems the contacts would be Daniel Marsh and/or Ryan McGranaghan. If that is the case, do you have their contact information? Any GitHub handles?

Important dates

Hub Authentication Type

Other (may not be possible, please specify in comments)

Hub logo information

Hub user image

Extra features you'd like to enable

Other relevant information

I presume GitHub auth would be OK, but we need to confirm it. It seems they might need to interact with datasets on AWS, so it might make sense to deploy in that cloud provider... From the lead description, it is not clear to me how this hub will be paid for, so there is some stuff to figure out in case we need a new AWS land to deploy into...

Hub URL

jackeddy.2i2c.cloud

Hub Type

daskhub

Tasks to deploy the hub

colliand commented 2 years ago

Yes! I invite @rmcgranaghan and @dan800 to collaborate with us in establishing this hub for the Eddy Symposium. Please note information requests in the anchor entry of this issue.

(While searching for Dan, I found this repo in his collection: https://github.com/dan800/intake-esm.)

dan800 commented 2 years ago

Thanks for the invite! Essentially, intake-esm provides a data access layer allowing you to pick a model, experiment and variable. Regarding using it to access datasets on AWS, this was the example I found most useful:

https://github.com/hdrake/cmip6-temperature-demo/blob/master/notebooks/01_calculate_ECS_Gregory_method.ipynb

It's pulling down the surface temperature and radiative balance for the CMIP models for 4xCO2 experiments to calculate climate sensitivity. I haven't run it in a while, but the catalog is here: https://storage.googleapis.com/cmip6/cmip6-zarr-consolidated-stores.csv

For our project, the 'hist-sol' runs (solar variability only) might be the first place to look.
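As a rough illustration of the "pick a model, experiment and variable" idea, the sketch below filters catalog rows using only the Python standard library. It assumes the column names of the CMIP6 Zarr catalog CSV (source_id, experiment_id, variable_id, zstore); the sample rows and bucket paths are made up for the example, and intake-esm itself offers a richer search interface than this.

```python
import csv
import io

# Made-up sample rows in the assumed layout of the CMIP6 Zarr catalog CSV.
SAMPLE_CATALOG = """\
source_id,experiment_id,table_id,variable_id,zstore
MODEL-A,hist-sol,Amon,tas,gs://example-bucket/model-a/hist-sol/tas
MODEL-A,historical,Amon,tas,gs://example-bucket/model-a/historical/tas
MODEL-B,hist-sol,Amon,ta,gs://example-bucket/model-b/hist-sol/ta
"""


def search_catalog(csv_text, **criteria):
    """Return the rows whose columns equal every given criterion."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in rows
        if all(row.get(column) == value for column, value in criteria.items())
    ]


# Select the solar-only runs for one variable.
matches = search_catalog(SAMPLE_CATALOG, experiment_id="hist-sol", variable_id="tas")
for row in matches:
    print(row["source_id"], row["zstore"])
```

Each matching zstore value is the location of one Zarr store, which is what a tool such as xarray would then open.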

colliand commented 2 years ago

Hi @damianavila! There appear to be two distinct usages of the word image in the information request. The Hub logo image might be a .jpg or .png file. The Hub user image refers to a Docker image (see Jupyter Docker Stacks for more information).

Unless @dan800 or @rmcgranaghan intervene with other advice, I suggest that we use the image below for the hub logo and that the 2i2c team choose the Docker image for a Daskhub similar to those used by the Pangeo community.

eddy-symposium

damianavila commented 2 years ago

@dan800 is there any preference in the cloud provider? From the top description, it seems you needed to access some data in AWS land, but in your most recent update, it seems the data lives in GCP. Are you OK with the hub deployed on GCP land?

cc @yuvipanda, this might be relevant to decide where we deploy the hub.

dan800 commented 2 years ago

@damianavila No preference - I may have had it wrong and it's GCP! Please choose whatever you think is most efficient/cost effective.

colliand commented 2 years ago

I enjoyed an excellent Zoom call with Ryan and Dan earlier today. We exchanged a lot of information! Updates that I hope will be helpful to the 2i2c team are shared next.

Heliophysics software environment for Eddy Symposium?

Ryan pointed me to this excellent talk by Brian Thomas of NASA on HelioCloud (starting at 22:28). The talk describes a vision for collaborative and accelerated research for the heliophysics community by making the curated tools and data more easily accessible. (Thanks @brianthomas for the talk and for openly sharing your work! It would be great to link up with you to see if we can collaborate.)

Brian's talk pointed at the following resources:

(These images were cloned and modified from the pangeo-data/pangeo-docker-images project. FYI @rabernat.)

Before learning about HelioCloud, we (Ryan McGranaghan, Dan Marsh, @fperez and me) planned to deploy a DaskHub with a Pangeo-style software environment for usage at the upcoming Jack Eddy Symposium with future plans to customize the environment to better support heliophysics research teams. HelioCloud may allow us to advance faster! While there may be too little time to get this all in place, I ask @yuvipanda anyway... can you deploy a DaskHub with the HelioCloud environment for us?

Access control for the Symposium?

Ryan McGranaghan created the Jack Eddy Symposium GitHub Organization. Dan, Ryan and others will use this structure to host some repositories with notebooks to be used at the Symposium. We'd also like to use this organization to define the allow-list of users who should be authorized to access the hub at jackeddy.2i2c.cloud. The content will be shared during the Symposium using nbgitpuller.
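For reference, an nbgitpuller link is an ordinary hub URL with the repository passed as query parameters. The sketch below builds one with the standard library; the repository name "tutorials" and the README path are hypothetical placeholders, not repositories confirmed in this thread.

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url, repo_url, branch="main", urlpath=None):
    """Build a link that pulls repo_url into the user's home directory on the hub."""
    params = {"repo": repo_url, "branch": branch}
    if urlpath:
        # Where to land after the pull, e.g. a file opened in JupyterLab.
        params["urlpath"] = urlpath
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{urlencode(params)}"


link = nbgitpuller_link(
    "https://jackeddy.2i2c.cloud",
    "https://github.com/jack-eddy-symposium/tutorials",  # hypothetical repository
    urlpath="lab/tree/tutorials/README.md",  # hypothetical landing file
)
print(link)
```

Participants who follow such a link are asked to log in first, so the GitHub organization allow-list still applies.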

yuvipanda commented 2 years ago

Ready for testing at https://jackeddy.2i2c.cloud/! Access control is set up so that anyone part of the https://github.com/jack-eddy-symposium organization can log in. https://github.com/yuvipanda/gh-scoped-creds/ is set up for secure pushing to GitHub (see this blog post for details). There's also a shared/ drive, with write access for admins. And a SCRATCH_BUCKET for users to temporarily store stuff in object storage. https://docs.2i2c.org/en/latest/user/storage.html has more information on these features.
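A minimal sketch of how a notebook might build a path under the scratch bucket, assuming the SCRATCH_BUCKET environment variable holds a per-user bucket URL prefix. The example value and the dataset name are placeholders; the storage documentation linked above is the authoritative description.

```python
import os


def scratch_path(name, env=os.environ):
    """Join a dataset name onto the scratch prefix taken from SCRATCH_BUCKET."""
    prefix = env.get("SCRATCH_BUCKET")
    if not prefix:
        raise RuntimeError("SCRATCH_BUCKET is not set; are you running on the hub?")
    return f"{prefix.rstrip('/')}/{name.lstrip('/')}"


# Placeholder value for illustration only; on the hub the variable is already set.
example_env = {"SCRATCH_BUCKET": "gs://example-scratch/some-user"}
print(scratch_path("solar-regression.zarr", env=example_env))
```

Anything written there should be treated as temporary, as noted above.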

dask-gateway is also set up.

In short, this is fully set up as a Pangeo environment.

I tried to use the image https://gallery.ecr.aws/q3h7b4o8/helio-notebook-py, but unfortunately it failed. I'm not sure why - it needs a bit of a deeper investigation. I'll try to do that soon, but any information about whether those images are currently being used in other hubs will be helpful. I also can't create an account on the gitlab instance (https://git.mysmce.com/heliocloud/heliocloud-docker-images) where the image is hosted, so not sure how to contribute.

I've set a 1G memory limit, a 1 CPU guarantee / 2 CPU limit. We can tweak these as needed.

@colliand @dan800 please try it out and let me know what needs to change!

yuvipanda commented 2 years ago

We can also prewarm the cluster before the symposium starts so users can get on much quicker.

colliand commented 2 years ago

Thanks @yuvipanda. The hub does not appear to complete the spawning process. Here is the error message I received:

2022-05-26T19:55:12Z [Normal] pod didn't trigger scale-up: 1 node(s) had taint {k8s.dask.org_dedicated: worker}, that the pod didn't tolerate, 1 node(s) didn't match Pod's node affinity/selector, 1 in backoff after failed scale-up
Event log
Server requested
2022-05-26T19:55:09Z [Warning] 0/3 nodes are available: 3 node(s) didn't match Pod's node affinity/selector.
2022-05-26T19:55:12Z [Normal] pod didn't trigger scale-up: 1 node(s) had taint {k8s.dask.org_dedicated: worker}, that the pod didn't tolerate, 1 node(s) didn't match Pod's node affinity/selector, 1 in backoff after failed scale-up

dan800 commented 2 years ago

> Thanks @yuvipanda. The hub does not appear to complete the spawning process. Here is the error message I received:

Same for me.

yuvipanda commented 2 years ago

Looking in the console, I see:

image

Investigating...

yuvipanda commented 2 years ago

@colliand @dan800 based on my reading of https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-vm-creation#resource_availability, the cloud was just full for a bit! It's working again now.

colliand commented 2 years ago

Wow! Thanks Yuvi! I see Jovian moons orbiting the logo....and there's the lab interface. Merci!

dan800 commented 2 years ago

Hello both. Thanks for working on this after hours. So I was able to access the server and tried a very simple notebook. It crashed when I tried to access a slice of CMIP6 model output - I think because it takes up 3 or 4 GB of RAM. Just reading the CMIP6 catalog does use a large amount of memory. The notebook is here: shared-readwrite/intake_example.ipynb. Probably down to me not knowing enough to chunk the data efficiently.
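A back-of-the-envelope sketch of why the size of the selection matters. It assumes a dense float32 array and a hypothetical 192 x 288 latitude/longitude grid; the real model grid, number of levels, and library overhead will differ.

```python
from math import prod


def array_bytes(shape, itemsize=4):
    """Bytes needed to hold a dense array of the given shape in memory."""
    return prod(shape) * itemsize


# Hypothetical grid: 165 years of monthly means on a 192 x 288 lat/lon grid.
months = 165 * 12
full_field = array_bytes((months, 192, 288))  # every grid point, one level
single_point = array_bytes((months,))         # one location only

print(f"full field:   {full_field / 1e9:.2f} GB")
print(f"single point: {single_point / 1e3:.2f} kB")
```

Multiplying the full field by the number of vertical levels quickly reaches several GB, whereas selecting a single location before loading keeps the array tiny.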

yuvipanda commented 2 years ago

@dan800 ok, I've set up a new nodepool with higher resources for this. You should have access to about 24G of RAM and about 4 CPUs now. Give it a shot?

yuvipanda commented 2 years ago

We can tone down the resource requests for the actual event if necessary.

dan800 commented 2 years ago

@yuvipanda Great, that works! Thanks for getting this working. The memory usage was around 1.25 GB. This simple example extracted 165 years of monthly mean temperatures at a particular location in the atmosphere from a climate model simulation. It's a good starting point for applying time series analysis.

rmcgranaghan commented 2 years ago

Brilliant! Thank you for the excellent work @yuvipanda. I'm able to spin up an instance and am creating material now. I've added a directory for tutorials and populated it with a README markdown file and some useful starter notebooks. Please add any helpful tutorials there and update the README as you do. This will likely be the starting place for the participants, so we want it to be instructive.

brianthomas commented 2 years ago

Hi all,

I can add folks to our gitlab to contribute (that would be awesome). I'd be happy to work with you all to figure out why the container crashed. @yuvipanda please contact @rmcgranaghan by email to get my email and contact me. I'll then be able to add you to our gitlab.

-brian

colliand commented 2 years ago

I've shared Yuvi's email with Ryan to complete the link back to Brian. Thanks to all involved for sharing expertise and contributing toward the success of the Eddy Symposium!

fperez commented 2 years ago

Thx a lot @yuvipanda! Quick q - did you base the image on the docker/apt/env.yml/etc files they had for their heliocloud, but moved over to one of our repos?

I'd love as a goal of this event to harmonize even further how we all do this type of thing. Right now we all do very similar things (JMTE hubs, Berkeley ones, 2i2c ones, HelioCloud, ...) but with ever so slight tweaks in the workflow (where some files go, workflow for updates, etc). I think there's an opportunity to adopt some common, more standardized practices on this.

I'd be happy to take some notes together with @rmcgranaghan @brianthomas @colliand et al in Vail on this, linking up with the 2i2c team as needed (but without imposing on any of you our CO schedule).

fperez commented 2 years ago

Actually, question: where is the set of config files now for this particular hub, in case I want to suggest any other tools/updates?

fperez commented 2 years ago

@yuvipanda - you know what I'm going to ask for :) Basically the same s159/JMTE toy set - VNC, show hidden files, extensions like git & jupyterlab-favorites, url proxy support, syncthing, node selector on landing, etc.

My talk will be about "living la vida nube" in s159/jmte with a uniform workflow, would be fantastic for the attendees to have access to the same set of default toys that I used this semester to achieve this smooth environment in teaching and research.

I'm also talking at the EarthCube meeting the week immediately after - same story. I want to use these events as an opportunity to streamline these patterns as much as possible.

Thanks for all the work folks!!

yuvipanda commented 2 years ago

@fperez so this currently uses the default pangeo image, which now already has gh-scoped-creds. https://git.mysmce.com/heliocloud/heliocloud-docker-images is the possible image that is going to be used - although it's currently failing and needs debugging. So the current image used is just https://github.com/pangeo-data/pangeo-docker-images/tree/master/pangeo-notebook. Config is at https://github.com/2i2c-org/infrastructure/pull/1337

So if I hear this correctly, things you want in the image used would be:

  1. VNC
  2. jupyterlab-git
  3. jupyterlab-favorites
  4. jupyter-server-proxy (already installed)
  5. syncthing
  6. profile_list offering multiple size options
  7. show hidden files on the hub (this is a config option, I'll just turn that on)

I think some of these (perhaps jupyterlab-git?) can just go in the default pangeo image, while others probably need some custom image made for this event.

fperez commented 2 years ago

Good summary, yes @yuvipanda! BTW - I don't think all those things are critical for this week, it's rather that I'm settling into a good pattern on how to use these hubs as a reliable resource in the background across deployments, and partly that requires having a solid default functionality set.

So I'd like this workshop to be an experiment in pitching that pattern to a new community, and as a way for us to iterate on it towards common functionality that we expose as a default (and I'd like to see this harmonized with the campus workflow too, of course).

But if any of it creates headaches for the team right now, zero worries - it's not urgent and I can always discuss these features off the campus/JMTE hub.

yuvipanda commented 2 years ago

I've now added a 'profile list' option:

image

We can restrict access to different image sizes based on GitHub team membership as well, so that's something we can show off too.
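For readers unfamiliar with the option, a profile list is a list of named server sizes that users choose from when starting a server. The sketch below shows the general shape using KubeSpawner's `profile_list` keys; the names and sizes are placeholders, not the values deployed for this hub, and the hub's real configuration lives in the infrastructure repository.

```python
# Sizes and names below are placeholders, not the values deployed for this hub.
PROFILE_LIST = [
    {
        "display_name": "Small",
        "description": "~2 GB RAM, up to 1 CPU",
        "default": True,
        "kubespawner_override": {"mem_limit": "2G", "mem_guarantee": "1G", "cpu_limit": 1},
    },
    {
        "display_name": "Large",
        "description": "~24 GB RAM, up to 4 CPUs",
        "kubespawner_override": {"mem_limit": "24G", "mem_guarantee": "20G", "cpu_limit": 4},
    },
]

# In a jupyterhub_config.py this list would be assigned as:
#   c.KubeSpawner.profile_list = PROFILE_LIST


def default_profile(profiles):
    """Return the single profile marked as default (a sanity check for this sketch)."""
    defaults = [p for p in profiles if p.get("default")]
    if len(defaults) != 1:
        raise ValueError("expected exactly one profile marked as default")
    return defaults[0]


print(default_profile(PROFILE_LIST)["display_name"])
```

Keeping the smaller option as the default means large nodes are only started when someone asks for them.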

yuvipanda commented 2 years ago

@fperez and https://github.com/pangeo-data/pangeo-docker-images/pull/337 adds jupyterlab-git to the base pangeo images.

damianavila commented 2 years ago

> Good summary, yes @yuvipanda! BTW - I don't think all those things are critical for this week, it's rather that I'm settling into a good pattern on how to use these hubs as a reliable resource in the background across deployments, and partly that requires having a solid default functionality set.

Can we capture all the remaining things in another issue? I think it is important to split the "MVP" (provided by this issue) from the "nice to have". Otherwise, we are going behind a rabbit we can't catch because it keeps accelerating 😜.

fperez commented 2 years ago

Awesome, many thanks! As I mentioned, for the purposes of this particular workshop, I think we can stop wherever is reasonable for the team, even a basic hub image will be useful :)

I was hoping none of these would be too much of a burden purely based on the fact that they are features we already use in our other hubs (JMTE/campus). But I don't mean to put any undue burden on the team, particularly on short notice.

I do hope we'll learn from this experience how to streamline the process for a rich out-of-the-box base experience so that it's always ready to go without requiring extra effort. Discussions during the meeting (and some data collection as I suggested in #1360) will hopefully inform us better for future decisions.

yuvipanda commented 2 years ago

@fperez ask and you shall receive!

image

This is a new image (https://github.com/2i2c-org/jackeddy-image) based off the heliocloud image maintained at https://git.mysmce.com/heliocloud/. So it has everything the heliocloud image has (which in turn has everything pangeo has) + linux desktop + syncthing + gh-scoped-creds + jupyterlab-favorites. I'll also turn on showing all hidden files shortly.

I think this is everything you asked for, @fperez. LMK if I missed something.

@colliand by basing this off the heliocloud image, we're matching the environment they have. They seem to have added a bunch of fortran related stuff that we get for free now.

I think now it'd be great for folks (maybe @dan800?) to run through various test notebooks they wanna and let me know if any more changes need to be made.

fperez commented 2 years ago

Huge, huge thanks @yuvipanda! I'm testing it now, just wanted to let you know that during startup the log showed this:

2022-06-02T16:02:05Z [Warning] Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "05a42728fbfb6786d10d7d633c44fd83a9be7e2a3afdc047336f061d49b8b1e8": stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/

It did continue afterwards, but I figured I'd mention it.

I hope this wasn't too much work, and that we can make this a default part of our setup in general, b/c I think those features put together make for a really lovely MVP for "living la vida nube."

Again, my infinite gratitude to the team - it's amazing to have the joy of working with folks like you all!! ❤️

damianavila commented 2 years ago

> I think now it'd be great for folks (maybe @dan800?) to run through various test notebooks they wanna and let me know if any more changes need to be made.

Btw, I would like to close this one after this test is confirmed so we can open a specific "event" issue to track the event.

damianavila commented 2 years ago

"living la vida nube"

I love that phrase, @fperez! 😉

fperez commented 2 years ago

My only suggestion, at least for next week, would be, if it's not too much work, to keep some warm pods around - startup time was pretty long, and if next week we want folks to use this regularly, it does make a big difference if the wait is 30s vs ~5+ minutes.

But if it's either expensive or a lot of work, no worries - we can tell attendees to get a ☕ in the meantime :)

fperez commented 2 years ago

I'm old enough to have grown up with Ricky Martin being a staple of my teenage friends' love obsessions in 🇨🇴, time to honor him a bit now that I don't have teen love anxiety ;) (@damianavila - ref is to his Living la vida Loca song, that I actually don't really know well, but was very popular a while back in Latin America)

damianavila commented 2 years ago

> My only suggestion, at least for next week, would be, if it's not too much work, to keep some warm pods around

We should put that request in the event issue once it is created.

> (@damianavila - ref is to his Living la vida Loca song, that I actually don't really know well, but was very popular a while back in Latin America)

Yeah, I remember that one 😉.

fperez commented 2 years ago

Quick q @yuvipanda - am I doing something wrong?


---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Input In [1], in <cell line: 1>()
----> 1 import gh_scoped_creds
      2 get_ipython().run_line_magic('ghscopedcreds', '')

ModuleNotFoundError: No module named 'gh_scoped_creds'

yuvipanda commented 2 years ago

@fperez I hadn't updated to the latest image (which will happen once https://github.com/2i2c-org/infrastructure/pull/1383 is merged). I've just updated the image with the configurator for now.

colliand commented 2 years ago

Because I love it and want to use it, is RISE available in the Jack Eddy Symposium hub? When I render a .ipynb file in classic notebook format on this hub, I don't find the user interface for managing RISE slides. This request is also a nice opportunity for me to thank @damianavila for his work on RISE! 🎉

dan800 commented 2 years ago

RISE looks pretty cool!

dan800 commented 2 years ago

@yuvipanda Has something changed? I don't seem to have access to intake now:


ModuleNotFoundError                       Traceback (most recent call last)
Input In [1], in <cell line: 5>()
      3 import pandas as pd
      4 import xarray as xr
----> 5 import intake

ModuleNotFoundError: No module named 'intake'

yuvipanda commented 2 years ago

@dan800 I used the heliocloud image (https://git.mysmce.com/heliocloud/heliocloud-docker-images/-/tree/main/helio-notebook-py) as the base, and that doesn't seem to have intake. I'm adding it now.

yuvipanda commented 2 years ago

Done in 8d93017776082c1ea834096e1e72fd0b5dfb78c4, will update after the build is complete

yuvipanda commented 2 years ago

I can base it off the pangeo image instead of the heliocloud image if you want too

yuvipanda commented 2 years ago

@dan800 I added intake in https://github.com/2i2c-org/jackeddy-image/blob/main/environment.yml, on top of the base in https://git.mysmce.com/heliocloud/heliocloud-docker-images/-/blob/main/helio-notebook-py/environment.yml. https://github.com/pangeo-data/pangeo-docker-images/blob/master/pangeo-notebook/environment.yml is the env file for pangeo. LMK if you want me to just add specific packages to our image, or base it off the pangeo image instead of the heliocloud image.

damianavila commented 2 years ago

> We should put that request in the event issue once it is created.

FYI, I have created a dedicated issue for the event: https://github.com/2i2c-org/infrastructure/issues/1384

dan800 commented 2 years ago

> @dan800 I used the heliocloud image (https://git.mysmce.com/heliocloud/heliocloud-docker-images/-/tree/main/helio-notebook-py) as the base, and that doesn't seem to have intake. I'm adding it now.

@yuvipanda Intake now loads - thanks! This may have already been flagged, but with the latest kernel I get the following error when trying to load the data from the cloud:

ImportError: Please install gcsfs to access Google Storage
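Since several missing-package errors have come up after image changes, a small check like the one below could be run in a notebook before the event. It is only a sketch using the standard library: it tests whether top-level modules can be found, not whether they work, and the list of names is simply taken from the imports mentioned in this thread.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Modules imported by the notebooks discussed in this thread.
required = ["gh_scoped_creds", "intake", "gcsfs", "xarray", "pandas"]
missing = missing_modules(required)
if missing:
    print("Missing from this image:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Anything reported as missing would need to be added to the image's environment.yml and the image rebuilt.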

fperez commented 2 years ago

@dan800 - I updated the configurator with the image that has gcsfs after adding it to the environment, so you should be now good to go on this!

fperez commented 2 years ago

On the other hand, I have a question for @yuvipanda - @colliand tried to add @rmcgranaghan as an admin in #1389, and I merged that, but something is not happy at the deploy stage. I'd be happy to try and fix the issue but I'm unfortunately beyond my comfort zone here, so I'll need a bit of input/help if possible.

yuvipanda commented 2 years ago

@fperez I see @rmcgranaghan is an admin now!