2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org
BSD 3-Clause "New" or "Revised" License

[New Hub] Alabama Water Institute CIROH hub #1444

Closed colliand closed 2 years ago

colliand commented 2 years ago

Hub Description

The Alabama Water Institute (AWI) is convening a consortium of 28 university partners to improve water management for the USA. The announcement of the award to support the collaboration called CIROH is available here.

2i2c has been engaged to provide interactive computing service supporting this collaboration.

The service will initially use GitHub authentication with an allow list based on membership in the AWI GitHub organization. As the service evolves, I anticipate we may move over to CILogon.

Community Representative(s)

@jameshalgren

Important dates

Notes: dates are updated according to new information and prioritization.

Hub Authentication Type

GitHub Authentication (e.g., @mygithubhandle)

Hub logo information

Hub user image

Extra features you'd like to enable

Other relevant information

Let's get started with a Pangeo-style Daskhub. The capacity of the team at AWI is increasing and a customized software environment will likely be ready later in the year.

I suggest this hub offer the VNC/Linux desktop feature.

This hub should be hosted on GCP in a data center that hosts the National Water Model Data.

Hub URL

ciroh.awi.2i2c.cloud

Hub Type

daskhub

Tasks to deploy the hub

colliand commented 2 years ago

Based on today's call with @jameshalgren, I suggest the following onboarding process. CIROH and AWI have ambitious plans so it's important we get the initial conditions right.

  1. @colliand will work with @jameshalgren, Stefanie O'Neil and colleagues from 2i2c and CS&S to establish the business relationship. The path suggested by AWI is that 2i2c/CS&S provide a contract with an attached statement of work document. The statement of work will be phased.
  2. The hub will be launched sometime in July and @jameshalgren (with inputs from the 2i2c team) will seed a shared directory with some sample notebooks.
  3. After the hub is up and running, I suggest that @fperez virtually meet with the AWI/CIROH team and provide a ~1h demo of how to use the platform for open science. Others on the 2i2c team (e.g. @colliand and @jmunroe) should assist with Fernando's demo and learn his tricks so that we can provide similar demos in the future.

damianavila commented 2 years ago

Suggested plan LGTM, @colliand. I added the request to the backlog board and we will find the eng resources so we can push forward on step 2 in a timely manner.

@jameshalgren, we will ping you soon with some questions about the specifics of the hub deployment.

jameshalgren commented 2 years ago

Thanks @colliand, @damianavila. Processing, will respond soon.

jameshalgren commented 2 years ago

A few questions, possibly specialized, probably going beyond the scope of this issue. Tagging @colliand to ask for redirection or moderation if necessary.

jameshalgren commented 2 years ago

Tagging @whitelightning450 @karnesh for situational awareness.

colliand commented 2 years ago

Hi James! Yes 2i2c has experience with the real-time-collaboration features in upstream Jupyter. Experiments have shown that feature is not ready for production deployments. There is ongoing work there and 2i2c will support RTC when we can do so securely and robustly.

Yes, our team is contributing to the "tantalizing future" you referenced. The pioneering work of the Pangeo community is an inspiration for the founding of 2i2c. We are in the process of on-boarding a new team member @jmunroe who has technical and community experience with big data geosciences. I spent some time briefing him on CIROH/AWI today and expect he will be an excellent resource for our collaboration.

fperez commented 2 years ago

@jameshalgren I haven't seen (which doesn't mean they don't exist, obviously) examples of hubs tightly integrated with ODCs. But from a quick look at the ODC setup, I see a key element of this is having an accessible Postgres server to manage the actual data catalogs and serving.

Coincidentally, as part of the Jupyter Meets the Earth effort, with @consideRatio and @yuvipanda we're looking right now at how to most cleanly set up a persistent, robust and cost-effective Postgres server that can be accessed by all the users of a Hub. We happen to need that for one of our research projects, and our current solution (via sqlite) is sub-optimal.

We'll be happy to share any progress we make on that front back with the rest of the team - just today I was discussing with @consideRatio how this was very likely to be a use case that many others would be likely to encounter. So I'm delighted to see that intuition confirmed by your needs, and it means it's all the more timely that we make progress on it :)

jmunroe commented 2 years ago

Assuming the National Water Model will be a key dataset used by this hub, I'll note a few other links

This is in addition to the NWM data store on GCP linked above.

I am interested in identifying other key datasets that the community anticipates using on this hub, to ensure it is set up in a way that makes accessing that data straightforward for users.

colliand commented 2 years ago

Thanks @jmunroe! I'll add @jameshalgren here in case he can share any other input on important data sets for the emerging CIROH community.

jameshalgren commented 2 years ago

Thanks @colliand and @jmunroe. I've jotted down a few thoughts/responses to launch the weekend:

Assuming the National Water Model will be a key dataset used by this hub, I'll note a few other links

It will be the key dataset used in this hub, together with observation data initially from USGS, but from any valid source.

  • About the National Water Model from the Office of Water Prediction

    • Includes links to HTTP and FTP sites of the last two days output of the NWM.

I think it is http only at this point. There are ftp-versions using the LDM protocol for direct sharing of data between NWS offices, but that's probably not relevant here for the moment. FWIW, the NOMADS servers also host all of the NWS weather model output -- though the storage formats are far from optimal for cloud access, just like the NWM data.

There is a 1.2 GCP bucket of the same data (they use the label 'reanalysis' which is technically incorrect...). The AWS version of that data is more complete, with the 1.2, 2.0, and 2.1 versions of the retrospective data, along with experimental (?) versions with subsets of the data in zarr formats.

The GCP bucket mentioned is a superset of the S3 resource, with the analysis, short (on s3), medium, and long-range output. In fact, only a handful of specific derived products appear to be missing from the GCP bucket relative to what is available on the direct download from NOMADS.

Hopefully, some of what we make here can allow for Dr. Maidment's work to be more easily contributed back to the broader NWM community. He and his team were critical influencers in the initiation of the project and continue to generate great work!

This is in addition to the NWM data store on GCP linked above.

I am interested in identifying other key datasets that the community anticipates using on this hub, to ensure it is set up in a way that makes accessing that data straightforward for users.

I mentioned USGS data. There is a useful toolset for accessing USGS data and we may use that or replicate a portion into storage on the cloud backend. I am aware of a similar script by @groutr.

Those observed streamflow data (which are really observed stream-stage data converted to estimated flow -- but the convention is to call them streamflow...) will be the key initial dataset because they correspond to the key output from the model. As we continue, additional variables will be examined and we will have to identify or create repositories of validation data to use for exploration.
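
As a rough illustration of what pulling those observations could look like, here is a minimal sketch that only builds a request URL for the USGS Instantaneous Values web service. It is not the toolset or the script mentioned above (neither is named here). The helper name `build_iv_url` is hypothetical, the site number is just an example, and the parameter names (`sites`, `parameterCd`, `startDT`, `endDT`) are given as I understand the service; check them against the USGS documentation before relying on them.

```python
from urllib.parse import urlencode

# Base URL of the USGS Instantaneous Values web service (assumed endpoint).
NWIS_IV_URL = "https://waterservices.usgs.gov/nwis/iv/"

# USGS parameter codes: 00060 is discharge (streamflow), 00065 is gage height (stage).
DISCHARGE = "00060"
GAGE_HEIGHT = "00065"


def build_iv_url(sites, parameter_cd=DISCHARGE, start=None, end=None):
    """Return a request URL for instantaneous values at the given site numbers."""
    params = {
        "format": "json",
        "sites": ",".join(sites),
        "parameterCd": parameter_cd,
    }
    if start:
        params["startDT"] = start
    if end:
        params["endDT"] = end
    # Keep commas unescaped so a multi-site list stays readable in the URL.
    return NWIS_IV_URL + "?" + urlencode(params, safe=",")


if __name__ == "__main__":
    # Example site number, used purely for illustration.
    print(build_iv_url(["01646500"], start="2022-07-01", end="2022-07-02"))
```

The sketch stops at the URL on purpose: fetching, caching, or replicating the data into cloud storage is a separate design decision for the community.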

damianavila commented 2 years ago

A few questions for all of you 😉

Let's get started with a Pangeo-style Daskhub. The capacity of the team at AWI is increasing and a customized software environment will likely be ready later in the year.

OK, so starting with the pangeo-notebook image is enough to start with, I presume. Can you confirm?

I suggest this hub offer the VNC/Linux desktop feature.

IIRC, @yuvipanda set up this feature for the Jack Eddy symposium.

This hub should be hosted on GCP in a data center that hosts the National Water Model Data.

Are we talking about a dedicated cluster here? Or are you OK with the hub being deployed in a shared cluster? (@colliand do you have any more information about this aspect from the lead process? Thanks!)

sgibson91 commented 2 years ago

I suggest this hub offer the VNC/Linux desktop feature.

IIRC, @yuvipanda set up this feature for the Jack Eddy symposium.

@yuvipanda it would be great if we could take this opportunity to document how to set up this feature in the hub features docs

consideRatio commented 2 years ago

I suggest this hub offer the VNC/Linux desktop feature.

IIRC, @yuvipanda set up this feature for the Jack Eddy symposium.

@yuvipanda it would be great if we could take this opportunity to document how to set up this feature in the hub features docs

For reference, I think this is solely something to set up in the user image. This is what JMTE has done to support this functionality.

  1. Install TurboVNC
  2. Install jupyterhub/jupyter-remote-desktop-proxy
  3. Install dependency: websockify

It is then represented as the "Desktop" icon in the JupyterLab launcher.
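
A quick way to confirm those three pieces actually made it into a built image is to look for them from Python inside a running user server. This is only a sketch under assumptions: the function name is hypothetical, the executable names (`vncserver`, `websockify`) and the module name (`jupyter_remote_desktop_proxy`) are guesses that may differ by version, and TurboVNC in particular may install outside the default `PATH`.

```python
import importlib.util
import shutil


def check_desktop_prerequisites(
    executables=("vncserver", "websockify"),      # assumed executable names
    modules=("jupyter_remote_desktop_proxy",),    # assumed top-level module name
):
    """Return a dict mapping each expected component to True if it was found."""
    found = {}
    for exe in executables:
        # shutil.which only searches PATH, so a binary installed elsewhere reports False.
        found[f"executable:{exe}"] = shutil.which(exe) is not None
    for mod in modules:
        # find_spec returns None when a top-level module is not importable.
        found[f"module:{mod}"] = importlib.util.find_spec(mod) is not None
    return found


if __name__ == "__main__":
    for component, present in check_desktop_prerequisites().items():
        print(f"{component}: {'found' if present else 'MISSING'}")
```

A `MISSING` line does not prove the feature is broken, but it narrows down which install step to revisit.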

colliand commented 2 years ago

Yes, I suggest that the AWI/CIROH hub be set up on a dedicated GKE cluster on the data center where the NWM data is hosted. I suggest that 2i2c manage the billing account for the cluster with the monthly cloud usage costs passed through to AWI. AWI/CIROH may choose to take over the billing account as the service and their devops capacity expands.

I like the advice shared by @consideRatio that we set this hub up to resemble the JMTE hub. The suite of integrated tools in that hub is tuned to support collaborations like those envisioned by CIROH.

sgibson91 commented 2 years ago

I suggest that 2i2c manage the billing account for the cluster with the monthly cloud usage costs passed through to AWI. AWI/CIROH may choose to take over the billing account as the service and their devops capacity expands.

This sounds like we should create a new billing account and not just use the two-eye-two-see one, no?

P.S. It also looks like I don't manage the two-eye-two-see billing account, so I can't create a project attached to that one in the interim

jameshalgren commented 2 years ago

I like the advice shared by @consideRatio that we set this hub up to resemble the JMTE hub. The suite of integrated tools in that hub is tuned to support collaborations like those envisioned by CIROH.

Link to that hub for reference?

jameshalgren commented 2 years ago

@whitelightning450, @hellkite500, @aaraney, @karnesh, @mgdenno -- have been meaning to loop you in here so you can follow the development here.

@quebbs -- hello! -- tagging you ahead of upcoming discussion. This may be a tool to put to use.

sgibson91 commented 2 years ago

Ok, I have created a new GCP account to deploy this into. I have connected the 2i2c billing account for now, and we can decide to change that later if needed.

(Big gold star ⭐ to Chris for figuring that out!)

sgibson91 commented 2 years ago

on the data center where the NWM data is hosted

Can we be a bit more specific about this please? The NWM data is multi-regional in the US: so is us-central1-b ok? Do we envision this hub wanting to use GPUs in the future (then we should go with us-central1-c)?

damianavila commented 2 years ago

Do we envision this hub wanting to use GPUs in the future (then we should go with us-central1-c)?

@colliand was that piece part of the conversation? @jameshalgren, any input about this one?

jameshalgren commented 2 years ago

I think we want to avoid F-35 syndrome. Let me check with a couple of others but I think we can do plenty with GPCPUs for now.

Having the option in the future might be useful. What are the trade-offs for going to the data center where GPUs are available?

sgibson91 commented 2 years ago

Having the option in the future might be useful. What are the trade-offs for going to the data center where GPUs are available?

As far as I'm aware, none. We often put research hubs in that zone in case they want to upgrade to GPUs later, since moving the cluster after the fact would involve destroying it and redeploying.

consideRatio commented 2 years ago

I like the advice shared by @consideRatio that we set this hub up to resemble the JMTE hub. The suite of integrated tools in that hub is tuned to support collaborations like those envisioned by CIROH.

Link to that hub for reference?

@jmunroe here are links to the JMTE hub:

I like the advice shared by @consideRatio that we set this hub up to resemble the JMTE hub.

Note that I meant only to describe how to set up the ability to access a remote desktop interface, which is something that can be configured in the user environment as described here.

One way to set up the hub with this kind of functionality would be to bootstrap it with an image that includes those parts.

colliand commented 2 years ago

Hi @jameshalgren. This deck created by @fperez describes some of the features of the Jupyter Meets the Earth (JMTE) hub. This is an opinionated, curated, integrated "batteries included" deployment that goes beyond the (already awesome) JupyterLab. After the CIROH/AWI hub is launched, I look forward to working with you to organize a kickoff event for you and your community champions in which Fernando gives a demonstration.

In an exchange elsewhere, I learned from @consideRatio that the JMTE hub has the following extra features:

Below are some non-2i2c-default features used within the JMTE hub.

sgibson91 commented 2 years ago

@jameshalgren Can you please provide the list of GitHub Teams you would like to have access to the hub?

sgibson91 commented 2 years ago

I am struggling to install TurboVNC with the provided code snippet and receiving the following error:

  E: Invalid archive signature
  E: Internal error, could not locate member control.tar{.zst,.lz4,.gz,.xz,.bz2,.lzma,}
  E: Could not read meta data from /home/jovyan/turbovnc.deb
  E: The package lists or status file could not be parsed or opened.

PR: https://github.com/2i2c-org/awi-ciroh-image/pull/1

consideRatio commented 2 years ago

@sgibson91 seems like you have the exact same code snippet and a similar base image as in https://github.com/pangeo-data/jupyter-earth/blob/master/hub.jupytearth.org-image/Dockerfile. So, maybe the apt install step crashes because of something missing, such as build-essential?

Hmmm, googling on the errors, I see notes about apt clean etc. Also, I note that you have a step before using apt update that didn't end with a cleanup step. Maybe that could help? This is a wild guess without motivation.

sgibson91 commented 2 years ago

Thanks @consideRatio. I added the clean-up step to the earlier apt update invocation, and that produced a new error related to "held broken packages". So I added an apt update and the clean-up step to the TurboVNC step and now it builds successfully 🤷🏻

Final commit looks like this: 2i2c-org/awi-ciroh-image@6d4f05c (#1)

jameshalgren commented 2 years ago

@jameshalgren Can you please provide the list of GitHub Teams you would like to have access to the hub?

@sgibson91 -- alabamawaterinstitute, please, and NOAA-OWP

Thanks!

jameshalgren commented 2 years ago

@colliand

This deck

... that is a link to a NASA ICESat-2 Hackweek quote.

sgibson91 commented 2 years ago

alabamawaterinstitute, please, and NOAA-OWP

These are organizations, I was under the impression you wanted specific teams to have access? E.g. the tech-team that is a member of the 2i2c org -> https://github.com/orgs/2i2c-org/teams/tech-team
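
To make the organization-versus-team distinction concrete, here is a small sketch of how an allow list mixing both kinds of entries could be evaluated. It assumes the `org` and `org:team` entry syntax; the function names are hypothetical, the team entries are only illustrative, and this is not the authenticator's actual implementation.

```python
def parse_allow_entry(entry):
    """Split 'org' or 'org:team' into (org, team); team is None for a whole organization."""
    org, _, team = entry.partition(":")
    return org, (team or None)


def is_allowed(user_orgs, user_teams, allow_list):
    """Return True if the user matches any entry in the allow list.

    user_orgs:  set of organization names the user belongs to
    user_teams: set of (org, team) tuples the user belongs to
    """
    for entry in allow_list:
        org, team = parse_allow_entry(entry)
        if team is None and org in user_orgs:
            return True  # whole-organization entry
        if team is not None and (org, team) in user_teams:
            return True  # team-scoped entry
    return False


if __name__ == "__main__":
    allow_list = ["alabamawaterinstitute", "NOAA-OWP", "2i2c-org:tech-team"]
    print(is_allowed({"NOAA-OWP"}, set(), allow_list))                   # org member
    print(is_allowed({"2i2c-org"}, {("2i2c-org", "docs")}, allow_list))  # wrong team
```

Starting with whole organizations and narrowing to teams later only changes the entries in the list, not the check itself.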

sgibson91 commented 2 years ago

Ah pardon me, I think I'm misremembering another hub setup issue where a question was raised about subteams

jameshalgren commented 2 years ago

Ah pardon me, I think I'm misremembering another hub setup issue where a question was raised about subteams

@sgibson91 10-4 -- we may refine later, but I'm assuming that is a simple process.

sgibson91 commented 2 years ago

I'm assuming that is a simple process

Absolutely.

The hubs are available here:

Please note these docs about authorising the GitHub app for the first time: https://infrastructure.2i2c.org/en/latest/howto/configure/auth-management.html#follow-up-github-organization-administrators-must-grant-access

sgibson91 commented 2 years ago

@consideRatio are there any other setup steps regarding the VNC/Linux desktop? I would've expected a button on the Lab Launcher saying "Desktop", but it's not there. Also changing /lab to /desktop in the URL returns a 404 😕 Maybe @yuvipanda can help too?

Image repo: https://github.com/2i2c-org/awi-ciroh-image

consideRatio commented 2 years ago

This is what is done in the JMTE image, which is based on a pangeo-notebook base image: https://github.com/2i2c-org/infrastructure/issues/1444#issuecomment-1187405324. I don't think anything else is needed!

sgibson91 commented 2 years ago

🤔 Hmmm ok, maybe Yuvi can help me debug when he's online then

consideRatio commented 2 years ago

@sgibson91 I would suspect https://github.com/2i2c-org/awi-ciroh-image/commit/7b080bef9a29e7d791e62058229f1946812f403a#diff-dd2c0eb6ea5cfc6c4bd4eac30934e2d5746747af48fef6da689e85b752f39557R32-R33 could be to blame. I don't understand how jupyter-server-proxy registers things to show up in jupyterlab and start up properly, but jupyterlab presents icons for notebook / kernels etc, and maybe there is a common mechanism in play related to removing nb_conda_kernels.

Hmmm, thinking about it, if you don't succeed in accessing /user/some-name/desktop, it makes me think that jupyter-server-proxy has failed to start. That I know from experience can happen if some other jupyter-server-proxy package fails to load properly. So, something else registering itself with jupyter-server-proxy may be to blame.
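
One way to test that hypothesis is to list what has registered itself with jupyter-server-proxy and see whether each registration loads. The sketch below assumes jupyter-server-proxy discovers servers through the `jupyter_serverproxy_servers` entry point group; the helper name is hypothetical, and it needs to run inside the user server's Python environment.

```python
from importlib.metadata import entry_points


def list_proxy_entry_points(group="jupyter_serverproxy_servers"):
    """Return {entry point name: 'ok' or an error string} for the given group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Python versions expose a selectable collection.
        selected = eps.select(group=group)
    else:
        # Older Python versions (such as 3.9) return a plain dict keyed by group.
        selected = eps.get(group, [])

    results = {}
    for ep in selected:
        try:
            ep.load()  # import the object the entry point refers to
            results[ep.name] = "ok"
        except Exception as exc:  # report the failure instead of stopping
            results[ep.name] = f"failed: {exc!r}"
    return results


if __name__ == "__main__":
    for name, status in list_proxy_entry_points().items():
        print(f"{name}: {status}")
```

An empty result suggests nothing registered a desktop server at all, while a `failed` line points at the package to investigate.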

sgibson91 commented 2 years ago

Yeah, tbh, I'm just guessing and used https://github.com/2i2c-org/coessing-image/blob/main/Dockerfile as a starting point (before the Julia addition :D)

jameshalgren commented 2 years ago

The hubs are available here:

Awesome! Does this mean we can get in and start trying things out (I assume this will begin to incur cloud costs...)?

sgibson91 commented 2 years ago

@jameshalgren yes and yes :) I'm still trying to figure out the VNC/Linux desktop feature though

sgibson91 commented 2 years ago

I made some progress in PR https://github.com/2i2c-org/awi-ciroh-image/pull/3. I now have the Desktop icon on JupyterLab's launcher (I'm testing this on the staging hub).

However when I click on it, I see "Something went wrong, connection is closed"

Logs from my user server (k logs jupyter-sgibson91) show:

[I 2022-07-25 16:05:12.314 SingleUserNotebookApp handlers:432] Trying to establish websocket connection to ws://localhost:5901/websockify
2022-07-25 16:05:12,316 - SingleUserNotebookApp - ERROR - Uncaught exception GET /user/sgibson91/desktop/websockify (10.128.0.3)
HTTPServerRequest(protocol='https', host='staging.ciroh.awi.2i2c.cloud', method='GET', uri='/user/sgibson91/desktop/websockify', version='HTTP/1.1', remote_ip='10.128.0.3')
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/tornado/tcpclient.py", line 138, in on_connect_done
    stream = future.result()
tornado.iostream.StreamClosedError: Stream is closed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/tornado/websocket.py", line 956, in _accept_connection
    await open_result
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/jupyter_server_proxy/handlers.py", line 672, in open
    return await super().open(self.port, path)
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/jupyter_server_proxy/handlers.py", line 494, in open
    return await self.proxy_open('localhost', port, proxied_path)
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/jupyter_server_proxy/handlers.py", line 444, in proxy_open
    await start_websocket_connection()
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/jupyter_server_proxy/handlers.py", line 435, in start_websocket_connection
    self.ws = await pingable_ws_connect(request=request,
  File "/srv/conda/envs/notebook/lib/python3.9/asyncio/tasks.py", line 328, in __wakeup
    future.result()
  File "/srv/conda/envs/notebook/lib/python3.9/site-packages/tornado/iostream.py", line 1205, in connect
    self.socket.connect(address)
OSError: [Errno 99] Cannot assign requested address
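
The log shows the proxy trying to reach ws://localhost:5901/websockify, so a simple first check in a situation like this is whether anything is listening on that port inside the user server. The sketch below is a generic TCP probe with a hypothetical helper name; it does not diagnose the specific Errno 99 above, it only tells you whether a connection to the port can be opened.

```python
import socket


def port_is_open(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and address errors alike.
        return False


if __name__ == "__main__":
    # 5901 is the port that appears in the log above.
    print("port 5901 open:", port_is_open(5901))
```

If the port is closed, the VNC or websockify process most likely never started, which points back at the image contents rather than at the proxy.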

sgibson91 commented 2 years ago

@GeorgianaElena suggested some missing packages in https://github.com/2i2c-org/awi-ciroh-image/pull/3#pullrequestreview-1050597105 and now the desktop feature is available!

colliand commented 2 years ago

Thanks @jameshalgren. I fixed the link to point to the intended slide deck created by Fernando.

colliand commented 2 years ago

Now that the production and staging hubs are available, I suggest to @jameshalgren that we organize a kickoff event for CIROH personnel who will manage the hub with @jmunroe @fperez (and perhaps others on the 2i2c team). Perhaps we can link up for a phone call to discuss some launch planning?

jameshalgren commented 2 years ago

@colliand -- targeting 23 August for a technically focused demo.

damianavila commented 2 years ago

I think we can close this issue (new hub set up) by now (since I believe it is completed) and continue the conversation on new issues.

jameshalgren commented 2 years ago

Thanks @damianavila -- new issues (as needed) are still posted under this repository, correct? (and, for that matter, thanks @colliand, @sgibson91, @consideRatio, @fperez, and @jmunroe and all the rest -- we're excited!)

damianavila commented 2 years ago

@jameshalgren, for follow-up questions/requests I would suggest using our support email channel. Over there, we will be able to provide useful feedback and, in some cases, open issues in specific repositories according to the topic you are raising in that conversation.