2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org
BSD 3-Clause "New" or "Revised" License

[Request deployment] New Hub: Smithsonian #2323

Closed jmunroe closed 1 year ago

jmunroe commented 1 year ago

Important dates

Hub Authentication Type

GitHub (e.g., @mygithubhandle)

First Hub Administrators

Mike Trizna, @MikeTrizna Rebecca Dikow, @rdikow Alex White, @aewhite100

[GitHub Auth only] How would you like to manage your users?

Allowing members of a specific GitHub organization

[GitHub Teams Auth only] Profile restriction based on team membership

Organization: @Smithsonian

(no teams yet -- may be added in the future)

Hub logo image URL

https://logo.si.edu/wp-content/uploads/2018/07/logo_primary.svg

Hub logo website URL

https://www.si.edu/

Hub user image GitHub repository

TBD

Hub user image tag and name

TBD

Extra features you'd like to enable

(Optional) Preferred cloud provider

AWS

(Optional) Billing and Cloud account

None

Other relevant information to the features above

Use smithsonian.2i2c.cloud as URL

Please deploy to us-east-2

GPUs are not required today, but may be in the future, in case that affects the choice of zone.

For now, use only a machine type with 4 CPUs and 32 GB of memory.

If additional information is required, please ping @jmunroe first.

Tasks to deploy the hub

consideRatio commented 1 year ago

The JupyterHub is set up and ready for preview

Hi @MikeTrizna @rdikow @aewhite100! I'm with 2i2c and have set up your JupyterHub installation. I have some follow-up actions (1), a question (2), and some initial information (3, 4) below.

  1. Help to finalize GitHub based authorization

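    I've configured the deployment so that members of the https://github.com/smithsonian GitHub organization are authorized to access three locations:

    - https://smithsonian.2i2c.cloud
    - https://staging.smithsonian.2i2c.cloud
    - https://grafana.smithsonian.2i2c.cloud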

    For the authorization to work, an admin of the https://github.com/smithsonian GitHub organization needs to approve that these applications may see whether a user belongs to the organization. If any of you is an admin of the smithsonian GitHub organization, please try to log in at each of the websites above and press the "Grant" button next to the smithsonian organization when presented with an authorization request (screenshot: authorize-github-org).

    If none of you are admins, please visit the websites and press the "Request" button next to the smithsonian GitHub organization. Then contact a GitHub organization admin and ask them to approve the OAuth Application requests for smithsonian-grafana, smithsonian-staging, and smithsonian-prod, as documented by GitHub here. After that, you and other members of the smithsonian GitHub organization should be able to log in!

  2. Is Grafana dashboard access acceptable?

    I've granted access to https://grafana.smithsonian.2i2c.cloud for all members of the smithsonian GitHub organization. It provides dashboards with information such as memory use and the total number of users. Let me know if this is acceptable, or if you wish to restrict access to only a few users. I recommend granting everyone access, as the dashboards can help individual users learn about their own memory use. The most sensitive information is probably whether other users have been active and at what times.

    If you wish to restrict access to a few users, please provide me with the email addresses to grant access to, either directly here or by emailing support@2i2c.org.

  3. About the initial user environment

    Right now, users starting servers via JupyterHub at https://smithsonian.2i2c.cloud will use the quay.io/pangeo/pangeo-notebook:2023.02.27 Docker image. If you wish, you can build your own user image, for example by using the template in https://github.com/2i2c-org/hub-user-image-template.

  4. About users sharing nodes (machines)

    Right now, all users will start their servers (Docker image containers) on a 4 CPU node (machine) with up to 32 GB of memory. In other words, multiple users can start on the same machine. Each user is guaranteed 1 GB of memory by default and won't risk memory issues as long as they stay below that. The less memory each user requests, the more users fit on a node and the cheaper the cloud bill.

    Users can opt to request more guaranteed memory, and should do so to avoid issues if they find themselves using more. As community representatives, you can contact us at 2i2c if you want the default changed, for example from 1 GB to something else. This is how it can look when users start their servers.

    image

    If a user has requested some amount of memory but is in practice using more, they may be kicked off the node if it starts running out of memory. I suggest that you check how much memory you and other users are using overall. In JupyterLab you can see how much memory is used in the bottom-left status bar:

    image

    Using Grafana, and specifically this Grafana dashboard, you can also see how memory and CPU use change over time, which helps you make better-informed choices. Below is an example of me starting a user server over a few minutes.

    image
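
The node-sharing arithmetic in item 4 can be sketched in a few lines of Python. This is a rough illustration only, not how the cluster scheduler actually computes placement: the function name `max_users_per_node` and the `reserved_gb` parameter are hypothetical, CPU is ignored, and real nodes keep some memory back for system components, so the true numbers will be somewhat lower.

```python
def max_users_per_node(node_memory_gb, guarantee_gb, reserved_gb=0.0):
    """Upper bound on how many user servers fit on one node by memory alone.

    node_memory_gb: total memory of the node (32 in this deployment).
    guarantee_gb:   memory guaranteed to each user (1 by default here).
    reserved_gb:    hypothetical allowance for system overhead (assumption).
    """
    if guarantee_gb <= 0:
        raise ValueError("guarantee_gb must be positive")
    usable_gb = node_memory_gb - reserved_gb
    # Floor division: a user either fits entirely or not at all.
    return max(0, int(usable_gb // guarantee_gb))


if __name__ == "__main__":
    # Larger guarantees mean fewer users per node, hence more nodes and a higher bill.
    for guarantee in (1, 2, 4, 8):
        users = max_users_per_node(32, guarantee)
        print(f"{guarantee} GB guarantee: up to {users} users per 32 GB node")
```

Raising the default guarantee from 1 GB to 4 GB, for instance, would cut the memory-based upper bound per node to a quarter, which is the trade-off to weigh when asking for a different default.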

consideRatio commented 1 year ago

Hi @MikeTrizna @rdikow @aewhite100, have you had time to try out the JupyterHub setup?

It would be good to know that 1 and 2 in https://github.com/2i2c-org/infrastructure/issues/2323#issuecomment-1505721800 are resolved before my vacation starting April 29th.

MikeTrizna commented 1 year ago

Thanks so much, @consideRatio, for setting this up!

Regarding question 1 above, I am an admin of the Smithsonian organization, but it looks like it is already configured to provide membership access. See the screenshot below.

image

We are able to use GitHub logins to access https://staging.smithsonian.2i2c.cloud/ and https://smithsonian.2i2c.cloud/, but not https://grafana.smithsonian.2i2c.cloud/. That one gives the following error:

image

aewhite100 commented 1 year ago

Hi @consideRatio, sorry for the delay. Very exciting to see the new setup! Yes, we've been able to do some basic exploration and are happy to see everything up and running. Item 1 from above is working. The Grafana dashboard is not accessible, as @MikeTrizna mentioned.

Once we get those issues fixed, we think it would be best to restrict access to the Grafana dashboard to the three of us for now: whiteae@si.edu, triznam@si.edu, and dikowr@si.edu.

Thanks for the help! We hope to have more time to play with more custom user environments, especially those including RStudio and GIS capabilities. All of the other documentation you provided in 3 and 4 is very useful! We will have more time to experiment with multiple users once our colleagues take a look.

consideRatio commented 1 year ago

Thank you for the follow up!

Mike, the green checkmark you observed indicates that the "Press Grant button" step is complete, excellent!

Grafana access

@MikeTrizna ah, the Grafana authentication issue arose because the Smithsonian GitHub organization was registered with a capital letter, and Grafana has a bug: it is case sensitive when reading the allowed_organizations configuration. I reported this in https://github.com/grafana/grafana/issues/66876 and submitted a fix for it in https://github.com/grafana/grafana/pull/66879 :)

Based on feedback from @aewhite100 above, I've now disabled the GitHub-based authentication and sent a Grafana username/password invite to the emails: whiteae@si.edu, triznam@si.edu, and dikowr@si.edu.
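
For readers unfamiliar with this class of bug, here is a minimal Python sketch of the case-sensitivity issue described above. It is not Grafana's actual code (Grafana is written in Go), and the function names are hypothetical. It also assumes that GitHub reported the organization as `Smithsonian` while the configuration listed `smithsonian`.

```python
def is_allowed_case_sensitive(user_orgs, allowed_organizations):
    """Buggy variant: 'Smithsonian' and 'smithsonian' are treated as different."""
    return any(org in allowed_organizations for org in user_orgs)


def is_allowed_case_insensitive(user_orgs, allowed_organizations):
    """Fixed variant: normalize case on both sides before comparing.

    GitHub organization names are case insensitive, so this is the safer check.
    """
    allowed = {org.casefold() for org in allowed_organizations}
    return any(org.casefold() in allowed for org in user_orgs)


if __name__ == "__main__":
    user_orgs = ["Smithsonian"]  # as reported for the user (assumed casing)
    allowed = ["smithsonian"]    # as written in the configuration (assumed casing)

    print(is_allowed_case_sensitive(user_orgs, allowed))
    print(is_allowed_case_insensitive(user_orgs, allowed))
```

With the assumed casing, the first check rejects a legitimate member and the second accepts them, which matches the login failure reported for Grafana while the two hub URLs worked.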


I see the initial setup of the JupyterHub for Smithsonian as complete now. Is there something else you wish to see done for this hub as part of the initial setup @MikeTrizna @aewhite100 @rdikow?

damianavila commented 1 year ago

> I see the initial setup of the JupyterHub for Smithsonian as complete now.

I will call this one done for now. Please submit tickets through support if you need further changes. Thanks!