2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org
BSD 3-Clause "New" or "Revised" License

[Request deployment] HHMI Spyglass Ephemeral Hub #3643

Closed colliand closed 9 months ago

colliand commented 9 months ago

The GitHub handle of the community representative

pending (@colliand for now)

Hub important dates

The template below does not exactly apply. This is a request for a publicly accessible binder designed (initially) to render notebooks in the Spyglass repository: https://github.com/LorenFrankLab/spyglass

Eventually, this hub will likely be used to serve repos generated by various other research groups affiliated with HHMI. The current focus is to deploy infrastructure to enable a "magic link" that interactively renders Spyglass tutorial content.

Hub Authentication Type

pending

First Hub Administrators

[GitHub Auth only] How would you like to manage your users?

None

[GitHub Teams Auth only] Profile restriction based on team membership

Profile should be set up to work for Spyglass.

Spyglass uses an adjacent database. This was briefly discussed in a call with @yuvipanda .

Hub logo image URL

Should use the same image as the hhmi.2i2c.cloud hub, which is not rendering right now:

HHMI-vertical-signature-color

Hub logo website URL

https://hhmi.org

Hub user image GitHub repository

to work with Spyglass for now. build on the fly via binder/GESIS tooling?

Hub user image tag and name

pending

Extra features you would like to enable

(Optional) Preferred cloud provider

None

(Optional) Preferred cloud region

as with hhmi.2i2c.cloud

(Optional) Billing and Cloud account

None

Other relevant information to the features above

No response

Tasks to deploy the hub

github-actions[bot] commented 9 months ago

Hey @pending (@colliand for now) and @colliand! 👋 I noticed there is still pending information about the new hub deployment. Can you please help us fill it in?

The information pieces still missing are:
- hub authentication type
- hub user image tag and name
- extra features you would like to enable
- other relevant information to the features above

Details about each of them can be found in the top comment. But if you have questions about any of them, please ping the 2i2c/engineering team and they will help you.

After the form in the top comment is filled in, an engineer will be assigned and will start deploying the new hub 🚀. Thank you!

jmunroe commented 9 months ago

There is overlap here with the request in

The immediate priority is for a binder-like or nbgitpuller-like URL to render the notebooks in the Spyglass repository https://github.com/LorenFrankLab/spyglass using the environment defined in https://github.com/LorenFrankLab/hhmi-spyglass-image . For the preprint, the LorenFrank group may also create a more tailored repository containing just the notebooks and configuration files, without the Spyglass source code.

This group is about to publish a preprint that will include a URL that needs to redirect an unauthenticated user to an ephemeral (no permanent storage) JupyterHub that also supports an ephemeral MySQL database in a sidecar container. @2i2c-org/engineering is requested to provide guidance on how to achieve this behaviour.

HHMI represents several independent communities who will need to control their own environments. Rather than having those groups manage their own image environments, a GESIS-like binder build process could be deployed with persistent storage for authenticated users.

There is also value in having a more traditional BinderHub deployed for arbitrary code execution with no persistent storage. It is not clear to me whether this needs to be open to unauthenticated users or could be wrapped behind some sort of light registration layer.

While this issue is discussing an 'HHMI Hub', I think we may need to consider it more as a collection of hubs on the same HHMI cluster.

yuvipanda commented 9 months ago

There are two ways to approach this:

  1. Set this up as an ephemeral hub (documented in https://infrastructure.2i2c.org/howto/features/ephemeral/) as @jmunroe suggests
  2. Set this up instead as a binderhub, and use the allowed_repos feature of binderhub to restrict which one it can go to.

Given the timeline and the fact that we need to have an additional db sidecar (https://github.com/2i2c-org/infrastructure/issues/3624), I think we should proceed with (1) (as documented in https://infrastructure.2i2c.org/howto/features/ephemeral/).

I do think this ephemeral hub should be temporary, and we should switch it out to a binderhub soon. But this lets us do this piecemeal.

So tasks here are:

damianavila commented 9 months ago

OK, I have assigned this one to @sgibson91 who is going to deploy this one with @yuvipanda's assistance.

Let's start with the first step described by Yuvi.

@sgibson91, please voice any questions or blockers you may have in performing step 1.

@jmunroe, we need more details about the sidecar piece so we can start discussing the best way to make it happen given the ephemeral hub context.

Thanks!!

jmunroe commented 9 months ago

I agree that deploying an ephemeral hub in the HHMI cluster is a good initial step to meet the needs of the Loren Frank group.

There is already some configuration in hhmi/common.values.yaml that can be pulled out into this new ephemeral hub on the same cluster.

For the image:

image: "quay.io/lorenlab/hhmi-spyglass-image:c307f9418a60"

For the mysql sidecar :

    singleuser:
      extraContainers:
        - name: mysql
          image: datajoint/mysql # following the spyglass tutorial at https://lorenfranklab.github.io/spyglass/latest/notebooks/00_Setup/#existing-database
          ports:
            - name: mysql
              containerPort: 3306
          resources:
            limits:
              # Best effort only. No more than 1 CPU, and if mysql uses more than 4G, restart it
              memory: 4Gi
              cpu: 1.0
            requests:
              # If we don't set requests, k8s sets requests == limits!
              # So we set something tiny
              memory: 64Mi
              cpu: 0.01
          env:
            # Configured using the env vars documented in https://lorenfranklab.github.io/spyglass/latest/notebooks/00_Setup/#existing-database
            - name: MYSQL_ROOT_PASSWORD
              value: "tutorial"
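
Since containers in the same Kubernetes pod share a network namespace, the notebook container should be able to reach this sidecar at 127.0.0.1:3306. As a minimal sketch (not something from the Spyglass docs), a notebook could wait for the database to accept connections before trying to use it. The helper name `wait_for_port` and its defaults are hypothetical, chosen only for illustration:

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 3306,
                  timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: the defaults assume the mysql sidecar above, which
    listens on containerPort 3306 inside the user's pod.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful TCP connect only shows the port is open,
            # not that MySQL has finished initialising.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port()` in the first notebook cell, before opening a database connection, would avoid racing the sidecar's startup.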

And I really like the idea of using an nbgitpuller link. The desire from HHMI is a "magic link" that will direct readers to an environment where the notebooks are available and can be run. The specific details of binder or nbgitpuller are not important to HHMI.

I will work with the LorenFrank group to ensure the nbgitpuller link works as intended.
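
For reference, an nbgitpuller link is a hub URL ending in `/hub/user-redirect/git-pull` with `repo`, `branch` and `urlpath` query parameters. A rough sketch of building one follows. The hub URL is a placeholder, and the branch name and `urlpath` are assumptions that would need checking against the actual repository:

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url: str, repo_url: str, branch: str, urlpath: str) -> str:
    """Build an nbgitpuller link that pulls repo_url and then opens urlpath."""
    query = urlencode({"repo": repo_url, "branch": branch, "urlpath": urlpath})
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"


link = nbgitpuller_link(
    hub_url="https://hub.example.org",  # placeholder: replace with the real hub URL
    repo_url="https://github.com/LorenFrankLab/spyglass",
    branch="master",  # assumption: the repository's default branch
    urlpath="lab/tree/spyglass/notebooks",  # assumption: open the notebooks folder in JupyterLab
)
print(link)
```

The same link can also be produced with the nbgitpuller link generator web page; the function above only makes the URL structure explicit.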

sgibson91 commented 9 months ago

What should we call this hub? I think neither ephemeral.hhmi.2i2c.cloud nor binder.hhmi.2i2c.cloud are great fits?

jmunroe commented 9 months ago

I suggest spyglass.hhmi.2i2c.cloud

sgibson91 commented 9 months ago

If this will be publicly accessible, I guess we'd like a basehub, not a daskhub (as the staging and prod hhmi hubs are)?

jmunroe commented 9 months ago

Yes to basehub only for this deployment. I don't see any usage of daskhub in their content.

sgibson91 commented 9 months ago

Hub can be tested at https://spyglass.hhmi.2i2c.cloud

jmunroe commented 9 months ago

Thanks @sgibson91 !

Is this deployment following the description given in https://infrastructure.2i2c.org/howto/features/ephemeral/ ? If so, it is not matching some of my expectations (and those could just be issues with my expectations!):

  1. Landing page. Our documentation says "No home page is visible, so our home page customizations do not work." But I do see a landing page (which can be useful; I am just trying to understand whether it should be there or not).

  2. Authentication 1. I see that instead of tmpauthenticator we are going to use CILogon with allow_all selected. Is it possible to add additional OAuth providers (e.g. Google, Microsoft), or is having a GitHub account a requirement?

  3. Authentication 2. Is authentication required here? How does this work with mybinder.org? I don't recall having to log in when using that service.

  4. Pre-pulling of images. Our docs mention "Pre-pulled images, for faster startup." -- is that set up here? (It didn't appear like it)

sgibson91 commented 9 months ago

@yuvipanda said in this comment:

A homepage cannot be used when tmpauthenticator is used, but we're not using tmpauthenticator so a homepage can exist.

Regarding authentication, I just followed Yuvi's recommendations. I suspect that there's an authentication layer (that allows anyone so long as it's a valid account) just to provide some protection against crypto-mining. So I followed the deployment strategy Yuvi used for the AGU Binder instance. At the moment, it's GitHub only but we should be able to add Google and Microsoft I'd guess.

No, I didn't turn on pre-pulling of images. Is it a requirement?

In summary, there's slightly different information in the ephemeral hub documentation and Yuvi's recommendation which is the cause for the discrepancy.

yuvipanda commented 9 months ago

Thanks for the quick work, @sgibson91! I've left a review there (along with some doc PRs that were surfaced by that PR!).

I think there are two remaining tasks:

How does that sound?

yuvipanda commented 9 months ago

I've opened https://github.com/2i2c-org/infrastructure/pull/3657 with cryptnono set up and extensively documented. It also switches auth to tmpauthenticator.

Ideally I'd have waited for @sgibson91 to go through that and make sure she can turn cryptnono on, but because the deadline for having this out is tomorrow, I've also enabled that. But I'd love for you to try to go through it anyway @sgibson91 and make sure the docs are helpful. We can try to turn that on in a different cluster to test?

I've test deployed this as well, so the spyglass hub is now public.

yuvipanda commented 9 months ago

@jmunroe discovered that we need to have a shared directory with the raw data needed for the spyglass demo to run. I've documented how you can mount the shared directory from one hub to another on the same cluster as part of https://github.com/2i2c-org/infrastructure/pull/3657

yuvipanda commented 9 months ago

This initial request can be closed as the hub has been set up. Any other engineering work required here can come through separately.