Hey @pending (@colliand for now) and @colliand! I noticed there is still pending information about the new hub deployment. Can you please help us fill it in?
The information still missing is:
- hub authentication type
- hub user image tag and name
- extra features you would like to enable
- other relevant information to the features above
Details about each of them can be found in the top comment. But if you have questions about any of them, please ping the 2i2c/engineering team and they will help you.
After the form in the top comment is filled in, an engineer will be assigned and will start deploying the new hub. Thank you!
There is overlap here with the request in
The immediate priority is the need for a binder-like or nbgitpuller-like URL to render the notebooks in the Spyglass repository https://github.com/LorenFrankLab/spyglass using the environment defined in https://github.com/LorenFrankLab/hhmi-spyglass-image. For the preprint, the LorenFrank group may also create a more tailored repository containing just the notebooks and configuration files, without the Spyglass source code.
This group is about to publish a preprint that will include a URL that needs to redirect an unauthenticated user to an ephemeral (no permanent storage) JupyterHub that also supports an ephemeral MySQL database in a sidecar container. @2i2c-org/engineering is requested to provide guidance on how to achieve this behaviour.
HHMI represents several independent communities who will need to control their own environments. Rather than having those groups manage their own image environments, a GESIS-like binder build process could be deployed with persistent storage for authenticated users.
There is also value in having a more traditional BinderHub deployed for arbitrary code execution with no persistent storage. It is not clear to me whether this needs to be open to unauthenticated users or could be wrapped behind some sort of light registration layer.
While this issue discusses an 'HHMI Hub', I think we may need to consider it more as a collection of hubs on the same HHMI cluster.
There are two ways to approach this:
Given the timeline and the fact that we need an additional database (a sidecar, see https://github.com/2i2c-org/infrastructure/issues/3624), I think we should proceed with (1), as documented in https://infrastructure.2i2c.org/howto/features/ephemeral/.
I do think this ephemeral hub should be temporary, and we should swap it out for a BinderHub soon. But this lets us do things piecemeal.
So tasks here are:
- Set up an ephemeral hub in the HHMI cluster (https://infrastructure.2i2c.org/howto/features/ephemeral/). Let's use CILogon but with allow_all. This is not blocked on me right now.
- … cryptnono and enable that here just for this cluster. More information in https://github.com/jupyterhub/mybinder.org-deploy/security/advisories/GHSA-j42g-x8qw-jjfh (but probably not visible to most users here?). This is blocked on me, and instead of actually doing the thing I will write documentation that describes how to do this.
OK, I have assigned this one to @sgibson91, who is going to deploy it with @yuvipanda's assistance.
Let's start with the first step described by Yuvi.
@sgibson91, please voice any questions or blockers you may have in performing step 1.
@jmunroe, we need more details about the sidecar piece so we can start discussing the best way to make it happen given the ephemeral hub context.
Thanks!!
I agree that deploying an ephemeral hub in the HHMI cluster is a good initial step to meet the needs of the Loren Frank group.
There is already some configuration in hhmi/common.values.yaml that can be pulled out into this new ephemeral hub on the same cluster.
For the image:
```yaml
image: "quay.io/lorenlab/hhmi-spyglass-image:c307f9418a60"
```
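The fragment above does not say where the image: key lives (it may sit in a profile list override). If the spyglass hub instead used the chart's single default image, the same reference would be split into name and tag. A minimal sketch, assuming a basehub values file where Zero to JupyterHub settings are nested under the jupyterhub: key:

```yaml
jupyterhub:
  singleuser:
    image:
      # Image built from https://github.com/LorenFrankLab/hhmi-spyglass-image
      name: quay.io/lorenlab/hhmi-spyglass-image
      # Quoted so YAML keeps it a string even if a future tag is all digits
      tag: "c307f9418a60"
```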
For the MySQL sidecar:
```yaml
singleuser:
  extraContainers:
    - name: mysql
      image: datajoint/mysql # following the spyglass tutorial at https://lorenfranklab.github.io/spyglass/latest/notebooks/00_Setup/#existing-database
      ports:
        - name: mysql
          containerPort: 3306
      resources:
        limits:
          # Best effort only. No more than 1 CPU, and if mysql uses more than 4G, restart it
          memory: 4Gi
          cpu: 1.0
        requests:
          # If we don't set requests, k8s sets requests == limits!
          # So we set something tiny
          memory: 64Mi
          cpu: 0.01
      env:
        # Configured using the env vars documented in https://lorenfranklab.github.io/spyglass/latest/notebooks/00_Setup/#existing-database
        - name: MYSQL_ROOT_PASSWORD
          value: "tutorial"
```
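Containers in the same pod share a network namespace, so the notebook container can reach the sidecar's MySQL on 127.0.0.1:3306 without a Kubernetes Service. A sketch of pointing the user environment at it, assuming Spyglass connects through DataJoint and that DataJoint's standard DJ_HOST / DJ_USER / DJ_PASS environment variables are honored; reusing the tutorial root password is only reasonable because the database is ephemeral:

```yaml
jupyterhub:
  singleuser:
    extraEnv:
      # The sidecar shares the pod's network namespace, so loopback reaches it
      DJ_HOST: "127.0.0.1"
      DJ_USER: root
      # Must match MYSQL_ROOT_PASSWORD on the mysql sidecar
      DJ_PASS: tutorial
```

MySQL may take a few seconds to accept connections after the pod starts, so a notebook that connects immediately could see a transient failure.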
And I really like the idea of using an nbgitpuller link. HHMI wants a "magic link" that will direct readers to an environment where the notebooks are available and can be run. The specific details of Binder or nbgitpuller are not important to HHMI.
I will work with the LorenFrank group to ensure the nbgitpuller link works as intended.
What should we call this hub? Neither ephemeral.hhmi.2i2c.cloud nor binder.hhmi.2i2c.cloud seems like a great fit to me.
I suggest spyglass.hhmi.2i2c.cloud
If this will be publicly accessible, I guess we'd like a basehub, not a daskhub (as the staging and prod hhmi hubs are)?
Yes to basehub only for this deployment. I don't see any usage of daskhub in their content.
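For illustration only, a basehub hub entry in 2i2c's infrastructure repo might look roughly like the following. The file location and field names are written from memory of that repo's cluster.yaml format and should be checked against it; the values file name is hypothetical:

```yaml
# Hypothetical entry in the hhmi cluster's cluster.yaml
hubs:
  - name: spyglass
    display_name: "HHMI Spyglass"
    domain: spyglass.hhmi.2i2c.cloud
    # basehub: plain JupyterHub, without the Dask Gateway that daskhub adds
    helm_chart: basehub
    helm_chart_values_files:
      # Hypothetical values file for this hub
      - spyglass.values.yaml
```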
Hub can be tested at https://spyglass.hhmi.2i2c.cloud
Thanks @sgibson91 !
Is this deployment following the description given in https://infrastructure.2i2c.org/howto/features/ephemeral/ ? If so, it is not matching some of my expectations (and those could just be issues with my expectations!):
Landing page. Our documentation says "No home page is visible, so our home page customizations do not work." But I do see a landing page (which can be useful; I am just trying to understand whether it should be there or not).
Authentication 1. I see that instead of tmpauthenticator we are using CILogon with allow_all selected. Is it possible to add additional OAuth providers (e.g. Google, Microsoft), or is having a GitHub account a requirement?
Authentication 2. Is authentication required here? How does this work with mybinder.org? I don't recall having to log in when using that service.
Pre-pulling of images. Our docs mention "Pre-pulled images, for faster startup." -- is that set up here? (It didn't appear like it)
@yuvipanda said in this comment:
- Set up an ephemeral hub in the HHMI cluster (infrastructure.2i2c.org/howto/features/ephemeral). Let's use CILogon but with allow_all. This is not blocked on me right now.
A homepage cannot be used when tmpauthenticator is used, but we're not using tmpauthenticator so a homepage can exist.
Regarding authentication, I just followed Yuvi's recommendations. I suspect that there's an authentication layer (one that allows anyone, so long as they have a valid account) just to provide some protection against crypto-mining. So I followed the deployment strategy Yuvi used for the AGU Binder instance. At the moment it's GitHub only, but I'd guess we should be able to add Google and Microsoft.
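As a sketch of how "any valid account" plus extra providers could be expressed (not the deployed config), assuming OAuthenticator 16 or later, where allow_all exists. The IdP identifier URLs and username claims follow patterns seen in 2i2c configs but should be verified against CILogon's IdP list; the client ID, secret, and callback URL are omitted because they normally live in an encrypted secrets file:

```yaml
jupyterhub:
  hub:
    config:
      JupyterHub:
        authenticator_class: cilogon
      CILogonOAuthenticator:
        # Admit anyone who completes a CILogon login; no allowlist
        allow_all: true
        allowed_idps:
          # GitHub, shown first on the login page
          http://github.com/login/oauth/authorize:
            default: true
            username_derivation:
              username_claim: preferred_username
          # Assumed identifier for Google; verify before use
          http://google.com/accounts/o8/id:
            username_derivation:
              username_claim: email
```

Adding Microsoft would presumably be one more allowed_idps entry, keyed by whatever identifier CILogon lists for it.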
No, I didn't turn on pre-pulling images, is it a requirement?
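Judging from the reply above, pre-pulling is not on by default for this hub. Because magic-link visitors always start a fresh server, image pull time is paid on every cold node, so it may be worth enabling. A minimal sketch using the Zero to JupyterHub prePuller settings, assuming they are passed through under the jupyterhub: key:

```yaml
jupyterhub:
  prePuller:
    # Pulls images as a helm pre-upgrade hook, before the hub is upgraded
    hook:
      enabled: true
    # DaemonSet that also pulls images onto nodes added later by the autoscaler
    continuous:
      enabled: true
```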
In summary, there's slightly different information in the ephemeral hub documentation and Yuvi's recommendation which is the cause for the discrepancy.
Thanks for the quick work, @sgibson91! I've left a review there (along with some doc PRs that were surfaced by that PR!).
I think there are two remaining tasks:
- … tmpauthenticator. @sgibson91 is right that this is nominally to prevent cryptomining. But since we want anyone to be able to use this, the real answer is to set up cryptnono. So I suggest we keep CILogon with GitHub temporarily, to allow the HHMI community to test things out, but switch over to tmpauthenticator (and hence no login) once we deploy stronger cryptnono. I think that will be my primary priority for tomorrow.
How does that sound?
I've opened https://github.com/2i2c-org/infrastructure/pull/3657 with cryptnono setup and extensively documented. It also switches auth to tmpauthenticator.
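The PR is the source of truth for what was deployed. As a rough sketch of what an ephemeral, login-free setup involves in Zero to JupyterHub terms, assuming tmpauthenticator is installed in the hub image and settings are nested under jupyterhub: (the one-hour timeout is an illustrative value):

```yaml
jupyterhub:
  hub:
    config:
      JupyterHub:
        # Each visitor is signed in as a new, randomly named user, no login page
        authenticator_class: tmpauthenticator.TmpAuthenticator
  singleuser:
    storage:
      # No home directory volume: files are lost when the server stops
      type: none
  cull:
    enabled: true
    # Stop servers idle for more than an hour (illustrative value)
    timeout: 3600
    # Also remove the throwaway users once their servers are culled
    users: true
```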
Ideally I'd have waited for @sgibson91 to go through that and make sure she can turn cryptnono on, but because the deadline for having this out is tomorrow, I've also enabled that. But I'd love for you to try to go through it anyway @sgibson91 and make sure the docs are helpful. We can try to turn that on in a different cluster to test?
I've test deployed this as well, so the spyglass hub is now public.
@jmunroe discovered that we need to have a shared directory with the raw data needed for the spyglass demo to run. I've documented how you can mount the shared directory from one hub to another on the same cluster as part of https://github.com/2i2c-org/infrastructure/pull/3657
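The documented procedure is in the PR. For context on why this needs extra config at all: each hub runs in its own namespace and PersistentVolumeClaims are namespaced, so the spyglass hub cannot simply reuse the other hub's claim. One way, assuming the data lives on an NFS export, is to mount that export directly as a pod volume. The server address, export path, volume name, and mount path below are all hypothetical placeholders:

```yaml
jupyterhub:
  singleuser:
    storage:
      extraVolumes:
        # Hypothetical: NFS export holding the other hub's shared data
        - name: spyglass-shared-data
          nfs:
            server: 10.0.0.2 # placeholder NFS server address
            path: /export/staging/_shared # placeholder export path
            readOnly: true
      extraVolumeMounts:
        - name: spyglass-shared-data
          mountPath: /home/jovyan/shared-data # placeholder mount point
          readOnly: true
```

Mounting read-only keeps anonymous ephemeral users from modifying the raw data that other users depend on.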
This initial request can be closed as the hub has been set up. Any other engineering work required here can come through separately.
The GitHub handle of the community representative
pending (@colliand for now)
Hub important dates
The template below does not exactly apply. This is a request for a publicly accessible binder designed (initially) to render notebooks in the Spyglass repository: https://github.com/LorenFrankLab/spyglass
Eventually, this hub will likely be used to serve repos generated by various other research groups affiliated with HHMI. The current focus is to deploy infrastructure to enable a "magic link" that interactively renders Spyglass tutorial content.
Hub Authentication Type
pending
First Hub Administrators
[GitHub Auth only] How would you like to manage your users?
None
[GitHub Teams Auth only] Profile restriction based on team membership
Profile should be set up to work for Spyglass.
Spyglass uses an adjacent database. This was briefly discussed in a call with @yuvipanda .
Hub logo image URL
Should use the same image as the hhmi.2i2c.cloud hub (which is not rendering right now).
Hub logo website URL
https://hhmi.org
Hub user image GitHub repository
Should work with Spyglass for now. Build on the fly via Binder/GESIS tooling?
Hub user image tag and name
pending
Extra features you would like to enable
(Optional) Preferred cloud provider
None
(Optional) Preferred cloud region
as with hhmi.2i2c.cloud
(Optional) Billing and Cloud account
None
Other relevant information to the features above
No response
Tasks to deploy the hub