2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org
BSD 3-Clause "New" or "Revised" License

[New Hub] London Interdisciplinary School #1485

Closed choldgraf closed 1 year ago

choldgraf commented 2 years ago

This is an issue to collect information and coordinate deploying a new hub! We should:

Roles

About the community

Community Representative(s) (reference docs)

Important Dates

About the infrastructure

Hub Authentication Type (reference docs): github

Hub Logo

User image (reference docs)

Extra features

JupyterHub information

Deployment checklist

choldgraf commented 2 years ago

ping to @LaCrecerelle and @matthew-brett - we can use this issue to coordinate around the information needed to deploy the hub in your community. There are a number of missing fields above, could you take a quick look and help us fill in the missing information? When that's done, we can get started

@2i2c-org/tech-team - let me know if I missed any important information up there and I can add it in

damianavila commented 2 years ago

@2i2c-org/tech-team - let me know if I missed any important information up there and I can add it in

One of the main pieces from the above list is the cloud provider where the cluster/hub will be deployed.

Preferred cloud provider: No requirement, preferably not Google

This is confusing to me... in fact, I think we prefer GCP deployments (although we are able to deploy in AWS and Azure as well). Can you clarify what you are trying to describe on that line, @choldgraf?

matthew-brett commented 2 years ago

@damianavila - I bet this preference came about from discussions between @LaCrecerelle and @choldgraf - maybe they can say more? I personally have a moderate preference for Google, because I know GKE moderately well.

choldgraf commented 2 years ago

Yep - if I recall correctly, @LaCrecerelle expressed concerns about the privacy implications of Google relative to the other two cloud providers.

The 2i2c team generally has a preference for Google Cloud as well, from an "ease of service and support" perspective - their Kubernetes service is just generally more stable and straightforward to maintain. But we'll defer to the community representatives on this one. Perhaps @matthew-brett and @LaCrecerelle can discuss and make a recommendation?

matthew-brett commented 2 years ago

Just for my reference - here is the relevant Berkeley data hub discussion page: https://docs.datahub.berkeley.edu/en/latest/admins/cluster-config.html

kestraI commented 2 years ago

Hi all - just to clear up any confusion - as long as data remains in the UK, I'm happy to use whichever cloud provider 2i2c & @matthew-brett think is best. Our direct relationship is with 2i2c and I'm super happy with your privacy and other terms, but as discussed on the phone with @choldgraf, our data must remain in the UK.

For the avoidance of ambiguity, by "data" I mean specifically hubs/notebooks and user information.

matthew-brett commented 2 years ago

Thanks @LaCrecerelle - then my vote would be for GKE.

sgibson91 commented 2 years ago

From this documentation, we would need to deploy into the GCP region europe-west2, which is based in London and has 3 zones.

matthew-brett commented 2 years ago

What is the best way to fill in the information above? Via copy-paste into new cells here?

@LaCrecerelle - do we have a LIS GitHub organization? Would you like to be an admin on https://github.com/lisds - that's what I am using for my textbook. I can make a matching Docker Hub account - unless we already have something? Do we have some logo URLs to use here?

I suggest we go for a start date in the middle of August, to allow ourselves some time to get ready. What do you think?

matthew-brett commented 2 years ago

Can I ask about the implications of "Scalable Dask Gateway Cluster" vs the standard cluster?

colliand commented 2 years ago

Hi @matthew-brett! The scalable cluster is described here: https://docs.2i2c.org/en/latest/about/distributions/research.html

Based on your anticipated usage, I suggest you launch the service without Dask.

matthew-brett commented 2 years ago

@colliand - so - if I understand correctly - the scalable cluster is the addition of a second cluster, in addition to the normal also-scalable JupyterHub user cluster, on which the user pods can execute dask-zarr jobs?

colliand commented 2 years ago

I think of it in two directions. Kubernetes scales "horizontally": as new users arrive, it adds capacity so that each of them gets similar computational power. Dask scales "vertically": it enables a single user to access compute resources beyond what is offered on their slice of the main resource.

matthew-brett commented 2 years ago

Right - but am I right that the difference is that the "scalable cluster" is the addition of an extra vertically scalable cluster, that users can tap into if they need? So I guess that can be added later, with the associated cost?

sgibson91 commented 2 years ago

I guess that can be added later

Not quite. It's a different helm chart - so we would have to tear down and redeploy.

matthew-brett commented 2 years ago

OK - thanks - so then the cost would be some fairly short down-time for the cluster if we decided to go that way later?

damianavila commented 2 years ago

What is the best way to fill in the information above? Via copy-paste into new cells here?

That will work, I can update the first post with the answers you provide on the full thread.

so then the cost would be some fairly short down-time for the cluster if we decided to go that way later?

It depends. If there is no migration of users' content, then it should be a short downtime that we would need to coordinate. If we need to migrate users' content, then it could take more time depending on the amount of users' content.

There is more information about the migration process in case you are technically interested: https://infrastructure.2i2c.org/en/latest/howto/hubs/move-hub.html

Others in the @2i2c-org/tech-team can add more pieces in case I am missing something else in that possible transition.

damianavila commented 2 years ago

If we need to migrate users' content, then it could take more time depending on the amount of users' content.

An additional point from the linked documentation:

This might not entirely be necessary - if the source and target cluster are in the same GCP Project / AWS Account, we can just re-use the same home directory storage!

matthew-brett commented 1 year ago

So sorry for the delay - and that, because of the delay, we're running a bit tight. The answers are below, let me know if I've missed something.

Important Dates

Target start date: 2022-09-26
Required start date: 2022-10-03
Any important dates for usage: N/A

About the infrastructure

Hub Authentication Type (reference docs): GitHub

Hub Logo

URL to Image: https://www.lis.ac.uk/wp-content/themes/lis/library/images/logo.png
URL for image link: https://www.lis.ac.uk

User image (reference docs)

Image repository: https://github.com/lisds
Image registry: https://hub.docker.com/repository/docker/lisds/lisds-base
Image tag and name: lisds/lisds-base:a2c7c2a

Extra features

Dedicated kubernetes cluster: NO
Preferred cloud provider: GKE
Scalable Dask Gateway Cluster: NO

JupyterHub information

Hub URL: ds.lis.2i2c.cloud
Helm chart: basehub

matthew-brett commented 1 year ago

Can y'all let me know the Helm chart JupyterHub version so I can update in our images repo?

For reference: https://github.com/lisds/lisds-images/blob/main/lisds-base/requirements.txt#L7

GeorgianaElena commented 1 year ago

@matthew-brett, we recommend a jupyterhub version of at least 2.3.1.

This is actually the version that we're using for the default 2i2c hub's user image too:

https://github.com/2i2c-org/2i2c-hubs-image/blob/69b1f9dff7c7725c01b9697fcbfe8851c4412ebe/requirements.txt#L31
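A quick way to check an image's requirements.txt against that minimum could look like the sketch below. It assumes jupyterhub is pinned with `==` and a plain numeric version; the helper names are made up for illustration and are not part of any 2i2c tooling.

```python
import re

# Minimum version recommended in this thread.
MIN_JUPYTERHUB = (2, 3, 1)


def parse_version(text):
    """Turn a plain 'X.Y.Z' version string into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def pinned_jupyterhub_version(requirements_text):
    """Return the version pinned with '==' for jupyterhub, or None if absent."""
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        match = re.fullmatch(
            r"jupyterhub\s*==\s*([0-9]+(?:\.[0-9]+)*)", line, re.IGNORECASE
        )
        if match:
            return parse_version(match.group(1))
    return None


def meets_minimum(requirements_text, minimum=MIN_JUPYTERHUB):
    """True only if jupyterhub is pinned at or above the minimum version."""
    pinned = pinned_jupyterhub_version(requirements_text)
    return pinned is not None and pinned >= minimum
```

Pre-release or range specifiers (for example `>=2.3`) are deliberately not handled here; a real check would be better served by a proper version-parsing library.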

GeorgianaElena commented 1 year ago

Dedicated kubernetes cluster: NO

Regarding deploying this on the 2i2c shared cluster, I don't think this is a valid choice, given that the cluster is deployed in us-central1-b and the requirement is for the data to stay in the UK. So, I believe we need to deploy a new cluster in europe-west2, as per @sgibson91's suggestion.

I believe the fastest way forward, given that the target start date is close is to create a new GCP project under the 2i2c billing account and deploy the new cluster there, in the appropriate zone. I believe the billing account can be changed later if need be. @damianavila, what do you think?
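The residency reasoning above can be sketched as a small check. This relies on the GCP naming convention that a zone name is its region name plus a one-letter suffix; the function names and the `ALLOWED_REGIONS` set are illustrative assumptions, not existing 2i2c code.

```python
# Regions that satisfy the "data stays in the UK" requirement from this thread.
ALLOWED_REGIONS = {"europe-west2"}  # London


def region_of(zone):
    """Derive the region from a GCP zone name such as 'us-central1-b'."""
    region, _, suffix = zone.rpartition("-")
    if not region or len(suffix) != 1 or not suffix.isalpha():
        raise ValueError(f"not a zone name: {zone!r}")
    return region


def satisfies_residency(zone, allowed=ALLOWED_REGIONS):
    """True if the zone belongs to one of the allowed regions."""
    return region_of(zone) in allowed
```

With these definitions, the shared cluster's zone `us-central1-b` fails the check, while any zone in `europe-west2` passes.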

damianavila commented 1 year ago

I believe the fastest way forward, given that the target start date is close is to create a new GCP project under the 2i2c billing account and deploy the new cluster there, in the appropriate zone. I believe the billing account can be changed later if need be. @damianavila, what do you think?

+1 on your proposal.

matthew-brett commented 1 year ago

@GeorgianaElena - yes, that's correct about the Europe-base for the data - thanks for picking that up. Does the new cluster have any implications for cost?

matthew-brett commented 1 year ago

I've updated the image to have JupyterHub 2.3.1:

Image tag and name: lisds/lisds-base:781af1c

damianavila commented 1 year ago

Does the new cluster have any implications for cost?

It usually does, although I am not sure about the specifics of the contract negotiation in this case (cc @colliand @jmunroe, who surely have more context than me on that topic): https://docs.2i2c.org/en/latest/about/sustainability/strategy.html#markup-for-running-on-dedicated-clusters.

GeorgianaElena commented 1 year ago

@matthew-brett, there's now a hub running at https://ds.lis.2i2c.cloud. You and anyone in the https://github.com/lisds org should be able to login. Can you please try it and let me know if it works?

Unfortunately, the LIS logo is not showing on the main page of the hub. I think it's because it has a transparent background and is made of white lines. Do you have another one we could use for the hub, maybe one with a solid background color? Thanks!

matthew-brett commented 1 year ago

@LaCrecerelle - do we have another good icon to link to?

Our GitHub icon looks OK - is that the best one? Do we have a more canonical URL than https://avatars.githubusercontent.com/u/95037064?s=200&v=4 ?

matthew-brett commented 1 year ago

@GeorgianaElena - thanks - I can access the JupyterHub now - but can I ask for help for our Github authentication? At the moment I get:

 Looks like you have NOT been added to the list of allowed users for this hub. Please contact the hub administrators. 

Of course, I can ask (and have asked) @LaCrecerelle to allow me access via the GitHub interface, and I know that we can add individual GitHub users to the config.yaml file or equivalent - but could you give us advice as to the best way to authenticate all our students by default through the GitHub org?

GeorgianaElena commented 1 year ago

Looks like you have NOT been added to the list of allowed users for this hub. Please contact the hub administrators.

@matthew-brett, are you an admin of the https://github.com/lisds org? This GitHub organization should have access to the hub, but an admin of the org must first grant access to the 2i2c GitHub OAuth app. I believe the issue might be with the access of the 2i2c OAuth App to the org.

You can find more info and the steps to fix this possible issue in these docs: https://infrastructure.2i2c.org/en/latest/howto/configure/auth-management/github-orgs.html#follow-up-github-organization-administrators-must-grant-access. Can you please double check the steps in those docs and let me know if authentication works afterwards?

but could you give us advice as to the best way to authenticate all our students by default through the Github org

There are three options to manage users in a hub that uses GitHub authentication:

So, it's up to you which one of these options should be used in the LIS hub. What do you think? Do you want to move away from the first option or is that ok?
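For granting access through a GitHub organization, as discussed above, here is a minimal sketch of what the hub config might contain, written as a Python dict that mirrors a YAML layout. It assumes the hub uses OAuthenticator's GitHubOAuthenticator with its `allowed_organizations` and `scope` settings; the helper name and the exact nesting are illustrative and not copied from the 2i2c config files.

```python
def github_org_auth_config(allowed_orgs):
    """Build a nested config granting hub access to members of the given orgs."""
    orgs = sorted(set(allowed_orgs))
    if not orgs:
        raise ValueError("at least one GitHub organization is required")
    return {
        "jupyterhub": {
            "hub": {
                "config": {
                    "JupyterHub": {"authenticator_class": "github"},
                    "GitHubOAuthenticator": {
                        # Members of these orgs are allowed to log in.
                        "allowed_organizations": orgs,
                        # Org membership can only be read with this scope.
                        "scope": ["read:org"],
                    },
                }
            }
        }
    }
```

Even with config like this in place, an admin of the organization still has to grant the OAuth app access to the org, as described in the docs linked above.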

matthew-brett commented 1 year ago

@GeorgianaElena - thanks. Just to check - the organization should be:

https://github.com/lisacuk

and not:

https://github.com/lisds

Just to confirm - you are in fact using https://github.com/lisacuk ?

GeorgianaElena commented 1 year ago

Oh, I thought lisds was the correct org (from https://github.com/2i2c-org/infrastructure/issues/1485#issuecomment-1183223815). I just changed it now to allow https://github.com/lisacuk instead. Sorry for the misunderstanding.

GeorgianaElena commented 1 year ago

@matthew-brett, please let me know if the authentication works after this change.

matthew-brett commented 1 year ago

Yes - it works for me - thanks!

GeorgianaElena commented 1 year ago

Yay! I will then close this issue @matthew-brett, but please feel free to reach out to support if you have any issues https://docs.2i2c.org/en/latest/support.html#get-support

matthew-brett commented 1 year ago

Thanks for this! I get a 404 not found for:

https://ds.lis.2i2c.cloud/

after a warning about "This connection is not private".

Just checking - do you mean that it authorizes against the lisacuk Github org? (not the lisds org)
