colliand closed this issue 2 years ago.
We already have a live PaleoHack hub: https://github.com/2i2c-org/infrastructure/blob/master/config/clusters/2i2c/paleohack2021.values.yaml
Is this a request for a new one, @colliand?
Yes, I believe this request is for a persistent hub. I look forward to further guidance from @khider.
Yes, this is a request for a new, permanent hub for research purposes. Answering/commenting on the points above:
Authentication: any chance to also allow for GitHub auth?
Hub link and logo: link: linkedearth.2i2c.cloud; image: https://github.com/LinkedEarth/Logos/blob/master/linkedearth_hub_logo.png
Image and repo: the idea is to allow pulling examples from several repositories (which may not share the same environment). Repos gallery (click an image's link to get to its GitHub repo): http://linked.earth/gallery.html#gallery Images are available at https://quay.io/repository/linkedearth/pyleoclim?tab=tags (use the latest version, but there may be issues with tags).
We're working with Pangeo Forge to transform some simulations from netCDF to Zarr format. Task scaling may be needed once this is completed, and some of our example notebooks would use the results of these simulations.
Cloud credit: we may need to have a discussion. For the grant, I had to separate the 2i2c service from cloud billing, so I need to set up an account. Guidance would be helpful for this.
any chance to also allow for Github auth?
I don't believe it's possible to mix authentication types presently, so it would have to be one or the other.
Cloud credit: we may need to have a discussion. For the grant, I had to separate the 2i2c service from cloud billing. So I need to setup an account. Guidance would be helpful for this.
Is there any cloud provider preference?
any chance to also allow for Github auth?
I don't believe it's possible to mix authentication types presently, so it would have to be one or the other.
Can they use their own institution logon, or would they have to somehow register with us?
Cloud credit: we may need to have a discussion. For the grant, I had to separate the 2i2c service from cloud billing. So I need to setup an account. Guidance would be helpful for this.
Is there any cloud provider preference?
No, so we're open to discussion on what would be best.
Just one thing to keep in mind: some of the data will be on Pangeo Forge, so we either need to be close to it or duplicate it for the life of this project.
I spoke with @khider via telephone. Here is a brief report and a few requests for action from the 2i2c team:
The livingearth.2i2c.cloud hub to be held by USC and not by 2i2c. The reasons for this involve accounting and billing overhead. Deborah is prepared to swipe a credit card to establish the cloud billing account and share credentials with the 2i2c team. The main next actions:
livingearth.2i2c.cloud hub.
@rabernat, do you have any recommendations about where to deploy this cluster/hub, given that they are trying to use data hosted in Pangeo Forge? cc @yuvipanda, who might have thoughts as well.
Assuming this is the data we are talking about: https://pangeo-forge.org/dashboard/feedstock/7 (@khider, can you confirm?), it seems to be handled by the pangeo-ldeo-nsf-earthcube bakery, which looks like a GCS one: https://github.com/pangeo-forge/pangeo-forge-gcs-bakery. So the data might actually live in GCS US central-1? I am not familiar enough with Pangeo Forge, so I might be fundamentally wrong 😉.
cc @sgibson91 (this might actually be a GCP hub instead of an AWS one, but I'm not sure yet).
Yes, for the modern data. We also need to work with them to get the PMIP data onto Pangeo Forge, I assume at the same location.
The data are actually being deposited in the OSN storage pod at NCSA, so they are not actually in any commercial cloud. They are fully public, with no requester-pays or other authorization required. I would place the hub in whichever cloud region is in closest network proximity to NCSA. Google Cloud US-CENTRAL-1 has worked well for us in the past, but I'm sure there is a nearby AWS region too.
OK, given this information, a GCP-based daskhub deployment in their own dedicated cluster seems to be the right choice.
@yuvipanda @sgibson91, do you agree with this assessment? Since you have recently been working with new dedicated clusters on GCP, I think you may have good feedback here.
Yep, putting this on GCP in us-central1 is the way to go. Let's put it in us-central1-c, as that zone has the widest variety of GPUs available (https://cloud.google.com/compute/docs/gpus/gpu-regions-zones) in case we need them in the future.
OK, @khider, I think the next step would be to follow https://docs.2i2c.org/en/latest/admin/howto/create-billing-account.html#billing-and-cloud-accounts to create the GCP cloud billing account and add the 2i2c engineers to it.
Btw, in step 2.5, could you please add the following emails: yuvipanda@2i2c.org, sgibson@2i2c.org, georgianaelena@2i2c.org, and damianavila@2i2c.org. Thanks!
And done! Everyone should now have access.
For context on the assignments above: @sgibson91 will be the lead developer for this hub, with @yuvipanda assisting as secondary support.
@khider to confirm, is the billing account you created called "LinkedEarth Jupyter Hub"? (I am assuming yes from the links in the top comment of this issue, but just in case)
I have created the linked-earth-hubs project in our 2i2c.org GCP organisation, attached the Linked Earth Jupyter Hub billing account, and given everyone on the engineering team access to the new project.
@khider to confirm, is the billing account you created called "LinkedEarth Jupyter Hub"? (I am assuming yes from the links in the top comment of this issue, but just in case)
Yes
@khider, can you confirm whether you would prefer CILogon authentication (username@institution.edu) or GitHub authentication, please?
CILogon preferred
@khider Thanks! And what's the institution domain please? The institution.something part? Once I have this info, the hub is near enough ready. Definitely ready for you to try out.
We have also set up a daskhub for you, but dask_gateway is not installed in the pyleoclim image you asked us to use, so you may have trouble using Dask in a scalable fashion until dask_gateway is available in the image.
The hubs are located at:
We have also setup a daskhub for you, but dask_gateway is not installed in the pyleoclim image you asked us to use, so you may have trouble using dask in a scalable fashion until dask_gateway is available in the image
I can add it to the image, and since we're using the latest tag, it should pull automatically, correct? Which version should I use? It seems like I need the same version as what's on the server, according to: https://gateway.dask.org/install-user.html
Thanks. How do I log in myself? I'm apparently not one of the approved users.
@khider, I need to know your institution.domain (e.g. berkeley.edu) so I can add it to the CILogon config. At the minute, only folk with 2i2c.org Google accounts can log in.
I will find out what version of dask_gateway you need to install into your image. Update: version is 2022.6.1
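Since the dask-gateway client in the image is expected to match the server (per the install docs linked above), here is a minimal Python sketch, standard library only, that a user could run inside the hub to check the image. The constant and function name are hypothetical; the only fact taken from this thread is the server version, 2022.6.1. In a pip requirements file the matching pin would look like `dask-gateway==2022.6.1`.

```python
from importlib.metadata import PackageNotFoundError, version

# Server version reported in this thread. Assumption: the client in the
# user image should match it exactly.
HUB_DASK_GATEWAY_VERSION = "2022.6.1"


def check_dask_gateway(expected: str = HUB_DASK_GATEWAY_VERSION) -> str:
    """Report whether the dask-gateway client installed here matches the hub."""
    try:
        installed = version("dask-gateway")
    except PackageNotFoundError:
        return "dask-gateway is not installed in this image"
    if installed != expected:
        return f"dask-gateway {installed} is installed, but the hub runs {expected}"
    return f"dask-gateway {installed} matches the hub"


if __name__ == "__main__":
    print(check_dask_gateway())
```

If the image was built before the package was added, the first message is what you would expect to see.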
we're using the latest tag, it should pull automatically, correct?
Yes and no - it's generally not advised to use latest as it's not clear exactly how the infrastructure behaves. I think we could turn on continuous pre-pulling to reduce the chance of old images lying around, but you could end up with a node running the old image because it hasn't been told to pull a new one yet.
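To make the latest-versus-pinned distinction concrete, here is a small Python sketch that flags image references which leave the cluster guessing which build to run. The function names and the example tag `2022.07` are hypothetical. The rule is a simplification: a digest is the only truly immutable reference, and a named tag other than latest can still be re-pushed, so this check is deliberately loose.

```python
from typing import Optional


def image_tag(ref: str) -> Optional[str]:
    """Return the tag of a container image reference, or None if it has no tag."""
    name = ref.split("@", 1)[0]  # drop any "@sha256:..." digest suffix
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    # A colon after the last slash separates the tag; an earlier one is a registry port.
    if last_colon > last_slash:
        return name[last_colon + 1:]
    return None


def is_pinned(ref: str) -> bool:
    """Treat digest references and explicit non-'latest' tags as pinned."""
    if "@sha256:" in ref:
        return True
    tag = image_tag(ref)
    return tag is not None and tag != "latest"


if __name__ == "__main__":
    for ref in [
        "quay.io/linkedearth/pyleoclim:latest",
        "quay.io/linkedearth/pyleoclim",
        "quay.io/linkedearth/pyleoclim:2022.07",  # hypothetical tag
    ]:
        print(ref, "pinned" if is_pinned(ref) else "NOT pinned")
```

An untagged reference is treated the same as `:latest`, because container runtimes default to that tag.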
I was advised to leave it at latest so we can update as needed. But I pushed a new container. Let me know if this solves the problem.
Re: logon. We will have people from multiple institutions, so not sure I can anticipate all the domains. I'm guessing this is why GitHub might work better unless we do something as generic as Google.
My suggestion is to create (or use an existing) GitHub organization, and anyone who has access to that organization will have access to the hub. This will mean we'll use GitHub (rather than CILogon) for authentication.
Ok, please let me know which GitHub org you would like to use and I will work on switching the auth method over.
I have changed auth over to GitHub orgs; currently only the 2i2c org has access until I know which other org to add.
Also confirmed that dask-gateway is working with the new image
GitHub -> LinkedEarth: https://github.com/LinkedEarth
Question: do I need to add everyone interested in using the hub to the organization?
Yes
@khider, you should now be able to access the hubs, and you will be an admin when you log in as well. Please note that the first time you log in, an admin of the LinkedEarth GitHub org will need to authorise the OAuth App as detailed in these docs: https://infrastructure.2i2c.org/en/latest/howto/configure/auth-management.html#follow-up-github-organization-administrators-must-grant-access
Hi! @khider will be on vacation next week so I'll try to fill in for her, with the caveat that I know precisely NOTHING about cloud things, but I am eager to learn!
I am on vacation next week too, so you will be in the capable hands of @yuvipanda!
So I tried to log in with my GitHub account but was never asked to add LinkedEarth as an org, so I got the 403: Forbidden screen. Is there any way I can add it directly in my org settings?
@khider I suspect this is because you tried to login when we were using CILogon. I'd try clearing your browser cookies and/or logging in from a private/incognito browser. The docs I linked to above also contain information on how to resolve the 403 error
I've requested a quota increase from Google Cloud as well, so you can expand to more users when necessary. I've asked for 256 total CPUs and 64 total nodes.
Sorry all, was taking some time off.
I was able to log in, but I was never asked to allow LinkedEarth as an org. I'll try to add other people in the org who have the same level of power over authenticating.
On another note, I'm getting the following:
And I can't seem to be able to shutdown the server.
@yuvipanda, do you have any thoughts about this one? The log message seems related to your earlier comment about the quota, maybe? Or do you think this is a transient lack of resources in the GCP zone?
Also, I'm able to log in, but no one else from the GitHub organization can. I was never asked to authorise the app, and it doesn't show on my GitHub apps page. Any advice on adding it manually?
@khider, did you follow the advice from Sarah on this message: https://github.com/2i2c-org/infrastructure/issues/1418#issuecomment-1173598394?
@sgibson91, do you have any further thoughts besides the instructions you shared before?
Yes, and that solved the issue for me: I was able to access it. But I can't add anyone else; they get a 403 permission error as well.
My suspicion is that none of the members of the GitHub org have their membership publicly listed, which is a requirement for the read:user scope the hub is configured with. I have opened https://github.com/2i2c-org/infrastructure/pull/1521, which changes the scope to read:org and should allow members to log in regardless of whether their membership of the org is publicly listed.
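The difference between public and private membership shows up in GitHub's REST API, which has two membership checks: `/orgs/{org}/public_members/{username}` only reports public members, while `/orgs/{org}/members/{username}` also reports private members when the caller's token is allowed to read org membership (which is what the `read:org` scope grants). Both answer 204 for a member. The Python sketch below is a standalone illustration of those two endpoints, not OAuthenticator's implementation; the helper names are hypothetical and no token handling beyond a bearer header is assumed.

```python
import urllib.error
import urllib.request
from typing import Optional

GITHUB_API = "https://api.github.com"


def membership_url(org: str, user: str, public_only: bool) -> str:
    """Build the GitHub REST endpoint that checks whether `user` belongs to `org`."""
    kind = "public_members" if public_only else "members"
    return f"{GITHUB_API}/orgs/{org}/{kind}/{user}"


def is_member(org: str, user: str, token: Optional[str] = None) -> bool:
    """Return True if GitHub answers 204 No Content for the membership check.

    Without a token only public membership can be checked; with a token that
    can read org membership, private members are reported too.
    """
    url = membership_url(org, user, public_only=token is None)
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request) as response:
            return response.status == 204
    except urllib.error.HTTPError:
        # 404 means "not a member", or membership not visible to this caller.
        return False
```

For example, `is_member("LinkedEarth", "<username>")` without a token would return False for anyone whose membership is private, which is consistent with the 403s reported above.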
Do I need to publicly list the members? I assume I can do that as an owner, or is it up to each user's preference?
@khider I changed the scope so it should work without people listing their membership publicly now. But each person would have to change it on their own profile, you wouldn't be able to do it for them.
Hub Description
2i2c has been engaged by members of the paleoclimate community. Deborah Khider (@khider) has been our principal contact in establishing this connection. This engagement extends previous work we've done to support the PaleoHackWeeks. We will learn more from Deborah about the community and their needs as we roll out the service. Here are a few links that provide some background information.
LinkedEarth
Deborah's home page
Community Representative(s)
@khider
Important dates
Hub Authentication Type
CILogon (e.g., username@institution.org)
Hub logo information
Hub user image
latest tags
Once again, seeking input from @khider!
Does your community have a Docker Image that specifies the software environment you wish to use in your hub(s)? If so, please share a link and we will review and take steps to set things up using that environment. If not, we can proceed with a Pangeo image and help you and your community adapt the environment to better meet your needs.
Extra features you'd like to enable
Other relevant information
No response
Hub URL
linkedearth.2i2c.cloud
Hub Type
daskhub
Tasks to deploy the hub