I have re-assigned this deployment to @yuvipanda. Yuvi, if you have any questions about the details of this deployment, please ping @colliand.
Hey,
Nice to finally meet you all (virtually)! I have seen your work in the past and I am a big fan; you have done great things. Congrats!
The GitHub team to use is: https://github.com/orgs/ClimateMatchAcademy/teams/2023students. Is it possible to use two teams with different rights? We trust students less than teaching assistants, who are in this team: https://github.com/orgs/ClimateMatchAcademy/teams/2023teachingassistants. I am also unsure whether we could (or would want to) allow different quotas based on the team a user is in. If this cannot be done, please just use the students team.
As for the Docker image: I will be building a new image based on the Pangeo image, as we have a few extra packages and pip installs of custom packages that need adding. I will get this done by Monday next week and will post the image for you. I will have the Dockerfile, and I am thinking of pushing to Docker Hub (I know there is a pull quota for free accounts/orgs; 100 or 200 pulls, from memory). Is there a different service you would need set up? I believe Pangeo uses Quay.io (http://quay.io/)? I'm not quite sure what you need, but I am sure I can provide either the image or the Dockerfile.
Best regards,
Wesley Banfield
(Sent by email in reply to Damian Avila's comment of 17 May 2023, 11:49.)
Glad to work with you, @WesleyTheGeolien! Yes, we prefer you use quay.io rather than Docker Hub! Let us know once the repo and image are set up :)
Great, will do, @yuvipanda. Quick question: will it always pull the latest image? E.g. I post an image, then realise I need an extra dependency, so I build and push a new image (possibly with the same tag I gave you). Will that automatically update the hub (allowing some time to propagate)?
In previous projects I have used Watchtower; does your setup use something similar?
@WesleyTheGeolien if your hub uses only one image, you will be able to self-configure it as an admin to pull whatever tag you want. We prefer not to use the 'latest' tag, and instead have the admins change tags via the UI when necessary.
And re: teams, let's just start with allowing access to the students team and see if that is enough?
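A minimal sketch of the tag discipline described above, assuming a Kubernetes-backed hub: with an IfNotPresent pull policy (the Kubernetes default for tags other than 'latest'), a node that has already cached a tag will typically not re-pull it, so re-pushing the same tag may not reach every user, whereas a new tag per build does. The helper `pinned_image` and the repository name `climatematch/notebook` are hypothetical, and a commit SHA is only one possible choice of unique tag.

```python
# Hypothetical helper: build a fully qualified image reference while refusing
# mutable tags like "latest", so each image change is an explicit new tag
# that a hub admin selects (e.g. via the admin UI).
MUTABLE_TAGS = {"", "latest"}


def pinned_image(registry: str, repository: str, tag: str) -> str:
    """Return 'registry/repository:tag', rejecting empty or 'latest' tags."""
    if tag.strip().lower() in MUTABLE_TAGS:
        raise ValueError("use a unique, immutable tag (e.g. a commit SHA), not 'latest'")
    return f"{registry}/{repository}:{tag}"


# Example: each CI build pushes under its own tag (assumed here to be the commit SHA).
print(pinned_image("quay.io", "climatematch/notebook", "3f2c1ab"))
# → quay.io/climatematch/notebook:3f2c1ab
```

With this approach, adding a dependency means building a new tag and switching the hub to it, rather than relying on a tool like Watchtower to notice a changed image behind the same tag.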
@yuvipanda OK, sounds good.
Hi @yuvipanda
So I have set up our CI to build the Docker image, which currently gets pushed to my personal Docker Hub: https://hub.docker.com/r/wesleyban/climatematch-notebook
We are looking at moving this to quay.io and associating it with ClimateMatch, so it is likely to change in the coming days/weeks; sorry for the hassle.
If needed, the Dockerfile can be found here: https://github.com/ClimateMatchAcademy/course-content/blob/docker/Dockerfile (currently on the docker branch, but it will be merged into main).
@WesleyTheGeolien thanks! I realize the GCP vs AWS question hasn't been resolved. What kind of data would you be using this with? My inclination is to put this on GCP, as that is where our existing shared cluster lives. Any objections?
@yuvipanda I don't know if you are allowed to say, but it would be the same or similar datasets to Pangeo's; I am not sure where they are hosted.
I guess the main issue is data access to climate datasets in the cloud without having to pay network egress fees.
Otherwise, I have uploaded "small" datasets to OSF (OSF -> Climatematch); I'm not sure how that would integrate.
Also, do AWS and GCP allow connections from all countries? We have a substantial number of students in Iran and China, for example. Would this cause a problem on either platform? If so, I guess we choose the other platform!
I have canvassed my team members and will get back with the list of cloud hosted resources we are using.
similar datasets to Pangeo
Unfortunately this is too broad :( All the current Pangeo-related hubs (including m2lines) are hosted on GCP, so if that works for you, maybe GCP is fine?
I guess the main issues is around data access to Climate data sets in the cloud and not having to pay network egress fees.
Note that network egress fees aren't paid by you, but by the agency hosting the data.
I have canvassed my team members and will get back with the list of cloud hosted resources we are using.
This would very much help!
In that case, if all of Pangeo is hosted on GCP, I think that is fine; please confirm, @abodner.
Ah, I thought the egress charges were paid by the hub; that is something of a win then!
Here is a list of current datasets being used:
CMIP data from pangeo
SST data loaded from NOAA in the notebook
Precipitation data loaded from NOAA in the notebook
Air temperature anomaly data loaded from NOAA in the notebook
CHIRPS (but looks like we have some locally saved files)
MODIS (but looks like we have some locally saved files)
ECCO-2
MERRA2
ERA5 (was in an S3 bucket)
@WesleyTheGeolien picking this back up,
We have a substantial amount of students in Iran and China for example would this cause a problem on either of the platforms? If so I guess we choose the other platform!
Unfortunately this is totally out of our control, and afaik both cloud platforms are the same here (blocked in Iran, accessible in China).
@WesleyTheGeolien and just to confirm (because you mention use with m2lines), you are not planning on using dask-gateway with this hub?
Correct @yuvipanda, we are not planning to use dask!
@WesleyTheGeolien @abodner check out https://climatematch.2i2c.cloud!
Access is restricted to members of the GitHub team ClimateMatchAcademy:2023students. The first time you log in, you must specifically grant access to the ClimateMatchAcademy organization (there should be a "Grant" button next to the list of orgs you are a part of). If this is confusing or does not work, and you are willing to temporarily grant me admin rights on the ClimateMatchAcademy organization, I can set this up too.
Each user currently gets 2 GB of RAM (on n2-highmem-2 nodes), but we can make that bigger closer to the start date. Test it out and let me know how it goes?
Thanks @yuvipanda. FYI @abodner, the ClimateMatch Academy hub is available for testing here: https://climatematch.2i2c.cloud/
@abodner @WesleyTheGeolien if you'd like this to be at hub.climatematch.io, please add a CNAME record pointing hub.climatematch.io to climatematch.2i2c.cloud. I'd like us to keep the staging domain under 2i2c.cloud if that's OK, though.
All sounds good. This is very exciting! Thanks all for being so quick!
@yuvipanda the logo is not ours. I have shared ours in the past but can provide another file.
It would be great if students did not have to do the additional GitHub "grant access" step. I am happy to give you admin rights if that spares students from it.
@abodner ah yes please do provide a URL to a logo I can use! The logo link in this GitHub issue doesn't work :(
And yes, the 'grant' step only needs to happen the very first time. Please grant me admin access, I'll do it and then we can remove my access.
Thanks @yuvipanda you should have admin access now. Let me know when you are finished please, I'd like to limit the number of admins on our side.
@yuvipanda here is a new link to our logo: https://drive.google.com/file/d/1ASKF7CwfkLYWsGjkMrgkvyRNbEdMCrMN/view?usp=sharing
@abodner you can remove my access now; all done. You should get someone with only student-team access to log in to make sure it works, but it should.
I don't think we can link directly to the Google Drive link :( Is it already on your website, or somewhere else we can include directly as an <img> tag?
Thanks @yuvipanda. Can I send you the png for now? We use google sites and I am not sure the logo is stored in a very clever way.
@abodner hmm I'll poke around with it tomorrow if that's ok!
Do test out the memory available to see if that works or we need to increase it!
Sounds great, thanks @yuvipanda! Are all the datasets @WesleyTheGeolien listed available already?
Ah, I haven't done anything related to those. I thought they were all externally provided (by NOAA, GCP or similar) and didn't need anything done on our end. Can you verify that, @WesleyTheGeolien?
@abodner I've fixed the logo, check it out.
I'll wait to hear from @WesleyTheGeolien about datasets.
Ah, sorry everyone, I somehow missed these notifications.
Hey @yuvipanda, we had some questions about data on the hub. We use publicly hosted cloud datasets; from my understanding these are fine to interact with without egress charges (with the potential caveat of needing to be in the same region where they are hosted). However, we also have some other datasets: roughly 20 GB hosted on OSF, as well as a ~50 GB dataset we are still unsure what to do with.
Having every student on the hub pull this data separately seems a bit redundant. Is there a way to cache data or add data to the hub? I saw some S3 connectivity in the JupyterLab interface. I'm just wondering what the best practices are for getting data up there (I assume baking it into the Docker image is a bad idea; we don't really want 100 GB images...).
@WesleyTheGeolien there is a 'shared-readwrite' directory that admins can put datasets in, and its contents are available read-only under the 'shared/' directory for everyone else. Do you think that can work?
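A minimal sketch of that workflow, assuming the admin-writable directory is mounted at ~/shared-readwrite and the read-only view at ~/shared (check the actual paths on the hub): an admin copies each dataset in once, and every student then reads the same copy instead of downloading it separately. `publish_dataset`, `shared_dataset` and the dataset names are hypothetical.

```python
import shutil
from pathlib import Path


def publish_dataset(src: Path, shared_readwrite: Path, name: str) -> Path:
    """Admin side: copy a dataset file or directory into the shared area.

    Students would then read it (read-only) from the matching path under shared/.
    """
    dest = shared_readwrite / name
    if src.is_dir():
        # Copy a whole dataset directory (e.g. a Zarr store), merging if it exists.
        shutil.copytree(src, dest, dirs_exist_ok=True)
    else:
        # Copy a single file, creating any intermediate folders first.
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
    return dest


def shared_dataset(name: str, shared: Path = Path.home() / "shared") -> Path:
    """Student side: locate a dataset that an admin has already published."""
    path = shared / name
    if not path.exists():
        raise FileNotFoundError(f"{path} has not been published yet")
    return path
```

For example, an admin might run `publish_dataset(Path("downloads/chirps"), Path.home() / "shared-readwrite", "chirps")` once, after which tutorial notebooks open files via `shared_dataset("chirps")`. This keeps the ~20 GB and ~50 GB datasets out of the Docker image.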
Thanks @yuvipanda that should work out.
Another quick question: I have someone testing the hub. From my understanding, each user has a provision of ~12 GB of RAM, but at the bottom left of the screen it says 2 GB, and they are reporting that loading an 800 MB file into memory crashes the hub. Is this expected?
Cheers
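One way to confirm the limit a user server actually has is to read it from inside the session. This sketch assumes the spawner exports the configured limit as the MEM_LIMIT environment variable (JupyterHub spawners generally do when a memory limit is set) and otherwise falls back to the container's cgroup files; `memory_limit_bytes` is a hypothetical helper. Note that a compressed file can take considerably more memory once loaded than its size on disk.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

CGROUP_ROOT = Path("/sys/fs/cgroup")


def memory_limit_bytes(env: Optional[Mapping[str, str]] = None,
                       cgroup_root: Path = CGROUP_ROOT) -> Optional[int]:
    """Best-effort memory limit for this container, or None if unlimited/unknown."""
    env = os.environ if env is None else env
    # 1. Limit exported by the spawner (assumed to be set on this hub).
    if env.get("MEM_LIMIT"):
        return int(env["MEM_LIMIT"])
    # 2. cgroup v2: a single file containing a byte count or the word "max".
    v2 = cgroup_root / "memory.max"
    if v2.exists():
        value = v2.read_text().strip()
        return None if value == "max" else int(value)
    # 3. cgroup v1: the limit lives under the memory controller
    #    (an unlimited v1 cgroup reports a very large number instead of "max").
    v1 = cgroup_root / "memory" / "memory.limit_in_bytes"
    if v1.exists():
        return int(v1.read_text().strip())
    return None


limit = memory_limit_bytes()
print("unknown/unlimited" if limit is None else f"{limit / 2**30:.1f} GiB")
```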
The climatematch logo is not rendering as the splash image on the login page: https://climatematch.2i2c.cloud/hub/login. FYI @yuvipanda.
@WesleyTheGeolien as I mentioned in https://github.com/2i2c-org/infrastructure/issues/2524#issuecomment-1572520388, I actually have provided only 2 GB of RAM right now. The m2lines 'small' profile is about 7 GB; want me to bump that up?
Ah, thanks @yuvipanda, I didn't see that. Yep, we are getting crashes when running our tutorials, so bumping to 7 GB would be great. Out of interest, are these arbitrary values or set steps?
@WesleyTheGeolien alright, bumped now in https://github.com/2i2c-org/infrastructure/pull/2665!
@WesleyTheGeolien @abodner I'm going to close this issue now, as the hub is up and running. Please email support@2i2c.org if you have any more issues! And definitely let us know at least 2 weeks before any major events with information on how many people you expect, so we can size up your nodes accordingly.
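The node sizing mentioned above can be approximated with simple arithmetic: work out how many user servers fit on one node, then divide the expected number of simultaneous users by that. This is a rough sketch under stated assumptions: memory is the binding constraint, `node_allocatable_gb` is the memory left for user servers after system overhead (which you would need to estimate), and the example numbers are hypothetical apart from the 7 GB per-user figure discussed above. `nodes_needed` is a hypothetical helper.

```python
import math


def nodes_needed(concurrent_users: int, mem_per_user_gb: float,
                 node_allocatable_gb: float) -> int:
    """Rough node count when memory is the limiting resource."""
    if mem_per_user_gb <= 0 or node_allocatable_gb < mem_per_user_gb:
        raise ValueError("each node must fit at least one user server")
    # Whole user servers that fit on one node; leftover memory goes unused.
    users_per_node = math.floor(node_allocatable_gb / mem_per_user_gb)
    # Round up so every concurrent user gets a slot.
    return math.ceil(concurrent_users / users_per_node)


# Hypothetical: 200 students active at once, 7 GB each, ~100 GB usable per node.
print(nodes_needed(200, 7, 100))  # → 15
```

Peak concurrency, not total enrolment (~1000 students), is what drives the node count, which is why the expected headcount for each event matters.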
The GitHub handle of the community representative
@abodner
Hub important dates
Target start date: 2023-06-01
Target end date: 2023-08-31
Heavy usage will take place during the course. The course will run July 17-28 2023.
Hub Authentication Type
GitHub (e.g., @mygithubhandle)
First Hub Administrators
[GitHub Auth only] How would you like to manage your users?
Allowing members of specific GitHub team(s)
[GitHub Teams Auth only] Profile restriction based on team membership
pending
Abigail, can you please point to the GitHub team that Climatematch will use to manage user access to the hub?
Hub logo image URL
https://lh6.googleusercontent.com/pK1Zrf_NmWJ5KqhFB___4p8HPTf4D6u2om5UQkJbVQcwGjDSwlELPibkFfqW809chxybGrQwgiln8v0fRC00fYGzrsb6vIfFtsbh6PetpJKrk_UPoUb-4-RAH6ibtpXyxQ=w1280
Hub logo website URL
https://academy.climatematch.io/
Hub user image GitHub repository
pending, likely best to use latest pangeo image
Hub user image tag and name
pending; likely latest pangeo image
Extra features you would like to enable
(Optional) Preferred cloud provider
AWS
(Optional) Billing and Cloud account
None
Other relevant information to the features above
Climatematch Academy will train a cohort of ~1000 students in computational methods for climate science. The academy is partly inspired by Pangeo and builds on a similar virtual school in Neuroscience created and operated by Neuromatch.
'small' machine type deployed for M2Lines hub.
Tasks to deploy the hub